Dissertations / Theses on the topic 'Multicollinearity'

Consult the top 48 dissertations / theses for your research on the topic 'Multicollinearity.'

1

Clark, Patrick Carl Jr. "The Effects of Multicollinearity in Multilevel Models." Wright State University / OhioLINK, 2013. http://rave.ohiolink.edu/etdc/view?acc_num=wright1375956788.

2

Duxbury, Scott W. "Diagnosing Multicollinearity in Exponential Random Graph Models." The Ohio State University, 2017. http://rave.ohiolink.edu/etdc/view?acc_num=osu1491393848069144.

3

Gou, Zhenkun. "Canonical correlation analysis and artificial neural networks." Thesis, University of the West of Scotland, 2003. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.269409.

4

Månsson, Kristofer. "Issues of multicollinearity and conditional heteroscedasticy in time series econometrics." Doctoral thesis, Internationella Handelshögskolan, Högskolan i Jönköping, IHH, Statistik, 2012. http://urn.kb.se/resolve?urn=urn:nbn:se:hj:diva-31977.

Abstract:
This doctoral thesis consists of four chapters, all related to the field of time series econometrics. The main contributions are, first, the development of robust methods for testing Granger causality in the presence of generalized autoregressive conditional heteroscedasticity (GARCH) and causality-in-variance (i.e. spillover) effects and, second, the development of different shrinkage estimators for count data models that may be used when the explanatory variables are highly inter-correlated. The first essay investigates the effect of spillover on some tests for causality in the Granger sense. As a remedy for the over-rejection caused by spillover effects, White's heteroscedasticity-consistent covariance matrix is proposed. The second essay investigates the effect of GARCH errors on statistical tests for Granger causality. Here some wavelet denoising methods are proposed, and Monte Carlo simulations show that the size properties of tests based on wavelet-filtered data are better than those of tests based on raw data. The third and fourth essays investigate ridge regression estimators for the Poisson and negative binomial (NB) regression models, respectively. Finally, the fifth essay proposes a Liu-type estimator for the NB regression model. Monte Carlo simulations show that the estimated MSE is lower for the ridge and Liu-type estimators than for maximum likelihood (ML).
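The ridge and Liu-type shrinkage ideas in the third to fifth essays can be illustrated in the Poisson case with a minimal sketch. It assumes the commonly cited forms b_ridge = (X'WX + kI)^-1 X'WX b_ML and b_Liu = (X'WX + I)^-1 (X'WX + dI) b_ML, with W = diag(mu) evaluated at the ML fit; the thesis may use other variants and other rules for choosing k and d. The helper names, the simulated data, and the values k = 1 and d = 0.5 are hypothetical.

```python
import numpy as np

def poisson_ml(X, y, n_iter=100, tol=1e-10):
    """Maximum-likelihood Poisson regression (log link) fitted by IRLS."""
    mu = y + 0.5                        # data-based starting means, as in standard GLM fitting
    eta = np.log(mu)
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        z = eta + (y - mu) / mu         # working response
        XtW = X.T * mu                  # X'W with W = diag(mu)
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        eta = X @ beta_new
        mu = np.exp(eta)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta

def poisson_shrinkage(X, y, k, d):
    """Return the ML, ridge-type and Liu-type estimates (hypothetical helper)."""
    beta_ml = poisson_ml(X, y)
    mu = np.exp(X @ beta_ml)
    A = (X.T * mu) @ X                  # X'WX evaluated at the ML fit
    I = np.eye(A.shape[0])
    beta_ridge = np.linalg.solve(A + k * I, A @ beta_ml)       # (X'WX + kI)^-1 X'WX b_ML
    beta_liu = np.linalg.solve(A + I, (A + d * I) @ beta_ml)   # (X'WX + I)^-1 (X'WX + dI) b_ML
    return beta_ml, beta_ridge, beta_liu

# Hypothetical data with two nearly collinear regressors.
rng = np.random.default_rng(0)
n = 200
common = rng.normal(size=n)
x1 = common + 0.05 * rng.normal(size=n)
x2 = common + 0.05 * rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2])
y = rng.poisson(np.exp(0.5 + 0.3 * x1 + 0.3 * x2))

beta_ml, beta_ridge, beta_liu = poisson_shrinkage(X, y, k=1.0, d=0.5)
print("ML:   ", beta_ml)
print("ridge:", beta_ridge)
print("Liu:  ", beta_liu)
```

Because X'WX is positive definite, the ridge factor lambda/(lambda + k) and the Liu factor (lambda + d)/(lambda + 1) both lie strictly between 0 and 1 along each eigenvector when k > 0 and 0 < d < 1, so both estimates are shorter than the ML vector; k = 0 and d = 1 recover ML.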
5

Moineddin, Rahim. "Comments on Mallow's Cp statistics and multicollinearity effects on predictions." Thesis, National Library of Canada = Bibliothèque nationale du Canada, 2001. http://www.collectionscanada.ca/obj/s4/f2/dsk3/ftp04/MQ58663.pdf.

6

Bakshi, Girish. "Comparison of ridge regression and neural networks in modeling multicollinear data." Ohio : Ohio University, 1996. http://www.ohiolink.edu/etd/view.cgi?ohiou1178815205.

7

Albarracin, Orlando Yesid Esparza. "Generalized autoregressive and moving average models: control charts, multicollinearity, and a new modified model." Universidade de São Paulo, 2017. http://www.teses.usp.br/teses/disponiveis/45/45133/tde-21112017-184544/.

Abstract:
Recently, in the area of health surveillance, control charts have been proposed to decide whether the morbidity or mortality of a specific disease has reached an epidemic level. This thesis consists of three papers. In the first two papers, CUSUM and EWMA control charts were proposed to monitor count time series with seasonal and trend effects using Generalized Autoregressive and Moving Average (GARMA) models, instead of the independent Generalized Linear Model (GLM) usually used in practice. Different statistics based on transformations, for variables that follow a negative binomial distribution, were used in these control charts. In the second paper, two new statistics based on the ratio of log-likelihood functions were proposed. Different scenarios describing disease profiles were considered to evaluate the effect of omitting serial correlation in EWMA and CUSUM control charts. The performance of the CUSUM and EWMA charts when serial correlation is neglected in the regression model was measured in terms of average run length (ARL). In summary, when autocorrelation is neglected, fitting a pure GLM instead of a GARMA model leads to an increase in false alarms. However, none of the statistics tested proved robust, in the sense of producing the smallest increase in false alarms in all scenarios. In general, all monitored statistics presented a smaller ARL_0 for higher values of autocorrelation. In the last paper, GARMA(p, q) models with p and q simultaneously different from zero were studied, because two features were observed in practice. The first is multicollinearity, which may lead to non-convergence of maximum likelihood estimation using iteratively reweighted least squares. The second is the inclusion of the same lagged observations in the autoregressive and moving average components, which confounds the interpretation of the parameters. A modified model, GARMA-M, was presented to deal with the multicollinearity and improve the interpretation of the parameters.
In general, simulation studies show that the modified model provides estimators closer to the parameters and confidence intervals with higher coverage than the GARMA model, although some restrictions on the parametric space are imposed to guarantee the stationarity of the process. Finally, a real data analysis illustrates the GARMA-M fit for daily hospitalization rates of elderly people due to respiratory diseases from October 2012 to April 2015 in São Paulo city, Brazil.
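The EWMA charts in the first two papers smooth a monitored statistic recursively, z_t = lam * x_t + (1 - lam) * z_(t-1), and signal when z_t crosses a time-varying limit. A minimal sketch of an upper one-sided EWMA chart follows, assuming the monitored statistic has already been standardized to mean 0 and variance 1 under the in-control model (for example Pearson residuals from a fitted count model, which is an assumption, not one of the thesis's specific statistics). The function name and the defaults lam = 0.2 and L = 3 are illustrative.

```python
import math

def ewma_first_alarm(stats, lam=0.2, L=3.0):
    """Return the 0-based index of the first upper EWMA signal, or None (hypothetical helper).

    Assumes `stats` are standardized (in-control mean 0, variance 1), so z starts at 0.
    """
    z = 0.0
    for t, x in enumerate(stats, start=1):
        z = lam * x + (1.0 - lam) * z
        # Exact (time-varying) EWMA standard deviation for independent in-control data.
        limit = L * math.sqrt(lam / (2.0 - lam) * (1.0 - (1.0 - lam) ** (2 * t)))
        if z > limit:
            return t - 1
    return None

in_control = [0.0] * 50
shifted = [0.0] * 10 + [3.0] * 10          # a 3-sigma shift starting at index 10
print(ewma_first_alarm(in_control))        # -> None
print(ewma_first_alarm(shifted))           # -> 11
```

Run lengths such as the ARL would be obtained by repeating this on many simulated series and averaging the alarm times; ignoring serial correlation in the model that produces the standardized statistic is what inflates false alarms in the thesis's setting.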
8

CROPPER, JOHN PHILIP. "TREE-RING RESPONSE FUNCTIONS. AN EVALUATION BY MEANS OF SIMULATIONS (DENDROCHRONOLOGY RIDGE REGRESSION, MULTICOLLINEARITY)." Diss., The University of Arizona, 1985. http://hdl.handle.net/10150/187946.

Abstract:
This study examines the problem of determining the response of tree-ring width growth to monthly climate. The objective is to document which of the available regression methods are best suited to deciphering the complex link between tree growth variation and climate. Tree-ring response function analysis is used to determine which instrumental climatic variables are best associated with tree-ring width variability. Ideally such a determination would be made, or verified, through detailed physiological monitoring of trees in their natural environment; a statistical approach is required because such biological studies on mature trees are currently too time-consuming to perform. The use of lagged climatic data to reproduce a biological, rather than a calendar, year has increased the degree of intercorrelation (multicollinearity) among the independent climate variables. The presence of multicollinearity can greatly affect the sign and magnitude of estimated regression coefficients. Using series of known response, the effectiveness of five different regression methods was objectively assessed. The results from each of the 2000 regressions were compared to the known regression weights, and a measure of relative efficiency was computed. The results indicate that ridge regression is, on average, more than four times as efficient (average relative efficiency of 4.57) as unbiased multiple linear regression at producing good coefficient estimates. Principal components regression offers only a slight improvement over multiple linear regression, with an average relative efficiency of 1.45.
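Principal components regression, one of the methods compared here, and a relative-efficiency ratio computed against known coefficients can be sketched as follows. The definition of relative efficiency used (squared coefficient error of OLS divided by that of the alternative) is an assumption, and the dissertation's exact measure may differ; the predictors, the "known" coefficients, and the choice of two components are hypothetical.

```python
import numpy as np

def ols(X, y):
    """Ordinary least squares coefficients (centered data, no intercept)."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def pcr(X, y, n_components):
    """Principal components regression keeping the n_components largest components."""
    _, eigvecs = np.linalg.eigh(X.T @ X)              # eigenvalues come back in ascending order
    V = eigvecs[:, ::-1][:, :n_components]            # eigenvectors of the largest eigenvalues
    gamma = np.linalg.lstsq(X @ V, y, rcond=None)[0]  # regress y on the component scores
    return V @ gamma                                  # map back to the original predictors

def relative_efficiency(beta_true, beta_ref, beta_alt):
    """Squared coefficient error of the reference fit divided by that of the alternative."""
    return np.sum((beta_ref - beta_true) ** 2) / np.sum((beta_alt - beta_true) ** 2)

rng = np.random.default_rng(1)
n, p = 60, 4
X = rng.normal(size=(n, 1)) + 0.1 * rng.normal(size=(n, p))  # four intercorrelated predictors
X -= X.mean(axis=0)
beta_true = np.array([0.5, 0.3, -0.2, 0.4])                 # plays the role of the known response
y = X @ beta_true + rng.normal(size=n)
y -= y.mean()

b_ols = ols(X, y)
b_pcr = pcr(X, y, n_components=2)
print("relative efficiency of PCR vs OLS:", relative_efficiency(beta_true, b_ols, b_pcr))
```

Keeping all components reproduces OLS exactly; dropping the small-eigenvalue components is where PCR trades bias for lower variance. A full study would average this ratio over many replicates, as the 2000 regressions above do.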
9

Kuroki, Quispe André Francisco, and Taza Gianella Milagros Soto. "Factores que determinan el comportamiento del volumen de exportación de café peruano con partida 090111 según los años 1980 - 2017" [Factors determining the export volume of Peruvian coffee under tariff heading 090111, 1980-2017]. Bachelor's thesis, Universidad Peruana de Ciencias Aplicadas (UPC), 2019. http://hdl.handle.net/10757/628233.

Abstract:
This thesis focuses on the factors that explain the export volume of coffee in the period from 1980 to 2017, based on cultivated area, average price, and coffee yield. The purpose of this research is to develop a statistical model that allows producers in the coffee sector to forecast their export volumes. The methodology is quantitative, with a conclusive non-experimental design and a descriptive-correlational scope. The results showed that average price is not a significant variable affecting export volume, whereas cultivated area and yield are the main factors that producers must attend to in order to increase their volume. Coffee yield is a very sensitive variable, and managing it well leads to a significant increase in the producer's volume.
Thesis
10

Gripencrantz, Sarah. "Evaluating the Use of Ridge Regression and Principal Components in Propensity Score Estimators under Multicollinearity." Thesis, Uppsala universitet, Statistiska institutionen, 2014. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-226924.

Abstract:
Multicollinearity can be present in the propensity score model when estimating average treatment effects (ATEs). In this thesis, logistic ridge regression (LRR) and principal components logistic regression (PCLR) are evaluated as alternatives to ML estimation of the propensity score model. ATE estimators based on inverse probability weighting (IPW), matching, and stratification are assessed in a Monte Carlo simulation study to evaluate LRR and PCLR. Further, an empirical example of using LRR and PCLR on real data under multicollinearity is provided. Results from the simulation study reveal that, under multicollinearity and in small samples, LRR reduces bias in the matching estimator compared to ML. In large samples, PCLR yields the lowest bias and was typically found to have the lowest MSE for all estimators. PCLR matched ML in bias under IPW estimation and in some cases had lower bias. The stratification estimator was heavily biased compared to matching and IPW, but both its bias and MSE improved when PCLR was applied, and in some cases under LRR. In the empirical example, the specification with PCLR was usually the most sensitive when a strongly correlated covariate was included in the propensity score model.
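Whatever model produces the propensity scores (ML, LRR or PCLR), the IPW step that turns them into an ATE estimate is the same. A minimal sketch of that step only, showing both the Horvitz-Thompson form and the normalized (Hajek) form; which form the thesis uses is not stated here, and the function name and the four example units are hypothetical.

```python
import numpy as np

def ipw_ate(treated, outcome, pscore, normalize=True):
    """Inverse probability weighting estimate of the average treatment effect."""
    t = np.asarray(treated, dtype=float)
    yv = np.asarray(outcome, dtype=float)
    e = np.asarray(pscore, dtype=float)
    w1 = t / e                    # weights for treated units
    w0 = (1.0 - t) / (1.0 - e)    # weights for control units
    if normalize:                 # Hajek form: weights rescaled to sum to one within each group
        return np.sum(w1 * yv) / np.sum(w1) - np.sum(w0 * yv) / np.sum(w0)
    return np.mean(w1 * yv) - np.mean(w0 * yv)   # Horvitz-Thompson form

# Hypothetical units with propensity scores from some fitted model.
treated = [1, 0, 1, 0]
outcome = [2.0, 0.0, 4.0, 0.0]
pscore = [0.8, 0.2, 0.4, 0.6]
print(ipw_ate(treated, outcome, pscore, normalize=False))  # -> 3.125
print(ipw_ate(treated, outcome, pscore))                   # -> 10/3, about 3.333
```

Propensity scores near 0 or 1 produce extreme weights, which is one route by which an unstable (for example multicollinear) propensity model feeds into the variance and bias of the ATE estimate.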
11

Lee, Wonwoo. "Fractional principal components regression: a general approach to biased estimators." Diss., Virginia Polytechnic Institute and State University, 1986. http://hdl.handle.net/10919/49819.

12

Gatz, Philip L. Jr. "A comparison of three prediction based methods of choosing the ridge regression parameter k." Thesis, Virginia Tech, 1985. http://hdl.handle.net/10919/45724.

Abstract:
A solution to the regression model y = Xβ + ε is usually obtained by ordinary least squares. However, when multicollinearity exists among the regressor variables, many properties of this solution deteriorate, including its variance, its length, its stability, and its predictive ability. Ridge regression (Hoerl and Kennard, 1970a) was introduced to combat this deterioration. The method uses a solution biased by a parameter k, and many methods have been developed to determine an optimal value of k. This study investigated three little-used methods of determining k: the PRESS statistic, Mallows' Ck statistic, and DF-trace. Using a Monte Carlo experiment, it compared the prediction capabilities of the three methods on data containing various levels of both collinearity and leverage.
Master of Science
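For a fixed k, the ridge estimator is b(k) = (X'X + kI)^-1 X'y with fitted values H_k y, where H_k = X(X'X + kI)^-1 X'. Because ridge is a linear smoother, PRESS can be computed without refitting through the leave-one-out identity e_(i) = e_i / (1 - h_ii), and tr(H_k) gives the effective degrees of freedom on which DF-trace is based. The sketch assumes standardized, centered data with no intercept; it does not implement Mallows' Ck or the DF-trace selection rule itself, and the function name, data, and grid of k values are hypothetical.

```python
import numpy as np

def ridge_press_df(X, y, ks):
    """For each k return (k, PRESS, tr(H_k)) for ridge regression without intercept."""
    p = X.shape[1]
    rows = []
    for k in ks:
        H = X @ np.linalg.solve(X.T @ X + k * np.eye(p), X.T)  # ridge hat matrix H_k
        resid = y - H @ y
        h = np.diag(H)
        press = np.sum((resid / (1.0 - h)) ** 2)               # leave-one-out residuals, no refitting
        rows.append((k, press, np.trace(H)))                   # tr(H_k) = effective degrees of freedom
    return rows

rng = np.random.default_rng(2)
n = 40
X = rng.normal(size=(n, 1)) + 0.1 * rng.normal(size=(n, 3))   # three collinear regressors
X = (X - X.mean(axis=0)) / X.std(axis=0)                      # standardize columns
y = X @ np.array([1.0, 0.5, -0.5]) + rng.normal(size=n)
y = y - y.mean()

ks = [0.0, 0.01, 0.1, 1.0, 10.0]
summary = ridge_press_df(X, y, ks)
for k, press, df in summary:
    print(f"k={k:<5} PRESS={press:8.3f} df={df:5.3f}")
best_k = min(summary, key=lambda row: row[1])[0]
print("k minimizing PRESS:", best_k)
```

At k = 0 the effective degrees of freedom equal the number of regressors, and they fall monotonically as k grows, which is why tr(H_k) is a convenient scale on which to compare candidate values of k.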
13

Pingel, Ronnie. "Some Aspects of Propensity Score-based Estimators for Causal Inference." Doctoral thesis, Uppsala universitet, Statistiska institutionen, 2014. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-229341.

Abstract:
This thesis consists of four papers related to commonly used propensity score-based estimators for average causal effects. The first paper starts from the observation that researchers often have access to data containing many correlated covariates. We therefore study the effect of correlation on the asymptotic variance of an inverse probability weighting estimator and a matching estimator. Under the assumptions of normally distributed covariates, a constant causal effect, and potential outcomes and a logit that are linear in the parameters, we show that correlation influences the asymptotic efficiency of the estimators differently, both in direction and in magnitude. Further, the strength of the confounding towards the outcome and the treatment plays an important role. The second paper extends the first by studying the estimators in the more realistic setting of an estimated propensity score. We also relax several assumptions made in the first paper and include the doubly robust estimator. Again, the results show that correlation may increase or decrease the variances of the estimators, but we also observe that several aspects influence how correlation affects the variance, such as the choice of estimator, the strength of the confounding towards the outcome and the treatment, and whether the causal effect is constant or non-constant. The third paper concerns estimation of the asymptotic variance of a propensity score matching estimator. Simulations show that large gains in mean squared error can be made by properly selecting the smoothing parameters of the variance estimator, and that a residual-based local linear estimator may be a more efficient estimator of the asymptotic variance. The specification of the variance estimator is shown to be crucial when evaluating the effect of right heart catheterisation: we find either a negative effect on survival or no significant effect, depending on the choice of smoothing parameters. In the fourth paper, we provide an analytic expression for the covariance matrix of logistic regression with normally distributed regressors. This paper is related to the others in that logistic regression is commonly used to estimate the propensity score.
14

Hansen, John A. "A comparison of parametric and nonparametric techniques used to estimate school district production functions analysis of model response to change in sample size and multicollinearity." [Bloomington, Ind.] : Indiana University, 2008. http://gateway.proquest.com/openurl?url_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation&res_dat=xri:pqdiss&rft_dat=xri:pqdiss:3324516.

Abstract:
Thesis (Ph.D.)--Indiana University, School of Education, 2008.
Title from PDF t.p. (viewed on May 12, 2009). Source: Dissertation Abstracts International, Volume: 69-08, Section: A, page: 3030. Adviser: Daniel Mueller.
15

Williams, Ulyana P. "On Some Ridge Regression Estimators for Logistic Regression Models." FIU Digital Commons, 2018. https://digitalcommons.fiu.edu/etd/3667.

Abstract:
The purpose of this research is to investigate the performance of some ridge regression estimators for the logistic regression model in the presence of moderate to high correlation among the explanatory variables. As performance criteria, we use the mean square error (MSE), the mean absolute percentage error (MAPE), the magnitude of bias, and the percentage of times the ridge regression estimator produces a higher MSE than the maximum likelihood estimator. A Monte Carlo simulation study was carried out to compare the performance of the ridge regression estimators under different experimental conditions. The degree of correlation, sample size, number of independent variables, and log odds ratio were varied in the experimental design. Simulation results show that, under certain conditions, the ridge regression estimators outperform the maximum likelihood estimator. Moreover, an empirical data analysis supports the main findings of this study. This thesis proposes and recommends some good ridge regression estimators of the logistic regression model for practitioners in the health, physical, and social sciences.
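One common form of ridge logistic regression maximizes the log-likelihood minus (k/2)||beta||^2, which Newton-Raphson solves directly. This is a sketch of that penalized-likelihood form only, not necessarily the ridge estimators studied in the thesis (which differ mainly in how k is estimated). The function name, simulated data, and k values are hypothetical, and for brevity the intercept is penalized too, although in practice it is usually left unpenalized.

```python
import numpy as np

def logistic_ridge(X, y, k, n_iter=100, tol=1e-10):
    """Maximize log-likelihood - (k/2)*||beta||^2 by Newton-Raphson (hypothetical helper)."""
    p_dim = X.shape[1]
    beta = np.zeros(p_dim)
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-(X @ beta)))
        grad = X.T @ (y - prob) - k * beta                             # penalized score
        neg_hess = (X.T * (prob * (1.0 - prob))) @ X + k * np.eye(p_dim)  # X'WX + kI
        step = np.linalg.solve(neg_hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Hypothetical data: two strongly correlated explanatory variables plus an intercept.
rng = np.random.default_rng(3)
n = 300
common = rng.normal(size=n)
X = np.column_stack([np.ones(n),
                     common + 0.1 * rng.normal(size=n),
                     common + 0.1 * rng.normal(size=n)])
eta = X @ np.array([-0.5, 1.0, 1.0])
y = (rng.uniform(size=n) < 1.0 / (1.0 + np.exp(-eta))).astype(float)

for k in (0.01, 1.0, 10.0):
    print(k, logistic_ridge(X, y, k))
```

Because the penalty is strictly convex, the length of the estimate cannot increase as k grows; this shrinkage is what trades a little bias for lower variance when the regressors are highly correlated.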
16

Ndiritu, Gachiri Charles. "An Application of Multiple Regression in Exchange Rate Arrangements." Thesis, University of the Western Cape, 2008. http://etd.uwc.ac.za/index.php?module=etd&action=viewtitle&id=gen8Srv25Nme4_1863_1263418792.

Abstract:
This project, "An application of multiple regression in exchange rate arrangement", focused on the processes followed by different countries when choosing an exchange rate regime for currency stabilization. It analyses the consequences faced by emerging markets as a result of changes in the volatility of developed countries' currencies (American Dollar, Japanese Yen, Euro, British Pound, and Canadian Dollar).

17

Atems, Bebonchu. "Essays in nonlinear macroeconomic modeling and econometrics." Diss., Kansas State University, 2011. http://hdl.handle.net/2097/11985.

Abstract:
Doctor of Philosophy
Department of Economics
Lance J. Bachmeier
This dissertation consists of three essays in nonlinear macroeconomic modeling and econometrics. In the first essay, we decompose oil price movements into oil demand (stock market) shocks and oil supply (oil market) shocks, and examine the response of the stock market to these shocks. We find that when oil prices are "net-increasing", a stock market shock that causes the S&P 500 to rise by one percentage point will cause the price of oil to rise by approximately 0.2 percentage points, with a statistically significant positive effect one day after the stock market shock. On the other hand, the response of the stock market to an oil market shock is a decline of 6.8 percent when the price of oil doubles. For other days, the initial response of the oil market to a stock market shock is the same as in the net oil price increase case (by construction). We then analyze the response of monetary policy to the identified stock market and oil market shocks and find that short-term interest rates respond to stock market shocks but not to oil market shocks. Finally, we evaluate the predictive power of the decomposed stock market and oil shocks relative to the change in the price of oil. We find statistically significant gains in both in-sample fit and out-of-sample forecast accuracy when using the identified stock market and oil market shocks rather than the change in the price of oil. The second essay revisits the statistical specification of near-multicollinearity in the logistic regression model using the Probabilistic Reduction approach. We argue that the ceteris paribus clause invoked with near-multicollinearity is rather misleading. This assumption states that one can assess the impact of near-multicollinearity by holding the parameters of the logistic regression model constant while examining the impact on their standard errors and t-ratios as the correlation (ρ) between the regressors increases.
Using the Probabilistic Reduction approach, we derive the parameters (and related statistics) of the logistic regression model and show that they are functions of ρ, indicating that the ceteris paribus clause in the traditional account of near-multicollinearity is unattainable. Monte Carlo simulations in the paper confirm these findings. We also show that traditional near-multicollinearity diagnostics, such as the variance inflation factor and the condition number, can fail to detect near-multicollinearity. Overall, the paper finds that near-multicollinearity in the logistic model is highly variable and may not lead to the problems indicated by the traditional account. Therefore, unexpected, unreliable, or unstable estimates and inferences should not be blamed on near-multicollinearity; rather, the modeler should return to economic theory or to statistical respecification of the model to address these problems. The third essay examines the correlations between income inequality and economic growth using a panel of income distribution data for 3,109 U.S. counties. We examine the non-spatial dynamic correlations between county inequality and growth using a System GMM approach, and find significant negative relationships between changes in inequality in one period and growth in the subsequent period. We show that this finding is robust across different sample sizes. We further argue that because the space-specific, time-invariant variables that affect economic growth and inequality can differ significantly across counties, failure to incorporate spatial effects into a model of growth and inequality may lead to biased results. We assume that dependence among counties arises only from the disturbance process, and hence estimate a spatial error model. Our results indicate that the bias in the parameter for inequality amounts to about 2.66 percent, while that for initial income amounts to about 21.51 percent.
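For reference, the two traditional diagnostics named in the second essay can both be computed from the predictors' correlation matrix: the variance inflation factors are the diagonal entries of its inverse, VIF_j = 1/(1 - R_j^2), and one common form of the condition number is the square root of the ratio of its largest to smallest eigenvalue. Other conventions (for example, scaling the full design matrix including the intercept) give different condition numbers. This sketch only shows the computation and does not address the essay's argument that these diagnostics can mislead in logistic models; the function names and data are hypothetical.

```python
import numpy as np

def vif(X):
    """Variance inflation factors: diagonal of the inverse predictor correlation matrix."""
    R = np.corrcoef(X, rowvar=False)
    return np.diag(np.linalg.inv(R))

def condition_number(X):
    """sqrt(largest / smallest eigenvalue) of the predictor correlation matrix."""
    eig = np.linalg.eigvalsh(np.corrcoef(X, rowvar=False))
    return np.sqrt(eig.max() / eig.min())

rng = np.random.default_rng(4)
n = 500
x1 = rng.normal(size=n)
x2 = 0.95 * x1 + np.sqrt(1 - 0.95 ** 2) * rng.normal(size=n)  # population correlation 0.95
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])
print("VIF:", vif(X))
print("condition number:", condition_number(X))
```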
18

Nakamura, Karina Gernhardt. "Multicolinearidade em modelos de regressão logística" [Multicollinearity in logistic regression models]. Universidade de São Paulo, 2013. http://www.teses.usp.br/teses/disponiveis/45/45133/tde-28052013-222241/.

Abstract:
This work studies the effects of multicollinearity in logistic regression models and investigates whether biased estimators can minimize those effects. First, the logistic regression model and its fitting process (yielding the maximum likelihood estimator) are presented, together with some tests to evaluate the significance of the parameters and techniques to assess goodness of fit. Next, the effects of multicollinearity on parameter estimation and inference are discussed, as well as techniques to diagnose its presence. To mitigate this problem, two alternatives to the maximum likelihood estimator are presented: the ridge estimator and the principal component estimator. The performance of the three estimators is then compared in a simulation study and in an application to a real data set. The main conclusion is that, in the presence of multicollinearity, the alternative estimators achieved a better fit than the maximum likelihood estimator, besides reducing the effects of multicollinearity.
19

Evani, Bhanu M. "WEIGHTED QUANTILE SUM REGRESSION FOR ANALYZING CORRELATED PREDICTORS ACTING THROUGH A MEDIATION PATHWAY ON A BIOLOGICAL OUTCOME." VCU Scholars Compass, 2017. http://scholarscompass.vcu.edu/etd/4760.

Abstract:
A thesis submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy at Virginia Commonwealth University, 2017. Major Director: Robert A. Perera, Assistant Professor, Department of Biostatistics. This work examines the mediated effects of a set of correlated predictors using the recently developed Weighted Quantile Sum (WQS) regression method. Traditionally, mediation analysis has been conducted using the multiple regression method first proposed by Baron and Kenny (1986), which has since been advanced by several authors, such as MacKinnon (2008). Mediation analysis of a highly correlated predictor set is challenging because of multicollinearity. WQS regression can be used as an alternative method to analyze mediated effects when predictor correlations are high. As part of the WQS method, a weighted quantile sum index (WQS index) is computed to represent the predictor set as a single entity. The predictor variables in classic mediation are then replaced with the WQS index, allowing estimation of the total indirect effect between all the predictors and the outcome. Predictors with high relative importance in their association with the outcome can be identified by examining the empirical weights that the WQS regression method estimates for the individual predictors. Other constrained optimization methods (e.g. LASSO) focus on reducing the dimensionality of the correlated predictors to reduce multicollinearity. WQS regression in the context of mediation is studied using Monte Carlo simulation for mediation models with two and three correlated predictors, and its performance is compared to classic OLS multiple regression and regularized LASSO regression.
An application of these three methods to the National Health and Nutrition Examination Survey (NHANES) dataset examines the effect of serum concentrations of polychlorinated biphenyls (independent variables) on the liver enzyme alanine aminotransferase, ALT (outcome), with chromosomal telomere length as a potential mediator. Keywords: Multicollinearity, Weighted Quantile Sum Regression, Mediation Analysis
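The WQS index that replaces the correlated predictors can be sketched as follows: each predictor is scored by its sample quartile (0 to 3), and the scores are combined with non-negative weights that sum to one. In WQS regression the weights are estimated from the data (typically by bootstrap with constrained optimization); here they are supplied by hand, and the function names, exposures, and weights are hypothetical.

```python
import numpy as np

def quantile_scores(x, n_quantiles=4):
    """Score each value 0..n_quantiles-1 by the sample-quantile interval it falls in."""
    cuts = np.quantile(x, np.linspace(0.0, 1.0, n_quantiles + 1)[1:-1])  # interior cut points
    return np.searchsorted(cuts, x, side="right")

def wqs_index(X, weights, n_quantiles=4):
    """Weighted sum of quantile scores; weights must be non-negative and sum to one."""
    w = np.asarray(weights, dtype=float)
    if np.any(w < 0) or not np.isclose(w.sum(), 1.0):
        raise ValueError("WQS weights must be non-negative and sum to one")
    Q = np.column_stack([quantile_scores(X[:, j], n_quantiles) for j in range(X.shape[1])])
    return Q @ w

rng = np.random.default_rng(5)
exposures = rng.lognormal(size=(100, 3))        # hypothetical exposure concentrations
index = wqs_index(exposures, weights=[0.6, 0.3, 0.1])
print(index[:5])
```

In a mediation model the single index would then stand in for the whole predictor set in the mediator and outcome regressions.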
20

Carbonera, Roberto. "Atributos físicos e fisiológicos de sementes de aveia preta" [Physical and physiological attributes of black oat seeds]. Universidade Federal de Santa Maria, 2016. http://repositorio.ufsm.br/handle/1/3273.

Abstract:
Forage crops practice an important role in animal production in southern Brazil. Among the species, oat stands out due to its higher crop area in winter, occupying an area of 3.8 million hectares in the state of Rio Grande do Sul. For their proper planting and establishment, seeds are produced which must contain high standards of quality, which is measured by analytical laboratories. Given this, the present study aimed to evaluate the physical and physiological attributes of oat seeds, associating quality seeds to the production profile and the possible effects caused by meteorological factors. Was also designed to identify the variables that correlate with the percentage of pure seed and seedling emergence, identify the presence of multicollinearity, the most important variables in relation to the main dependent variable, percentage of normal seedlings, and group the samples for their degrees of similarity. 2,910 samples were assessed, 2,229 seed analysis derived from the seed production process, 357 analyzes for own use of seeds and 324 analyzes tetrazolium analyzed by the laboratory of the UNIJUÍ Agronomy Course seeds, following the methodology described in the rules seed analysis. The results were submitted to analysis of descriptive statistics, the dispersion of the data, the Pearson linear correlation coefficients were estimated, the diagnosis of multicollinearity, the direct and indirect effects through path analysis and grouping between samples. The seeds produced according to the national seed and seedling system showed excellent levels of physical and physiological quality in the years 2006 to 2010. Between 2011- 2014, 14 and 14.5% of the seeds have been compromised by the presence of other seed species cultivated and tolerated harmful, respectively. 
Seeds saved for own use showed wide variability, with 18.1% and 31.7% of samples below the germination standard in 2006 to 2010 and 2011 to 2014, respectively, while the samples analyzed by the tetrazolium test showed rejection rates of 19.4% and 12.5%, respectively. Notably, the physiological quality of the seeds is related to years with rainfall and temperatures suitable for vegetative development, physiological maturity, and harvest. The normal seedlings variable showed the highest correlation, negative in sign, with dead seeds. The variables abnormal seedlings and dead seeds had the largest direct effects, negative in sign, on germination percentage, and cluster analysis revealed three similarity groups among seeds produced under the national seed and seedling system and four groups among seeds saved for own use.
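Path analysis decomposes each correlation between an explanatory variable and the main variable into a direct effect (the path coefficient) and indirect effects through the other explanatory variables. The Python sketch below assumes the correlation matrices have already been estimated; the function `path_analysis` and the example correlations are hypothetical illustrations, not the thesis's code or data:

```python
import numpy as np


def path_analysis(rxx, rxy):
    """Decompose correlations with a main variable into direct and indirect effects.

    rxx: (p, p) correlation matrix among the explanatory variables.
    rxy: (p,) correlations of each explanatory variable with the main variable.
    """
    rxx = np.asarray(rxx, dtype=float)
    rxy = np.asarray(rxy, dtype=float)
    # Direct effects (path coefficients) solve the normal equations rxx @ p = rxy.
    direct = np.linalg.solve(rxx, rxy)
    # Indirect effect of variable i via variable j is r_ij * p_j, for j != i.
    indirect = rxx * direct[np.newaxis, :]
    np.fill_diagonal(indirect, 0.0)
    # Coefficient of determination and residual effect of the path model.
    r2 = float(direct @ rxy)
    residual = float(np.sqrt(max(0.0, 1.0 - r2)))
    return direct, indirect, r2, residual


# Hypothetical correlations for three explanatory variables (illustration only).
rxx = [[1.0, 0.6, 0.3],
       [0.6, 1.0, 0.2],
       [0.3, 0.2, 1.0]]
rxy = [0.7, 0.5, 0.4]
direct, indirect, r2, residual = path_analysis(rxx, rxy)
# Each correlation equals its direct effect plus the sum of its indirect effects.
print(np.allclose(direct + indirect.sum(axis=1), rxy))  # True
```

The residual effect, sqrt(1 - R²), is included because path-analysis reports often show it next to the direct effects.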
21

Casarotto, Gabriele. "Relações lineares entre caracteres fenológicos, morfológicos e produtivos em milho" [Linear relationships among phenological, morphological and yield traits in maize]. Universidade Federal de Santa Maria, 2013. http://repositorio.ufsm.br/handle/1/5084.

Abstract:
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior
This study aimed to verify the existence of linear relationships among phenological, morphological and yield traits of maize (Zea mays L.) cultivars of early and very early cycle and of the transgenic class, and to identify which traits have high correlation with, and direct effects on, grain yield. Six experiments were conducted with early, very early and transgenic maize cultivars in the 2009-2010 and 2010-2011 growing seasons, in the experimental area of the Department of Plant Science of the Federal University of Santa Maria. In the 2009-2010 season, 36 early, 22 very early and 18 transgenic cultivars were evaluated; in the 2010-2011 season, 23 early, 9 very early and 27 transgenic cultivars. In all six experiments the design was randomized blocks with three replications. Each experimental unit consisted of two 5 m rows spaced 0.80 m apart, and the seeding rate was adjusted to 62,500 plants ha⁻¹. In each experimental unit, three plants were randomly tagged and 15 traits were evaluated on each; the mean of these three plants was taken as the value of the replication. The traits evaluated were phenological (total number of leaves per plant (NFO); phyllochron estimated with the number of expanded leaves (FNFE) and with the total number of leaves (FNTF), in °C day leaf⁻¹; number of days from sowing to male flowering (FM); and number of days from sowing to female flowering (FF)), morphological (plant height (AP) and ear insertion height (AE), in cm) and yield-related (ear weight (PE), in g; number of kernel rows per ear (NFI); ear length (CE), in cm; ear diameter (DE), in mm; cob weight (PS), in g; cob diameter (DS), in mm; hundred-kernel weight (MCG), in g; and grain yield (PRO), in g ear⁻¹). An analysis of variance was performed for each experiment, and cultivar means were compared by the Scott-Knott test at 5% probability.
For each experiment, Pearson linear correlation coefficients among the 15 traits were estimated. For the path analysis, PRO was taken as the main trait and the others as explanatory traits. Multicollinearity was diagnosed in the correlation matrix of the explanatory traits, and the traits causing a high degree of multicollinearity were eliminated. Direct and indirect effects on PRO were estimated by path analysis, and the traits that influence PRO, together with their contribution to predicting it, were identified by stepwise regression analysis. There are linear relationships among the phenological, morphological and yield traits of maize plants. The traits PE and DE showed very strong (r ≥ 0.97) and moderate to strong (0.55 ≤ r ≤ 0.78) Pearson correlation coefficients with PRO, respectively. In general, DE shows high correlation and positive direct effects (0.6686 ≤ direct effect ≤ 1.1818) on PRO. Together with DE, CE makes a high positive contribution to predicting PRO. Therefore, these traits can be used for indirect selection in maize breeding programs.
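The abstract says the traits causing a high degree of multicollinearity were eliminated but does not give the rule. One common approach, assumed here, is to drop the variable with the largest variance inflation factor (VIF) until the condition number of the correlation matrix falls below a chosen limit; 100 is a frequently quoted boundary for weak multicollinearity. The function names and correlations below are hypothetical:

```python
import numpy as np


def vif(r):
    """Variance inflation factors: the diagonal of the inverse correlation matrix."""
    return np.diag(np.linalg.inv(r))


def condition_number(r):
    """Ratio of the largest to the smallest eigenvalue of a correlation matrix."""
    eig = np.linalg.eigvalsh(r)
    return float(eig.max() / eig.min())


def eliminate_collinear(r, names, cn_limit=100.0):
    """Drop, one at a time, the variable with the largest VIF until the
    condition number of the remaining correlation matrix is below cn_limit."""
    r = np.asarray(r, dtype=float)
    keep = list(range(len(names)))
    while len(keep) > 1:
        sub = r[np.ix_(keep, keep)]
        if condition_number(sub) < cn_limit:
            break
        # The argmax position within the submatrix is also the position in `keep`.
        keep.pop(int(np.argmax(vif(sub))))
    return [names[i] for i in keep]


# Hypothetical correlations: x1 and x2 are almost collinear (r = 0.99).
r = [[1.00, 0.99, 0.30, 0.20],
     [0.99, 1.00, 0.30, 0.20],
     [0.30, 0.30, 1.00, 0.10],
     [0.20, 0.20, 0.10, 1.00]]
print(eliminate_collinear(r, ["x1", "x2", "x3", "x4"]))
```

Here x1 and x2 have essentially tied VIFs, so which one is dropped depends on round-off; an analyst may prefer to break such ties by hand, keeping the trait that is easier to measure.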
22

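Лисенко, О. В. "Моделювання причинно-наслідкових зв’язків між тіньовою економікою та соціально-економічними процесами" [Modeling cause-and-effect relationships between the shadow economy and socio-economic processes]. Master's thesis, Сумський державний університет [Sumy State University], 2021. https://essuir.sumdu.edu.ua/handle/123456789/86994.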

Abstract:
The paper examines the existence of causal relationships between the shadow economy and socio-economic processes. The main purpose of this work is to build economic and mathematical models of the impact of economic indicators, and of social indicators, on the level of the shadow economy. The key research methods are principal component analysis and multiple correlation and regression analysis, implemented in Statistica software.
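The abstract names principal component analysis and multiple regression (run in Statistica). One way to combine them, assumed here rather than taken from the thesis, is principal component regression: regress the response on the leading components of the standardized predictors, which are uncorrelated by construction, and map the coefficients back to the original units. A NumPy sketch with a hypothetical function name and synthetic data:

```python
import numpy as np


def pcr_fit(X, y, n_components):
    """Principal component regression on standardized predictors.

    Returns (intercept, beta) on the original scale of X.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    mu, sd = X.mean(axis=0), X.std(axis=0, ddof=1)
    Z = (X - mu) / sd
    # Eigen-decomposition of the correlation matrix, largest eigenvalues first.
    eigval, eigvec = np.linalg.eigh(np.corrcoef(Z, rowvar=False))
    order = np.argsort(eigval)[::-1][:n_components]
    V = eigvec[:, order]
    T = Z @ V  # principal component scores (mutually uncorrelated)
    gamma, *_ = np.linalg.lstsq(T, y - y.mean(), rcond=None)
    beta_z = V @ gamma          # coefficients for the standardized predictors
    beta = beta_z / sd          # back to the original units of X
    intercept = y.mean() - mu @ beta
    return intercept, beta


# Hypothetical data: x2 is nearly a copy of x1.
rng = np.random.default_rng(0)
x1 = rng.normal(size=50)
x2 = x1 + 0.05 * rng.normal(size=50)
x3 = rng.normal(size=50)
X = np.column_stack([x1, x2, x3])
y = 1.0 + 2.0 * x1 + 0.5 * x3 + rng.normal(scale=0.3, size=50)
print(pcr_fit(X, y, n_components=2))
```

Keeping all components reproduces ordinary least squares; dropping the components with the smallest eigenvalues is what removes the unstable, collinear directions.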
23

Rey, Diana. "A Gasoline Demand Model for the United States Light Vehicle Fleet." Master's thesis, University of Central Florida, 2009. http://digital.library.ucf.edu/cdm/ref/collection/ETD/id/2351.

Abstract:
The United States is the world's largest oil consumer, accounting for about twenty-five percent of total world oil production. Whenever it becomes difficult to supply the increasing quantities of oil demanded by the market, the price of oil escalates, leading to what are known as oil price spikes or oil price shocks. The most recent oil price shock, the longest sustained oil price run-up in history, began in 2004 and ended in 2008. It initiated recognizable changes in transportation dynamics: transit operators found that commuters switched to transit to save on gasoline costs; consumers began to search the market for more efficient vehicles, leading car manufacturers to close 'gas guzzler' plants; and the government enacted the Energy Independence and Security Act of 2007, which called for the progressive improvement of the fuel efficiency of the light vehicle fleet up to 35 miles per gallon by 2020. Because the past trend of gasoline consumption will probably change, this thesis developed a gasoline consumption model to ascertain how some of these changes will affect future gasoline demand. Gasoline demand, expressed in million barrels of oil equivalent per day, was modeled with a two-step Ordinary Least Squares (OLS) explanatory-variable model. In the first step, vehicle miles traveled (in trillion vehicle miles) was regressed on the number of vehicles (in millions) and the price of oil (in dollars per barrel). In the second step, fuel consumption (in million barrels per day) was regressed on vehicle miles traveled and on the fuel efficiency indicator (in miles per gallon). The model was estimated in EViews, which allows checking for normality, heteroskedasticity, and serial correlation. Serial correlation was addressed by including autoregressive or moving average error correction terms.
Multicollinearity was addressed by first differencing. The 36-year sample (1970-2006) was divided into a 30-year calibration sub-period and a 6-year "hold-out" validation sub-period. The root mean square error (RMSE) criterion was adopted to select the "best model" among the candidates, although other criteria were also recorded. Three scenarios for the size of the light vehicle fleet over a forecasting period up to 2020 were created, equivalent to growth rates of 2.1%, 1.28%, and about 1% per year. The last scenario, the most optimistic from the gasoline consumption perspective, appeared consistent with the theory of vehicle saturation. For each fleet-size scenario, one scenario for the average miles-per-gallon indicator was created by distributing the fleet each year under an assumed 7 percent replacement rate. Three scenarios for the price of oil were also created: the first used the average price of oil in the sample since 1970, the second extended the price trend by exponential smoothing, and the third used a long-term forecast supplied by the Energy Information Administration. Together, these oil price scenarios ranged from a low of about 42 dollars per barrel to highs in the low 100s. The 1970-2006 gasoline consumption trend was extended to 2020 by ARIMA (Box-Jenkins) time series analysis, giving a gasoline consumption of about 10 million barrels per day in 2020. This trend line was taken as the reference, or baseline, of gasoline consumption, and the savings resulting from applying the explanatory-variable OLS model were measured against it.
Even in the most pessimistic scenario, the savings from the progressive improvement of the fuel efficiency indicator appear sufficient to offset the increase in consumption that would otherwise have occurred by extension of the trend, keeping consumption at 2006 levels, about 9 million barrels per day. The most optimistic scenario brings consumption down to about 2 million barrels per day below the 2006 level, or about 3 million barrels per day below the 2020 baseline. The "expected" or average consumption in 2020 is about 8 million barrels per day: 2 million barrels per day below the baseline, or 1 million below the 2006 consumption level. Further savings are possible if technologies such as plug-in hybrids, already implemented in other countries, are adopted soon, efficiently promoted, or supported by incentives or subsidies such as tax credits. In the future, these savings in gasoline consumption may help stabilize the price of oil as worldwide demand is tamed by oil-saving policy changes implemented in the United States.
M.S., Civil Engineering (Department of Civil and Environmental Engineering; Engineering and Computer Science)
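The abstract states that multicollinearity was handled by first differencing and that the best model was chosen by RMSE on a 6-year hold-out. The sketch below illustrates only those two steps with NumPy on synthetic series; the variable names, coefficients, and data are hypothetical, and the AR/MA error terms used in EViews are not reproduced:

```python
import numpy as np


def ols(X, y):
    """Ordinary least squares with an intercept; returns [intercept, slopes...]."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    err = np.asarray(actual, dtype=float) - np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean(err ** 2)))


# Hypothetical annual series (not the thesis data): both regressors trend upward.
rng = np.random.default_rng(42)
t = np.arange(37.0)
vehicles = 100 + 3.0 * t + rng.normal(0, 2, t.size)
price = 20 + 1.5 * t + rng.normal(0, 2, t.size)
vmt = 0.5 + 0.02 * vehicles - 0.01 * price + rng.normal(0, 0.05, t.size)

# Shared trends make the levels almost collinear; first differences remove them.
levels_corr = np.corrcoef(vehicles, price)[0, 1]
d_veh, d_price, d_vmt = np.diff(vehicles), np.diff(price), np.diff(vmt)
diff_corr = np.corrcoef(d_veh, d_price)[0, 1]

# 36 differenced observations: first 30 for calibration, last 6 held out.
X = np.column_stack([d_veh, d_price])
n_hold = 6
coef = ols(X[:-n_hold], d_vmt[:-n_hold])
pred = np.column_stack([np.ones(n_hold), X[-n_hold:]]) @ coef
holdout_rmse = rmse(d_vmt[-n_hold:], pred)
print(round(levels_corr, 3), round(diff_corr, 3), holdout_rmse)
```

Differencing changes the question being modeled: the coefficients now describe year-to-year changes rather than levels.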
24

Huschens, Stefan. "Einführung in die Ökonometrie" [Introduction to Econometrics]. Saechsische Landesbibliothek- Staats- und Universitaetsbibliothek Dresden, 2017. http://nbn-resolving.de/urn:nbn:de:bsz:14-qucosa-222629.

Abstract:
Chapters 1 to 6 in the first part of these lecture notes are based on the lecture Econometrics I (Ökonometrie I), last given in the winter semester 2001/02; Chapters 7 to 16 are based on the lecture Econometrics II (Ökonometrie II), last given in the summer semester 2006. Chapter 8 contains a condensed summary of the results from the Econometrics I part.
25

Nóbrega, Jarley Palmeira. "Um método de aprendizagem seqüencial com filtro de Kalman e Extreme Learning Machine para problemas de regressão e previsão de séries temporais" [A sequential learning method with Kalman filter and Extreme Learning Machine for regression and time series forecasting problems]. Universidade Federal de Pernambuco, 2015. https://repositorio.ufpe.br/handle/123456789/15951.

Abstract:
In machine learning applications, the input dataset is sometimes not fully available at the beginning of the training phase. A well-known solution for this class of problem is to perform the learning process by feeding the training instances sequentially. Among the most recent approaches to sequential learning are methods based on the Single Layer Feedforward Network (SLFN), notably the extensions of the Extreme Learning Machine (ELM) for sequential learning. The sequential version of the ELM algorithm, named Online Sequential Extreme Learning Machine (OS-ELM), uses a recursive least squares solution to update the output weights through a covariance matrix. However, OS-ELM and its extensions suffer from multicollinearity in the hidden-layer output matrix. This thesis introduces a new sequential learning method that handles the effects of multicollinearity. The proposed Kalman Learning Machine (KLM) sequentially updates the output weights of an OS-ELM-based network using the iterative Kalman filter procedure. To reduce the computational complexity of training, a new approach for estimating the filter parameters is also presented. Moreover, an extension of the method, named Extended Kalman Learning Machine (EKLM), is presented for problems where the dynamics of the model are nonlinear. The proposed method was evaluated against related state-of-the-art sequential learning methods based on the original OS-ELM. The experimental results show that the proposed method achieves a lower forecast error than most of its counterparts. Moreover, the KLM algorithm achieved the lowest average training time across all experiments, evidence that the proposed method can reduce the computational complexity of the sequential learning process.
A case study was performed by applying the proposed method to a financial time series forecasting problem. The results confirm that the KLM algorithm can simultaneously decrease the forecast error and the average training time compared with other sequential learning algorithms.
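The idea of updating ELM output weights with a Kalman filter can be sketched as a linear filter whose state is the output-weight vector. This is a generic textbook filter under stated assumptions (random sigmoid hidden layer, scalar measurement-noise variance r, optional random-walk noise q), not the thesis's filter-parameter estimation scheme or its extended EKLM variant; all names are hypothetical. With q = 0, the finite initial covariance p0 acts like a ridge penalty of r/p0, which is one way to see why such an update tolerates a collinear hidden-layer output matrix:

```python
import numpy as np


def hidden_layer(X, W, b):
    """Sigmoid outputs of a single-hidden-layer feedforward network (random W, b)."""
    return 1.0 / (1.0 + np.exp(-(X @ W + b)))


def kalman_output_weights(H, y, p0=10.0, r=1.0, q=0.0):
    """Estimate output weights beta in y_t = h_t . beta + v_t one sample at a time.

    beta follows a random walk with process noise q * I (q = 0: static weights);
    p0 scales the initial covariance and r is the measurement-noise variance.
    """
    L = H.shape[1]
    beta = np.zeros(L)
    P = p0 * np.eye(L)
    for h, target in zip(H, y):
        P = P + q * np.eye(L)                    # predict
        s = h @ P @ h + r                        # innovation variance (scalar)
        k = P @ h / s                            # Kalman gain
        beta = beta + k * (target - h @ beta)    # correct the weights
        P = P - np.outer(k, h @ P)               # (I - k h^T) P
        P = 0.5 * (P + P.T)                      # guard symmetry against round-off
    return beta


# Hypothetical regression task with 25 random hidden neurons.
rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(200, 1))
y = np.sin(3 * X[:, 0]) + 0.05 * rng.normal(size=200)
W = rng.uniform(-1, 1, size=(1, 25))
b = rng.uniform(-1, 1, size=25)
H = hidden_layer(X, W, b)
beta_kf = kalman_output_weights(H, y, p0=10.0, r=1.0)

# With q = 0 the filter reproduces the batch ridge solution (H'H + (r/p0) I)^-1 H'y.
beta_ridge = np.linalg.solve(H.T @ H + np.eye(25) / 10.0, H.T @ y)
print(np.allclose(beta_kf, beta_ridge, atol=1e-6))
```

Setting q > 0 lets the weights drift over time, which is useful for non-stationary series but breaks the exact equivalence with batch ridge regression.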
26

Zaldivar, Cynthia. "On the Performance of some Poisson Ridge Regression Estimators." FIU Digital Commons, 2018. https://digitalcommons.fiu.edu/etd/3669.

Abstract:
Multiple regression models play an important role in analyzing data and making predictions. Prediction accuracy decreases when two or more explanatory variables in the model are highly correlated. One solution is ridge regression. The purpose of this thesis is to study the performance of available ridge regression estimators for Poisson regression models in the presence of moderately to highly correlated variables. As performance criteria, we use the mean squared error (MSE), the mean absolute percentage error (MAPE), and the percentage of times the maximum likelihood (ML) estimator produces a higher MSE than the ridge regression estimator. A Monte Carlo simulation study was conducted to compare the estimators under three experimental conditions: correlation, sample size, and intercept. The simulation results show that all ridge estimators performed better than the ML estimator. Based on these results, we proposed new estimators, which performed very well compared with the original estimators. Finally, the estimators are illustrated using data on recreational habits.
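A common form of the Poisson ridge estimator shrinks the maximum likelihood estimate as (X'ŴX + kI)⁻¹X'ŴX β̂_ML, where Ŵ holds the fitted means; the estimators compared in such studies typically differ in how the ridge parameter k is chosen. The sketch below, assuming standardized predictors without an intercept and a user-supplied k, fits the ML estimate by iteratively reweighted least squares and then applies that shrinkage; function names and data are hypothetical:

```python
import numpy as np


def poisson_ml(X, y, max_iter=100, tol=1e-10):
    """Poisson regression (log link) by iteratively reweighted least squares."""
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        eta = X @ beta
        mu = np.exp(eta)
        z = eta + (y - mu) / mu            # working response
        XtW = X.T * mu                     # X' W with W = diag(mu)
        new = np.linalg.solve(XtW @ X, XtW @ z)
        converged = np.max(np.abs(new - beta)) < tol
        beta = new
        if converged:
            break
    return beta


def poisson_ridge(X, y, k):
    """Ridge-type shrinkage of the ML estimate: (X'WX + kI)^-1 X'WX beta_ML."""
    beta_ml = poisson_ml(X, y)
    mu = np.exp(X @ beta_ml)
    XtWX = (X.T * mu) @ X
    beta_ridge = np.linalg.solve(XtWX + k * np.eye(X.shape[1]), XtWX @ beta_ml)
    return beta_ml, beta_ridge


# Hypothetical data with two highly correlated standardized predictors.
rng = np.random.default_rng(7)
x1 = rng.normal(size=200)
x2 = x1 + 0.1 * rng.normal(size=200)
X = np.column_stack([x1, x2])
X = (X - X.mean(axis=0)) / X.std(axis=0)
y = rng.poisson(np.exp(0.3 * X[:, 0] + 0.3 * X[:, 1]))
beta_ml, beta_ridge = poisson_ridge(X, y, k=1.0)
print(beta_ml, beta_ridge)
```

With k = 0 the ridge estimate equals the ML estimate; any k > 0 shrinks its length, trading a little bias for lower variance when the predictors are collinear.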
27

Toebe, Marcos. "Não-normalidade multivariada e multicolinearidade em análise de trilha na cultura de milho" [Multivariate non-normality and multicollinearity in path analysis in maize]. Universidade Federal de Santa Maria, 2012. http://repositorio.ufsm.br/handle/1/5057.

Abstract:
Conselho Nacional de Desenvolvimento Científico e Tecnológico
Path analysis allows the evaluation of the direct and indirect effects of explanatory variables on a variable of interest by decomposing correlation coefficients. For the results of path analysis to be reliable, some assumptions must be met. Thus, the objectives of this study were to verify the interference of multivariate non-normality and multicollinearity in path analysis in maize and to compare alternative methods for estimating path coefficients. Data from 44 maize cultivar trials carried out in the state of Rio Grande do Sul between the 2002/03 and 2004/05 crop years were used. For each cultivar in each trial, seven explanatory variables (number of days to male flowering, plant height, ear insertion height, relative ear position, number of plants, number of ears, and prolificacy) and the main variable (grain yield) were measured. For each trial, descriptive statistics were calculated, and univariate and multivariate normality were diagnosed with the Shapiro-Wilk test and the Royston generalization of the multivariate Shapiro-Wilk test, respectively. Data from trials that did not follow a normal distribution were then transformed with the Box-Cox family of transformations. For both the original and the transformed data, the correlation coefficients among the seven explanatory variables (correlation matrix X'X) and between each explanatory variable and grain yield (correlation matrix X'Y) were calculated. Multicollinearity was then diagnosed in the X'X correlation matrix with four methods: variance inflation factor, tolerance, condition number, and matrix determinant. Finally, path analysis was performed from the system of normal equations X'Xβ = X'Y in three ways: traditional path analysis, path analysis under multicollinearity, and traditional path analysis with elimination of variables.
Transforming the data to obtain multivariate normality helps reduce the degree of multicollinearity and stabilizes the direct-effect estimates in path analysis with a high degree of multicollinearity. The adverse effects of a high degree of multicollinearity on the estimation of direct effects in path analysis are greater than those of multivariate non-normality. Traditional path analysis with elimination of variables is more appropriate than path analysis under multicollinearity.
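The four diagnostics and the three path-analysis variants named in the abstract can be sketched from correlation matrices alone. Here, path analysis under multicollinearity is assumed to follow the usual ridge-type approach of adding a small constant k to the diagonal of X'X before solving the normal equations; the value k = 0.1, the function names, and the correlations are hypothetical, and the abstract does not state the thesis's own choice of k:

```python
import numpy as np


def collinearity_diagnostics(rxx):
    """VIF, tolerance, condition number and determinant of a correlation matrix."""
    rxx = np.asarray(rxx, dtype=float)
    vif = np.diag(np.linalg.inv(rxx))       # variance inflation factors
    eig = np.linalg.eigvalsh(rxx)
    return {
        "vif": vif,
        "tolerance": 1.0 / vif,             # tolerance = 1 / VIF
        "condition_number": float(eig.max() / eig.min()),
        "determinant": float(np.linalg.det(rxx)),
    }


def path_coefficients(rxx, rxy, k=0.0):
    """Solve (rxx + k I) p = rxy; k = 0 is the traditional solution of X'X p = X'Y."""
    rxx = np.asarray(rxx, dtype=float)
    return np.linalg.solve(rxx + k * np.eye(len(rxx)), np.asarray(rxy, dtype=float))


# Hypothetical correlations: the first two explanatory variables are nearly collinear.
rxx = np.array([[1.00, 0.99, 0.30],
                [0.99, 1.00, 0.30],
                [0.30, 0.30, 1.00]])
rxy = np.array([0.60, 0.58, 0.40])
diag = collinearity_diagnostics(rxx)
traditional = path_coefficients(rxx, rxy)            # may be unstable
ridge_type = path_coefficients(rxx, rxy, k=0.1)      # k added to the diagonal
keep = [0, 2]                                        # eliminate the second variable
eliminated = path_coefficients(rxx[np.ix_(keep, keep)], rxy[keep])
print(diag["condition_number"], traditional, ridge_type, eliminated)
```

Commonly cited readings: a VIF above 10, or a condition number above 100 (and especially above 1000), signals problematic multicollinearity, while a determinant near zero indicates a nearly singular X'X.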
28

Brunelli, Renata Trevisan. "Análise do impacto de perturbações sobre medidas de qualidade de ajuste para modelos de equações estruturais" [Analysis of the impact of perturbations on goodness-of-fit measures for structural equation models]. Universidade de São Paulo, 2012. http://www.teses.usp.br/teses/disponiveis/45/45133/tde-24032013-123415/.

Abstract:
The Structural Equation Modeling (SEM) is a multivariate methodology that allows the study of cause-and-efect relationships and correlation of a set of variables (that may be observed or latent ones), simultaneously. The technique has become more diuse in the last years, in different fields of knowledge. One of its main applications is on the confirmation of theoretical models proposed by the researcher (Confirmatory Factorial Analysis). There are several measures suggested by literature to measure the goodness of t of a SEM model. However, there is a scarce number of texts that list relationships between the values of different of those measures with possible problems that may occur on the sample or the specication of the SEM model, like information concerning what problems of this nature impact which measures (and which not), and how does the impact occur. This information is important because it allows the understanding of the reasons why a model could be considered bad fitted. The objective of this work is to investigate how different disturbances of the sample, the model specification and the estimation of a SEM model are able to impact the measures of goodness of fit; additionally, to understand if the sample size has influence over this impact. It will also be investigated if those disturbances affect the estimates of the parameters, given the fact that there are disturbances for which occurrence some of the measures indicate badness of fit but the parameters are not affected; at the same time, that are occasions on which the measures indicate a good fit and there are disturbances on the estimates of the parameters. Those investigations will be made simulating examples of different size samples for which type of disturbance. Then, SEM models with different specifications will be fitted to each sample, and their parameters will be estimated by two dierent methods: Generalized Least Squares and Maximum Likelihood. 
Given those answers, a researcher that wants to apply the SEM methodology to his work will be able to be more careful and, among the available measures of goodness of fit, to chose those that are more adequate to the characteristics of his study.
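The abstract does not name the specific fit measures it studies. To illustrate what such a measure computes, here is a minimal sketch of the root mean square error of approximation (RMSEA), which is one widely reported SEM fit index. It takes the model chi-square, its degrees of freedom, and the sample size N as inputs. The function name and the example numbers are hypothetical. The (N - 1) denominator is one common convention, and some software uses N instead.

```python
import math


def rmsea(chi_square: float, df: int, n: int) -> float:
    """Root mean square error of approximation (RMSEA) of a fitted SEM model.

    Uses the (N - 1) denominator convention; some programs use N instead.
    """
    if df <= 0 or n <= 1:
        raise ValueError("df must be positive and n must exceed 1")
    # Only chi-square in excess of its degrees of freedom counts as misfit.
    excess = max(chi_square - df, 0.0)
    return math.sqrt(excess / (df * (n - 1)))


# Hypothetical model: chi-square 85 on 40 df, fitted to 301 observations.
print(round(rmsea(85.0, 40, 301), 4))  # 0.0612
```

RMSEA is truncated at zero. Any model whose chi-square does not exceed its degrees of freedom therefore receives the same value of 0.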
29

Shehzad, Muhammad Ahmed. "Pénalisation et réduction de la dimension des variables auxiliaires en théorie des sondages" [Penalization and dimension reduction of auxiliary variables in survey sampling theory]. PhD thesis, Université de Bourgogne, 2012. http://tel.archives-ouvertes.fr/tel-00812880.

Abstract:
Sample surveys are useful for estimating characteristics of a population, such as its total or mean. This thesis studies techniques that make it possible to take a large number of auxiliary variables into account when estimating a total. The first chapter recalls some definitions and properties used in the rest of the manuscript. It covers the Horvitz-Thompson estimator, presented as an estimator that does not use auxiliary information. It also covers calibration techniques, which modify the sampling weights to take the auxiliary information into account by exactly reproducing, in the sample, the population totals of the auxiliary variables. The second chapter is part of a review article accepted for publication. It presents ridge regression methods as a possible remedy for collinearity among the auxiliary variables, and hence for ill-conditioning. We study the "model-based" and "model-assisted" points of view on ridge regression. This technique gives better results than ordinary least squares in terms of mean squared error, and it can also be interpreted as penalized calibration. Simulations illustrate the value of this technique in comparison with the Horvitz-Thompson estimator. Chapter three presents another way of handling collinearity problems: dimension reduction based on principal components. We study principal component regression in the survey sampling context. We also explore calibration on the second-order moments of the principal components, as well as partial calibration and calibration on estimated principal components. An illustration on data from the company Médiamétrie confirms the value of these dimension-reduction techniques for estimating a total in the presence of a large number of auxiliary variables.
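The calibration idea described above can be sketched for a single auxiliary variable. This minimal illustration assumes linear (chi-square distance) calibration and design weights d_k = 1/π_k. The thesis itself handles many auxiliary variables, and the function names, the `penalty` parameter, and the data below are all hypothetical. With `penalty=0`, the weights reproduce the known total t_x exactly. A positive penalty gives a ridge-type (penalized) calibration that only approximately meets t_x.

```python
def horvitz_thompson_total(y, pi):
    """Horvitz-Thompson estimate of a population total: sum of y_k / pi_k."""
    return sum(yk / pk for yk, pk in zip(y, pi))


def calibrated_weights(x, pi, t_x, penalty=0.0):
    """Linear calibration weights w_k = d_k * (1 + lam * x_k), with d_k = 1 / pi_k.

    lam solves sum_k w_k x_k = t_x when penalty == 0; a positive penalty
    shrinks lam toward 0, pulling the weights back toward the design weights.
    """
    d = [1.0 / pk for pk in pi]
    ht_x = sum(dk * xk for dk, xk in zip(d, x))       # HT estimate of t_x
    s_xx = sum(dk * xk * xk for dk, xk in zip(d, x))
    lam = (t_x - ht_x) / (s_xx + penalty)
    return [dk * (1.0 + lam * xk) for dk, xk in zip(d, x)]


# Hypothetical sample of 4 units, each drawn with inclusion probability 0.25,
# and a known population total t_x = 120 for the auxiliary variable.
x = [5.0, 8.0, 6.0, 7.0]
y = [12.0, 19.0, 14.0, 17.0]
pi = [0.25, 0.25, 0.25, 0.25]

w = calibrated_weights(x, pi, t_x=120.0)
print(horvitz_thompson_total(y, pi))          # 248.0
print(sum(wk * xk for wk, xk in zip(w, x)))   # ~120.0, matches t_x
print(sum(wk * yk for wk, yk in zip(w, y)))   # calibrated estimate of the y total
```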
30

Huang, Sheng-Yao, and 黃生耀. "The Comparison of Different Investment Models Under Multicollinearity." Thesis, 2006. http://ndltd.ncl.edu.tw/handle/63728456035882588154.

Abstract:
Master's
National Taiwan Ocean University
Institute of Applied Economics
94
This thesis investigates investment-related series using three econometric methods. We apply stepwise regression combined with four major theories of investment decision-making: the generalized accelerator, cash flow, neoclassical, and securities valuation theories. In total, we select 38 variables from the investment function and use multiple regression for all the analyses. We then compare the results with those of principal component regression and ridge regression. All data are from the AREMOS data bank, and the sample period runs from the first quarter of 1976 through the fourth quarter of 2004. In the first step, we apply unit root tests to check the stationarity of the data, and find that the variables in the set are generally nonstationary. We therefore carry out the work in both differenced and level form. For the former, we follow the traditional procedure of differencing prior to the analysis. For the latter, we adopt the framework of cointegration regression. Judging from all the results, the stepwise regression method guided by economic theory performs best. The outcomes from principal component and ridge regression are not reliable if the data span a wide range and are not normalized. We hope these results are useful to private agencies and government organizations. Key words: investment function, stepwise regression, principal component, ridge regression.
31

Pan, Li-Hsiang, and 潘立翔. "A Study on the effect of Multicollinearity in Polynomial model." Thesis, 2011. http://ndltd.ncl.edu.tw/handle/17227311806609276791.

Abstract:
Master's
Tamkang University
Department of Mathematics, Master's Program
99
Regression analysis is prone to collinearity problems. In the past, collinearity has mostly been handled by three methods: (1) keeping only one important variable from a group of highly correlated predictors; (2) ridge regression; and (3) principal component regression. However, some of these methods have shortcomings. This paper therefore investigates a new method for solving the collinearity problem and compares it with ridge regression and principal component regression.
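The abstract does not spell out its new method, so the following illustrates only the underlying problem. In a polynomial model, the terms x and x² are usually strongly correlated. Centering x before squaring is a standard remedy, though not necessarily the one proposed in the thesis. The minimal sketch below uses hypothetical predictor values from 1 to 10.

```python
import math


def pearson(a, b):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    va = sum((ai - ma) ** 2 for ai in a)
    vb = sum((bi - mb) ** 2 for bi in b)
    return cov / math.sqrt(va * vb)


x = [float(v) for v in range(1, 11)]            # raw predictor 1..10
raw_r = pearson(x, [v * v for v in x])           # x vs x^2: close to 1

mean_x = sum(x) / len(x)
xc = [v - mean_x for v in x]                     # centered predictor
centered_r = pearson(xc, [v * v for v in xc])    # 0 here because xc is symmetric about 0

print(round(raw_r, 3), round(centered_r, 3))     # 0.975 0.0
```

Centering removes the linear dependence between x and x² only when the centered values are symmetric. For skewed data the correlation drops, but usually not all the way to zero.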
32

Bhattacharya, Indranil. "Feature Selection under Multicollinearity & Causal Inference on Time Series." Thesis, 2017. http://etd.iisc.ernet.in/2005/3980.

Abstract:
In this work, we study and extend algorithms for Sparse Regression and Causal Inference problems. Both problems are fundamental in the area of Data Science. The goal of the regression problem is to find the "best" relationship between an output variable and input variables, given samples of the input and output values. We consider sparse regression under a high-dimensional linear model with strongly correlated variables, a situation that many existing model selection algorithms cannot handle well. In this setting, we study the performance of popular feature selection algorithms such as LASSO, Elastic Net, BoLasso, and Clustered Lasso, as well as Projected Gradient Descent algorithms. We evaluate their running time, stability, and consistency in recovering the true support. We also propose a new feature selection algorithm, BoPGD. It first clusters the features based on their sample correlation and then performs sparse estimation using a bootstrapped variant of the projected gradient descent method with projection onto the non-convex L0 ball. We attempt to characterize the efficiency and consistency of our algorithm through a host of experiments on both synthetic and real-world datasets. Discovering causal relationships, beyond mere correlation, is widely recognized as a fundamental problem. Causal Inference problems use observations to infer the underlying causal structure of the data-generating process. The input to these problems is either a multivariate time series or i.i.d. sequences. The output is a Feature Causal Graph, whose nodes correspond to the variables and whose edges capture the direction of causality. For high-dimensional datasets, determining the causal relationships becomes a challenging task because of the curse of dimensionality. In this context, graphical modeling of temporal data based on the concept of "Granger causality" has gained much attention.
Combining Granger methods with model selection techniques such as LASSO enables efficient discovery of a "sparse" subset of causal variables in high-dimensional settings. However, these temporal causal methods use an input parameter, L, the maximum time lag. This parameter is the maximum gap in time between the occurrence of the output phenomenon and the causal input stimulus. In many situations of interest, however, the maximum time lag is not known, and finding the range of causal effects is itself an important problem. In this work, we propose and evaluate a data-driven and computationally efficient method for Granger causality inference in the Vector Auto Regressive (VAR) model without prior knowledge of the maximum time lag. We present two algorithms, Lasso Granger++ and Group Lasso Granger++. They not only construct the hypothesized feature causal graph but also estimate a value of maxlag (L) for each variable, by balancing the trade-off between "goodness of fit" and "model complexity".
33

Liu, Yu-Chen, and 劉育呈. "Analyzing the Factors Affecting Survival of Cancer Patients under Multicollinearity Problem." Thesis, 2013. http://ndltd.ncl.edu.tw/handle/ar7a28.

Abstract:
Master's
National Taipei University of Technology
Department of Business Management, Master's Program
101
In Taiwan, cancer ranks first among the most lethal diseases. It not only affects the quality of life of patients and their families but also causes huge medical expenses and years of potential life lost. To reduce the incidence of cancer effectively, we try to identify the causes of a low cancer rate by analyzing the patterns and availability of factors that influence survival. The objective of this research is to investigate factors that influence the survival time of cancer patients. Survival time can be affected by factors that are highly correlated with each other, so the multicollinearity problem must be solved to support appropriate medical treatment. This research therefore proposes a new solution, a Cox proportional hazards model combined with independent component analysis, to eliminate the multicollinearity among explanatory variables. To evaluate the performance of the proposed method relative to alternative approaches, we report an experimental study based on data from Monte Carlo simulation experiments. The results show that the proposed approach can solve the multicollinearity problem and identify significant effects of the factors on the survival time of cancer patients.
34

Riley, Fransell Rena Copeland. "Testing the equality of regression coefficients and a pooling methodology from multiple samples when the data is multicollinear." 2009. http://hdl.handle.net/10106/1737.

35

Lee, Shih-Yun, and 李詩芸. "Model selection in regression analysis on the data with multicollinearity and missing covariates." Thesis, 2004. http://ndltd.ncl.edu.tw/handle/68017326545657148002.

Abstract:
Master's
National Yang-Ming University
Institute of Public Health
92
In public health research, multicollinearity and missing values are frequently encountered. Multicollinearity occurs when the covariates are related to each other. Serious multicollinearity in regression analysis makes the parameter estimates unstable, which increases their standard errors and may even interfere with, or mislead, the variable selection of the model. When important variables have missing values, some information is lost, and the results of the analysis are likely to be biased. Few previous studies have examined the combined effect of multicollinearity and missing values on variable selection. This research compares three conditions: (1) whether or not the covariates have missing values; (2) whether or not the covariates are multicollinear; and (3) the effect on variable selection when missing values and multicollinearity occur together. Records of Obstructive Sleep Apnea Syndrome (OSAS) patients from a hospital center served as the population for various simulation studies. These studies examined how sample size, data structure, proportion of missingness, magnitude of multicollinearity, entry/removal criteria, and selection method affect variable selection in regression models. Given the characteristics of the data, the research focuses mostly on linear regression and partly on logistic regression. It addresses three subjects: multicollinearity, missing values, and variable selection. The linear regression results show that missing values have little effect on the proportion of correctly chosen models, with only a slight effect at small sample sizes.
Multicollinearity has a substantial effect on the proportion of correctly chosen models: collinearity in the candidate variables or in the true model decreases that proportion. When neither the true model nor the candidate variables are collinear, the level of R², the entry/removal criteria, and the selection method have little effect on choosing the correct model. When the true model contains quadratic terms or the candidate variables include interactions, results are poor at small sample sizes if the entry/removal criteria are strict, but better at large sample sizes. The larger the R², the higher the proportion of correct choices, regardless of sample size. When the true model is itself collinear and the candidate variables include interactions, serious multicollinearity exists. In that case, backward selection chooses the correct model more often than forward stepwise selection. The P values show that, when the true models are collinear and the samples are large enough, the covariates in the estimated models are not masked by collinearity and become significant. When the true models are not collinear, covariates in the estimated models that do not appear in the true models are not significant, regardless of whether collinearity interferes. The logistic regression results show little difference between forward stepwise and backward selection. With small sample sizes, an entry/removal criterion of 0.05 performs worse than 0.1; with large sample sizes, it performs better. The presence of missing values slightly decreases the proportion of correctly chosen models. In conclusion, sample size strongly affects the results.
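The abstract notes that serious multicollinearity inflates standard errors. A common diagnostic for this is the variance inflation factor, VIF_j = 1/(1 − R_j²), where R_j² comes from regressing predictor j on the others. The thesis does not say it uses this diagnostic. The square root of VIF_j is the factor by which collinearity inflates the standard error of the j-th coefficient. Below is a minimal sketch for the two-predictor case, where R_j² reduces to the squared correlation r². The data are hypothetical.

```python
import math


def pearson(a, b):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    va = sum((ai - ma) ** 2 for ai in a)
    vb = sum((bi - mb) ** 2 for bi in b)
    return cov / math.sqrt(va * vb)


def vif_two_predictors(x1, x2):
    """VIF of either predictor in a model with exactly two predictors.

    Regressing one predictor on the other gives R^2 = r^2, so VIF = 1 / (1 - r^2).
    """
    r = pearson(x1, x2)
    return 1.0 / (1.0 - r * r)


x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [1.1, 1.9, 3.2, 3.9, 5.1]  # hypothetical, nearly a copy of x1
vif = vif_two_predictors(x1, x2)
# A common rule of thumb reads VIF above 10 as serious multicollinearity.
print(round(vif, 1), round(math.sqrt(vif), 1))  # VIF and standard-error inflation
```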
36

Chen, Hsin-Fen, and 陳杏棻. "A Solution to Cox regression with Multicollinearity - An application of Independent Component Analysis." Thesis, 2016. http://ndltd.ncl.edu.tw/handle/s72wmw.

Abstract:
Doctoral
National Taipei University of Technology
College of Management, Doctoral Program in Management
104
The purpose of this study is to solve the multicollinearity problem in the Cox regression model. The Cox regression model has been widely used to describe the relationship between survival information and covariates. Multicollinearity refers to the existence of one or more approximate linear relations among the explanatory variables. It troubles many researchers because, when it is present, the collective explanatory power of the variables is considerably less than the sum of their individual explanatory power. Moreover, the presence of multicollinearity invalidates ordinary least squares (OLS) estimation, which assumes that the explanatory variables are uncorrelated with each other. It also makes it impossible to estimate the unique effects of individual variables in the analysis. The problem of multicollinearity must therefore be addressed. This study proposes a new solution, Independent Component Analysis - Cox Regression (ICA-CR), to eliminate the multicollinearity among explanatory variables. Its performance was evaluated relative to alternative approaches such as Cox regression, ridge regression, and principal component regression. Two Monte Carlo simulation experiments were conducted with various degrees of multicollinearity, censoring rates, and sample sizes. A dataset from one of the biggest mutual fund brokers in Taiwan was used to illustrate the proposed approach. The results show that the proposed ICA-CR approach successfully solved the multicollinearity problem in the data. Mutual fund holding time was affected by economic conditions in significantly different ways during and after the financial crisis. This indicates that, after the crisis, mutual fund investors adjusted their risk tolerance and responded to the financial environment more rationally.
37

Lin, Yu-Wei. "Gram-Schmidt Transformation Minimization Algorithm and Its Applications to Regression Analysis with Multicollinearity." 2008. http://www.cetd.com.tw/ec/thesisdetail.aspx?etdun=U0001-2708200816471900.

38

Fang, Wei-Quan, and 方偉泉. "A Note on Parameters Estimation in Linear Regression Models Subject to Measurement Error and Multicollinearity." Thesis, 2015. http://ndltd.ncl.edu.tw/handle/75369753916933474623.

Abstract:
Doctoral
Chung Yuan Christian University
Institute of Applied Mathematics
104
In recent years, big data applications and intelligence have become ubiquitous and continue to yield new insights in academia and industry alike. Data analyses in such research may require complex statistical methods for which, in general, no closed-form solutions can be used directly because of imprecise measurements and/or unexpected errors. In addition, a number of studies have noted that some statistical results may lead to undesirable conclusions because of collinearity in the data structure. Investigators should therefore pay attention to such situations when mining data and drawing conclusions from it. In this dissertation, we study the problems of mismeasured and collinear data in linear models. Regression analysis has long played an important role as a tool for interpreting possible cause-effect relations. One of the main goals of this study is to remind readers that the classical estimation approach, the least-squares method, may need to correct for biases in the parameter coefficients in certain applications. Accordingly, we propose two new methods to correct such biases and give an outlook on further extensions. We also hope that some of the estimation approaches proposed in this dissertation will contribute to the development of a more general theory of bias correction in regression analysis.
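The abstract does not describe its two correction methods. For orientation, the sketch below shows the classical textbook case of this kind of bias. In simple linear regression with classical measurement error in x, the OLS slope is attenuated toward zero by the reliability ratio λ = Var(true x)/Var(observed x). When the error variance is known, dividing the slope by λ corrects it. This is a standard method-of-moments correction, not the dissertation's method, and the function name and numbers are hypothetical.

```python
def corrected_slope(b_ols, var_observed_x, var_error):
    """Undo attenuation of a simple-regression OLS slope caused by noise in x.

    Assumes classical measurement error (independent of the true x and of the
    regression error) with known variance var_error.
    """
    # Reliability ratio: share of the observed variance of x that is true signal.
    reliability = (var_observed_x - var_error) / var_observed_x
    if reliability <= 0.0:
        raise ValueError("error variance must be smaller than the observed variance of x")
    return b_ols / reliability


# Hypothetical: observed slope 1.6, observed Var(x) = 5, measurement-error variance 1.
print(corrected_slope(1.6, 5.0, 1.0))  # 2.0
```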
39

Jurczyk, Tomáš. "Robustifikace statistických a ekonometrických metod regrese" [Robustification of statistical and econometrical regression methods]. Doctoral thesis, 2016. http://www.nusl.cz/ntk/nusl-351516.

Abstract:
Title: Robustification of statistical and econometrical regression methods Author: Mgr. Tomáš Jurczyk Department: Department of probability and mathematical statistics Supervisor: prof. RNDr. Jan Ámos Víšek CSc., IES FSV UK Praha Abstract: Multicollinearity and the presence of outliers are two data problems that can arise in regression analysis. This thesis is mainly concerned with situations in which a combined outlier-multicollinearity problem is present. We first show the behavior of classical methods developed to overcome one of these problems. We also investigate how well methods proposed as robust multicollinearity detectors work. We prove that the proposed two-step procedures fail in outlier detection, and therefore also in multicollinearity detection, if strong multicollinearity is present in the majority of the data. In these procedures, one step is typically based on robust regression methods. We propose a new one-step method as a candidate both for a robust detector of multicollinearity and for a robust ridge regression estimate. We derive its properties and behavior and propose diagnostic tools based on it. Keywords: multicollinearity, outliers, robust detector of multicollinearity, robust ridge regression
40

Liu, Xiaoming. "A class of generalized shrunken least squares estimators in linear model." 2010. http://hdl.handle.net/1993/4188.

Abstract:
Modern data analysis often involves a large number of variables, which gives rise to the problem of multicollinearity in regression models. It is well known that in a linear model, when the design matrix X is nearly singular, the ordinary least squares (OLS) estimator may perform poorly because of its numerical instability and large variance. To overcome this problem, many linear and nonlinear biased estimators have been studied. In this work we consider a class of generalized shrunken least squares (GSLS) estimators that includes many well-known linear biased estimators proposed in the literature. We compare these estimators under the mean square error and matrix mean square error criteria. A simulation study and two numerical examples are used to illustrate some of the theoretical results.
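The instability described above, and the way a linear biased estimator counters it, can be illustrated with ridge regression, β̂(k) = (XᵀX + kI)⁻¹Xᵀy. Ridge is the classical example of such an estimator, but this sketch does not reproduce the GSLS class itself. It assumes two centered predictors and no intercept, and the data and the choice k = 1 are hypothetical.

```python
import math


def gram(X, y):
    """Return X'X (2x2) and X'y (length 2) for a design with two columns."""
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(2)] for i in range(2)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(2)]
    return xtx, xty


def ridge_2d(X, y, k):
    """Ridge estimate (X'X + kI)^{-1} X'y for two predictors; k = 0 gives OLS."""
    xtx, xty = gram(X, y)
    a, b = xtx[0][0] + k, xtx[0][1]
    c, d = xtx[1][0], xtx[1][1] + k
    det = a * d - b * c
    if abs(det) < 1e-12:
        raise ValueError("X'X + kI is numerically singular")
    # Closed-form inverse of a 2x2 matrix applied to X'y.
    return [(d * xty[0] - b * xty[1]) / det, (-c * xty[0] + a * xty[1]) / det]


def condition_number_2x2(m):
    """Ratio of the largest to the smallest eigenvalue of a symmetric 2x2 matrix."""
    half_trace = (m[0][0] + m[1][1]) / 2
    disc = math.sqrt(((m[0][0] - m[1][1]) / 2) ** 2 + m[0][1] * m[1][0])
    return (half_trace + disc) / (half_trace - disc)


# Hypothetical centered data: x2 is almost a copy of x1.
x1 = [-2.0, -1.0, 0.0, 1.0, 2.0]
x2 = [-2.1, -0.9, 0.1, 1.0, 1.9]
y = [-4.1, -1.8, 0.3, 2.1, 3.9]
X = list(zip(x1, x2))

print(round(condition_number_2x2(gram(X, y)[0])))  # large: X'X is nearly singular
beta_ols = ridge_2d(X, y, 0.0)
beta_ridge = ridge_2d(X, y, 1.0)
print([round(v, 2) for v in beta_ols], [round(v, 2) for v in beta_ridge])
```

Here OLS splits the combined effect very unevenly between the two near-duplicate predictors. Ridge shrinks the coefficient vector and spreads the effect more evenly, at the cost of some bias.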
41

Parandvash, G. Hossein. "On the incorporation of nonnumeric information into the estimation of economic relationships in the presence of multicollinearity." Thesis, 1987. http://hdl.handle.net/1957/26851.

42

Chen, Ai-Chun, and 陳愛群. "A class of Liu-type estimators based on ridge regression under multicollinearity with an application to mixture experiments." Thesis, 2015. http://ndltd.ncl.edu.tw/handle/bquhze.

Abstract:
Master's
National Central University
Graduate Institute of Statistics
103
In linear regression, the least squares estimator does not perform well in terms of mean squared error when multicollinearity exists. Multicollinearity occurs in industrial mixture experiments, where the regressors are constrained. Hoerl and Kennard (1970) proposed the ordinary ridge estimator to overcome the problems of the least squares estimator under multicollinearity, and ridge regression has recently been applied successfully to mixture experiments. However, ridge regression becomes difficult to apply when the linear model has an intercept term and the regressors are standardized, as in mixture experiments. This paper considers a special class of Liu-type estimators (Liu, 2003) with an intercept. We derive the theoretical formula for the mean squared error of the proposed method. We then perform simulations to compare the proposed estimator with the ridge estimator in terms of mean squared error. Finally, we demonstrate this special class using the Portland cement mixture-experiment dataset (Woods et al., 1932).
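Below is a minimal sketch of a Liu-type estimator in the form usually attributed to Liu (2003), β̂(k, d) = (XᵀX + kI)⁻¹(Xᵀy − d β̂_OLS) with k > 0, using OLS as the initial estimate. This is an illustration under stated assumptions. It omits the intercept handling that is the thesis's actual contribution and uses two centered predictors, and the data and the values of k and d are hypothetical. Setting d = 0 recovers ridge regression, and d = −k recovers OLS.

```python
def solve_2x2(m, v):
    """Solve the 2x2 linear system m @ beta = v in closed form."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if abs(det) < 1e-12:
        raise ValueError("matrix is numerically singular")
    return [(d * v[0] - b * v[1]) / det, (-c * v[0] + a * v[1]) / det]


def liu_type_2d(X, y, k, d):
    """Liu-type estimate (X'X + kI)^{-1} (X'y - d * beta_ols) for two predictors."""
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(2)] for i in range(2)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(2)]
    beta_ols = solve_2x2(xtx, xty)                       # initial estimate
    shifted = [[xtx[0][0] + k, xtx[0][1]], [xtx[1][0], xtx[1][1] + k]]
    rhs = [xty[0] - d * beta_ols[0], xty[1] - d * beta_ols[1]]
    return solve_2x2(shifted, rhs)


# Hypothetical centered data with two nearly collinear predictors.
X = [(-2.0, -2.1), (-1.0, -0.9), (0.0, 0.1), (1.0, 1.0), (2.0, 1.9)]
y = [-4.1, -1.8, 0.3, 2.1, 3.9]

print(liu_type_2d(X, y, k=1.0, d=0.0))   # d = 0: ordinary ridge estimate
print(liu_type_2d(X, y, k=1.0, d=-1.0))  # d = -k: back to OLS
print(liu_type_2d(X, y, k=1.0, d=0.5))   # intermediate Liu-type estimate
```

The second parameter d gives a way to adjust the bias introduced by a large k without re-choosing k, which is the usual motivation for Liu-type estimators over plain ridge.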
43

Γρηγοριάδου, Μαρία. "Παραβιάσεις των βασικών υποθέσεων του γραμμικού μοντέλου παλινδρόμησης" [Violations of the basic assumptions of the linear regression model]. Thesis, 2014. http://hdl.handle.net/10889/8276.

Abstract:
A statistical model is a formalization of stochastic relationships between variables in the form of mathematical equations, intended to describe a system (a phenomenon or an event) as accurately as possible. Almost every system contains variable quantities that change. An interesting question is what effects these variables have, or appear to have, on other variables. This question is the subject of regression analysis, a widely used statistical technique for detecting and modeling relationships and dependencies between variables. When the relationships between the variables are linear, the result is a linear regression model. Regression models rest on certain basic assumptions that must be checked before the model is analyzed. In practice, however, these assumptions are often violated. With real-world data, violations are so frequent that they are the rule rather than the exception. This thesis addresses the important issue of what happens when some of the basic assumptions of the linear regression model are violated. Its aims are: (a) to analyze the causes of each violation and its consequences for the model; (b) to describe the main methods for detecting violations; and (c) to find ways of dealing with these "problematic situations". The results show that combining the established theoretical background with modern methods and ideas can substantially reduce the adverse consequences of the violations, and occasionally even reverse the damage, while salvaging a satisfactory amount of information.
44

Bartel, Joseph. "A study on the effects of multicollinearity, autocorrelation and four sampling designs on the predictive ability of the 1994 and 1995 variable-exponent taper functions." Thesis, 1999. http://hdl.handle.net/2429/9003.

Abstract:
In British Columbia, government, industry, and consulting firms have used taper functions since the late 1960s. Most recently, Kozak's (1988) variable-exponent model has been in use since 1989. One practical problem with the model is that it does not estimate total or merchantable volume without bias. These biases were found to be more pronounced for red cedar (Thuja plicata Donn ex D.Don) and western hemlock (Tsuga heterophylla (Raf.) Sarg.). Because of this problem, a second equation, known as the 1994 equation, was developed. However, reviewers identified some theoretical problems concerning multicollinearity and autocorrelation in the 1994 equation. These problems prompted the development of a third equation with less multicollinearity, referred to as the 1995 equation. The three principal objectives of this research were: (1) to study the effects of multicollinearity and autocorrelation on the predictive ability of the 1994 and 1995 variable-exponent taper functions; (2) to study the effects of four sampling strategies on the predictive ability of the 1994 and 1995 taper equations; and (3) to examine the possibility of localizing the 1994 taper equations. The effects of multicollinearity and autocorrelation, and of the four sampling designs, were studied using Monte Carlo simulations. The results indicated that severe multicollinearity and autocorrelation in the data did not seriously affect the predictive ability of the equations. Stratified random sampling with equal allocation of observations to each stratum gave the smallest variability of the estimated coefficients. It was compared with simple random sampling and with stratified random sampling using sample sizes proportional to stratum size.
However, the average estimated regression coefficients were somewhat different from the population parameters. Therefore, simple random sampling is recommended for selecting trees from the population if the main objective is the estimation of the population parameters. If the equations are to be used for prediction, then a wider range of the data (stratified sampling) should be used. The results indicated that no adjustment or scaling is required for the western hemlock equation for the two subzones studied.
45

Meňhartová, Ivana. "Metody dynamické analýzy složení portfolia" [Methods of dynamical analysis of portfolio composition]. Master's thesis, 2012. http://www.nusl.cz/ntk/nusl-305048.

Abstract:
Title: Methods of dynamical analysis of portfolio composition Author: Ivana Meňhartová Department: Department of Probability and Mathematical Statistics Supervisor: Mgr. Tomáš Hanzák, KPMS, MFF UK Abstract: In this thesis we study methods for the dynamic analysis of a portfolio based on its returns. The thesis focuses on the Kalman filter and locally weighted regression as two basic methods for dynamic analysis. It describes the theory and use of these methods in detail and discusses their proper settings. Practical applications of both methods to artificial data and to real data from the Prague Stock Exchange are presented. Using artificial data, we demonstrate the practical importance of the Kalman filter's assumptions. We then introduce multicollinearity as a possible complication in real-data applications. Finally, we compare the results and usage of both methods. We also introduce the possibility of enhancing the Kalman filter by projecting the estimates or by CUSUM (change detection) tests. Keywords: Kalman filter, locally weighted regression, multicollinearity, CUSUM test
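The Kalman-filter approach to dynamic portfolio analysis can be sketched with a scalar filter that tracks a single time-varying exposure β_t in y_t = x_t β_t + v_t. The sketch assumes that β_t follows a random walk, β_t = β_{t−1} + w_t, with process-noise variance q and observation-noise variance r. It does not reproduce the thesis's multi-asset setup or recommended settings. The function name, noise variances, and return series are hypothetical.

```python
def kalman_beta(xs, ys, q, r, beta0=0.0, p0=1.0):
    """Filtered estimates of a random-walk regression coefficient beta_t."""
    beta, p = beta0, p0
    path = []
    for x, y in zip(xs, ys):
        p = p + q                              # predict: mean unchanged, variance grows by q
        s = x * x * p + r                      # variance of the one-step prediction error
        gain = p * x / s                       # Kalman gain
        beta = beta + gain * (y - x * beta)    # update with the prediction error
        p = (1.0 - gain * x) * p               # posterior variance
        path.append(beta)
    return path


# Hypothetical "market" returns; the portfolio's exposure jumps from 0.5 to 1.5 at t = 20.
pattern = [0.01, -0.02, 0.015, -0.01, 0.02]
xs = pattern * 8
ys = [0.5 * x if t < 20 else 1.5 * x for t, x in enumerate(xs)]

path = kalman_beta(xs, ys, q=0.01, r=1e-6)
print(round(path[19], 3), round(path[-1], 3))  # exposure before and after the change
```

A larger q lets the estimate react faster to changes in composition but makes it noisier. A smaller q does the opposite.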
46

"A Spatial Statistical Framework for Evaluating Landscape Pattern and Its Impacts on the Urban Thermal Environment." Doctoral diss., 2016. http://hdl.handle.net/2286/R.I.39433.

Abstract:
Urban growth, from regional sprawl to global urbanization, is the most rapid, drastic, and irreversible form of human modification of the natural environment. Extensive land cover modifications during urban growth have altered the local energy balance, making the city warmer than its surrounding rural environment, a phenomenon known as an urban heat island (UHI). How are seasonal and diurnal surface temperatures related to land surface characteristics, and what land cover types and/or patterns are desirable for ameliorating climate in a fast-growing desert city? This dissertation examines these questions and seeks to address them using a combination of satellite remote sensing, geographic information science, and spatial statistical modeling techniques. The dissertation has two main parts. The first part compares continuous, pixel-based landscape gradient models with discrete, patch-based mosaic models and evaluates their efficiency in two empirical contexts: urban landscape pattern mapping and land cover dynamics monitoring. The second part formalizes a novel statistical model, spatially filtered ridge regression (SFRR), that ensures accurate and stable statistical estimation despite multicollinearity and inherent spatial effects. Results highlight the strong potential of local indicators of spatial dependence in landscape pattern mapping across various geographic scales. This is based on evidence from a sequence of exploratory comparative analyses and a time series study of land cover dynamics over Phoenix, AZ. The newly proposed SFRR method is capable of producing reliable estimates when analyzing statistical relationships involving geographic data and highly correlated predictor variables.
An empirical application of the SFRR over Phoenix suggests that urban cooling can be achieved not only by altering the land cover abundance, but also by optimizing the spatial arrangements of urban land cover features. Considering the limited water supply, rapid urban expansion, and the continuously warming climate, judicious design and planning of urban land cover features is of increasing importance for conserving resources and enhancing quality of life.
Dissertation/Thesis
Doctoral Dissertation Geography 2016
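As its name suggests, SFRR builds on ridge regression; the spatial-filtering component is specific to the dissertation and is not reproduced here. As a minimal sketch of only the ridge idea, assuming ordinary non-spatial data and a hand-picked penalty, the closed-form estimator below shows how adding a penalty to X'X stabilizes coefficient estimates when predictors are highly correlated. The names `ridge_fit`, `slope_spread`, and `lam`, and the simulated model, are hypothetical.

```python
import numpy as np


def ridge_fit(X, y, lam):
    """Closed-form ridge regression: slopes = (Xc'Xc + lam*I)^(-1) Xc'yc.

    X is (n, p) and y is (n,). Predictors and response are centered so the
    intercept is not penalized; lam >= 0 is the penalty (lam = 0 gives OLS).
    Returns (intercept, slopes).
    """
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    p = Xc.shape[1]
    slopes = np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ yc)
    intercept = y_mean - x_mean @ slopes
    return intercept, slopes


def slope_spread(lam, rho=0.99, n=50, n_reps=300, seed=1):
    """Monte Carlo SD of the first slope estimate when corr(x1, x2) = rho."""
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, rho], [rho, 1.0]])
    estimates = []
    for _ in range(n_reps):
        X = rng.multivariate_normal([0.0, 0.0], cov, size=n)
        # Hypothetical true model: y = 1 + 2*x1 - 1*x2 + standard normal noise
        y = 1.0 + X @ np.array([2.0, -1.0]) + rng.normal(0.0, 1.0, size=n)
        estimates.append(ridge_fit(X, y, lam)[1][0])
    return float(np.std(estimates))


if __name__ == "__main__":
    print("OLS   SD(b1):", round(slope_spread(0.0), 3))
    print("ridge SD(b1):", round(slope_spread(10.0), 3))
```

The penalty trades a small bias for a large reduction in variance along the nearly collinear direction; in practice `lam` would be chosen by cross-validation or a similar criterion rather than fixed by hand.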
47

Berger, Swetlana. "Scale effects on genomic modelling and prediction." Doctoral thesis, 2015. http://hdl.handle.net/11858/00-1735-0000-0022-6086-8.

Abstract:
This thesis proposes a new method for the scale-independent comparison of linkage disequilibrium (LD) structures in different genomic regions. Various aspects of scale-induced problems, from the precision of marker-effect estimates to the accuracy of predictions for new individuals, were investigated. In addition, based on performance comparisons of different statistical methods, recommendations are given for the use of the methods studied.
48

Onishi, Tamaki. "Institutional influence on the manifestation of entrepreneurial orientation: A case of social investment funders." Thesis, 2014. http://hdl.handle.net/1805/4656.

Abstract:
Indiana University-Purdue University Indianapolis (IUPUI)
Linking the new institutionalism to entrepreneurial orientation (EO), my dissertation investigates institutional forces and entrepreneurial forces, two opposing types of forces, as main and moderating effects on the practices and performance of organizations embedded in this institutional duality. The case chosen is a group of unique hybrid funders, collectively called social investment funders (SIFs) in this study, which integrate philanthropy and venture capital investment to create and implement a venture philanthropy model in pursuit of their mission. A theoretical framework is developed to propose regulative and normative pressures from the two dominant institutions governing SIFs. Original data collected from 146 organizations are analyzed with moderated multiple regressions in two empirical studies: Study 1 examines effects on SIFs' venture philanthropy practices, and Study 2 examines effects on SIFs' social and financial performance. Multiple imputation, diagnostic analyses, and several post hoc analyses are also conducted to check the robustness of the data and of the multiple regression results. These analyses find that EO and venture capital institutional forces both enhance SIFs' venture philanthropy practices. A hypothesized negative relationship between nonprofit status and venture philanthropy practices is also supported. Moderated regression analyses, along with subgroup and EO subdimension analyses, confirm a moderating effect between EO and nonprofit status, i.e., a regulative institutional pressure. A positive relationship is found between EO and financial performance, but not between EO and social performance. While the hypotheses relating social/financial performance to donors'/investors' demand for social outcomes and to the management team's training in business are supported, the overall results for Study 2 remain mixed.
Nonetheless, this dissertation appears to be the first study to theorize and test EO as a micro-level condition that enables organizations to strategically shape and resist institutional pressures, and it reinforces the view that organizations' behavior is a product not merely of passive conformity to environmental forces but also of agency. As such, this study aims to contribute to the scholarly efforts of the “agency camp” of the new institutionalism and EO, answering a call from leading scholars of both EO (Miller) and the new institutionalism (Oliver).