Dissertations / Theses on the topic 'Generalized linear models'

Consult the top 50 dissertations / theses for your research on the topic 'Generalized linear models.'

1

Mackinnon, Murray J. "Collinearity in generalized linear models." Thesis, University of British Columbia, 1986. http://hdl.handle.net/2429/25711.

Abstract:
The concept of collinearity for generalized linear models is introduced and compared to that for standard linear models. Two approaches for detecting collinearity are presented and shown to lead to the same diagnostic procedure. These are analysed for the Poisson, gamma, inverse Gaussian, pth order, binomial proportion and negative binomial models. A bound is derived for the degree of collinearity in a generalized linear model in terms of that of the standard linear model. Estimation methods based on ridge, prior likelihood and principal components are proposed, and briefly illustrated with a Monte Carlo simulation of a gamma model.
Business, Sauder School of
Graduate
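The abstract above does not spell out its diagnostic procedure, so the following is only a minimal sketch of one widely used idea: for a GLM, collinearity can be assessed on the weighted design matrix W^(1/2) X rather than on X itself. The Poisson log-link weights, the simulated data and the function name `glm_condition_number` are all assumptions made for illustration.

```python
import numpy as np

def glm_condition_number(X, mu):
    """Condition number of the column-scaled weighted design matrix W^(1/2) X.

    Assumes a Poisson model with log link, where the IRLS weights equal the
    fitted means mu. Other families would need their own weights.
    """
    Xw = np.sqrt(mu)[:, None] * X            # W^(1/2) X
    Xw = Xw / np.linalg.norm(Xw, axis=0)     # scale columns to unit length
    s = np.linalg.svd(Xw, compute_uv=False)  # singular values
    return s.max() / s.min()

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2_indep = rng.normal(size=n)
x2_collinear = x1 + 0.01 * rng.normal(size=n)  # nearly a copy of x1
mu = np.exp(0.5 + 0.3 * x1)                    # hypothetical fitted means

X_ok = np.column_stack([np.ones(n), x1, x2_indep])
X_bad = np.column_stack([np.ones(n), x1, x2_collinear])
cond_ok = glm_condition_number(X_ok, mu)
cond_bad = glm_condition_number(X_bad, mu)
print(cond_ok, cond_bad)
```

A large condition number for the weighted matrix signals that the information matrix is close to singular, which is the situation that ridge-type or principal-components estimators are meant to address.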
2

Benghiat, Sonia. "Diagnostics for generalized linear models." Thesis, National Library of Canada = Bibliothèque nationale du Canada, 2001. http://www.collectionscanada.ca/obj/s4/f2/dsk3/ftp04/MQ64046.pdf.

3

Creagh-Osborne, Jane. "Latent variable generalized linear models." Thesis, University of Plymouth, 1998. http://hdl.handle.net/10026.1/1885.

Abstract:
Generalized Linear Models (GLMs) (McCullagh and Nelder, 1989) provide a unified framework for fixed effect models where response data arise from exponential family distributions. Much recent research has attempted to extend the framework to include random effects in the linear predictors. Different methodologies have been employed to solve different motivating problems, for example Generalized Linear Mixed Models (Clayton, 1994) and Multilevel Models (Goldstein, 1995). A thorough review and classification of this and related material is presented. In Item Response Theory (IRT) subjects are tested using banks of pre-calibrated test items. A useful model is based on the logistic function with a binary response dependent on the unknown ability of the subject. Item parameters contribute to the probability of a correct response. Within the framework of the GLM, a latent variable, the unknown ability, is introduced as a new component of the linear predictor. This approach affords the opportunity to structure intercept and slope parameters so that item characteristics are represented. A methodology for fitting such GLMs with latent variables, based on the EM algorithm (Dempster, Laird and Rubin, 1977) and using standard Generalized Linear Model fitting software GLIM (Payne, 1987) to perform the expectation step, is developed and applied to a model for binary response data. Accurate numerical integration to evaluate the likelihood functions is a vital part of the computational process. A study of the comparative benefits of two different integration strategies is undertaken and leads to the adoption, unusually, of Gauss-Legendre rules. It is shown how the fitting algorithms are implemented with GLIM programs which incorporate FORTRAN subroutines. Examples from IRT are given. A simulation study is undertaken to investigate the sampling distributions of the estimators and the effect of certain numerical attributes of the computational process. 
Finally a generalized latent variable model is developed for responses from any exponential family distribution.
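As a rough illustration of why numerical integration matters here, the sketch below uses a Gauss-Legendre rule to average a logistic item response over an unobserved ability. The parameterisation logit P = a·θ + b, the standard normal ability distribution, the truncation to [-6, 6] and the function name `marginal_prob_correct` are assumptions for this example, not details taken from the thesis.

```python
import numpy as np

def marginal_prob_correct(a, b, n_nodes=40, bound=6.0):
    """P(correct) for a logistic item, averaged over latent ability theta.

    a: item slope, b: item intercept (hypothetical parameterisation
    logit P = a * theta + b). theta ~ N(0, 1) is an assumption; the
    integral is truncated to [-bound, bound].
    """
    x, w = np.polynomial.legendre.leggauss(n_nodes)  # nodes, weights on [-1, 1]
    theta = bound * x                                # rescale nodes
    w = bound * w                                    # rescale weights
    density = np.exp(-0.5 * theta**2) / np.sqrt(2.0 * np.pi)
    p = 1.0 / (1.0 + np.exp(-(a * theta + b)))       # logistic response
    return float(np.sum(w * p * density))

p_sym = marginal_prob_correct(a=1.2, b=0.0)   # symmetric case
p_easy = marginal_prob_correct(a=1.2, b=1.0)  # easier item
print(p_sym, p_easy)
```

With a zero intercept the integrand is symmetric about θ = 0, so the marginal probability should come out very close to one half, which gives a quick sanity check on the quadrature.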
4

Vasconcelos, Julio Cezar Souza. "Modelo linear parcial generalizado simétrico." Universidade de São Paulo, 2017. http://www.teses.usp.br/teses/disponiveis/11/11134/tde-26072017-105153/.

Abstract:
In this work we propose the symmetric generalized partial linear model, based on generalized partial linear models and symmetric linear models, in which the response variable follows a distribution belonging to the family of symmetric distributions and the linear predictor has a parametric and a non-parametric component. Distributions in this class include the normal, Student's t, power exponential, slash and hyperbolic, among others. A brief review of the concepts used throughout the work is presented, namely residual analysis, local influence, smoothing parameter, spline, cubic spline, natural cubic spline and the backfitting algorithm, among others. In addition, a brief account of the theory of GAMLSS models (generalized additive models for location, scale and shape) is presented. The models were fitted using the gamlss package available in the free software R. Model selection was based on the Akaike information criterion (AIC). Finally, an application is presented based on a real data set from the Chilean financial sector.
5

Stroinski, Krzysztof Jerzy. "Generalized linear models in motor insurance." Thesis, Heriot-Watt University, 1987. http://hdl.handle.net/10399/1044.

6

Holmberg, Henrik. "Generalized linear models with clustered data." Doctoral thesis, Umeå universitet, Statistik, 2012. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-52902.

Abstract:
In situations where a large data set is partitioned into many relatively small groups, and where the members within a group have some common unmeasured characteristics, the number of parameters requiring estimation tends to increase with sample size if a fixed effects model is applied. This fact causes the assumptions underlying asymptotic results to be violated. The first paper in this thesis considers two possible solutions to this problem, a random intercepts model and a fixed effects model, where asymptotics are replaced by a simple form of bootstrapping. A profiling approach is introduced in the fixed effects case, which makes it computationally efficient even with a huge number of groups. The grouping effect is mainly seen as a nuisance in this paper. In the second paper the effect of misspecifying the distribution of the random effects in a generalized linear mixed model for binary data is studied. One problem with mixed effects models is that the distributional assumptions about the random effects are not easily checked from real data. Models with Gaussian, logistic and Cauchy distributional assumptions are used for parameter estimation on data simulated using the same three distributions. The effect of these assumptions on parameter estimation is presented. Two criteria for model selection are investigated, the Akaike information criterion and a criterion based on a χ² statistic. The estimators for fixed effects parameters are quite robust against misspecification of the random effects distribution, at least with the distributions used in this paper. Even when the true random effects distribution is Cauchy, models assuming a Gaussian or a logistic distribution regularly produce estimates with less bias. In the third paper the results from the first two papers are applied to infant mortality data. We found that there was significant clustering of infant mortality in the Skellefteå region in the years 1831-1890.
An "ad hoc" method for comparing the magnitude of unexplained clustering after a model is applied is also presented. The last paper of this thesis is concerned with the problem of testing for spatial clustering caused by autocorrelation. A test that is robust against heteroscedasticity is proposed. In a simulation study the properties of the proposed statistic, K, are investigated. The power of the test based on K is compared to that of Moran's I in the simulation study. Both tests are then applied to mortality data from Swedish municipalities.
7

Gory, Jeffrey J. "Marginally Interpretable Generalized Linear Mixed Models." The Ohio State University, 2017. http://rave.ohiolink.edu/etdc/view?acc_num=osu1497966698387606.

8

Jiang, Dingfeng. "Concave selection in generalized linear models." Diss., University of Iowa, 2012. https://ir.uiowa.edu/etd/2902.

Abstract:
A family of concave penalties, including the smoothly clipped absolute deviation (SCAD) and minimax concave penalties (MCP), has been shown to have attractive properties in variable selection. The computation of concave penalized solutions, however, is a difficult task. We propose a majorization minimization by coordinate descent (MMCD) algorithm to compute the solutions of concave penalized generalized linear models (GLM). In contrast to the existing algorithms that use local quadratic or local linear approximation of the penalty, the MMCD majorizes the negative log-likelihood by a quadratic loss, but does not use any approximation to the penalty. This strategy avoids the computation of scaling factors in iterative steps, and hence improves the efficiency of coordinate descent. Under certain regularity conditions, we establish the theoretical convergence property of the MMCD algorithm. We implement this algorithm in a penalized logistic regression model using the SCAD and MCP penalties. Simulation studies and a data example demonstrate that the MMCD works sufficiently fast for the penalized logistic regression in high-dimensional settings where the number of covariates is much larger than the sample size. Grouping structure among predictors exists in many regression applications. We first propose an l2 grouped concave penalty to incorporate such group information in a regression model. The l2 grouped concave penalty performs group selection and includes group Lasso as a special case. An efficient algorithm is developed and its theoretical convergence property is established under certain regularity conditions. The group selection property of the l2 grouped concave penalty is desirable in some applications, while in other applications selection at both group and individual levels is needed. Hence, we propose an l1 grouped concave penalty for variable selection at both individual and group levels.
An efficient algorithm is also developed for the l1 grouped concave penalty. Simulation studies are performed to evaluate the finite-sample performance of the two grouped concave selection methods. The new grouped penalties are also used in analyzing two motivation datasets. The results from both the simulation and real data analyses demonstrate certain benefits of using grouped penalties. Therefore, the proposed concave group penalties are valuable alternatives to the standard concave penalties.
9

Sammut, Fiona. "Using generalized linear models to model compositional response data." Thesis, University of Warwick, 2016. http://wrap.warwick.ac.uk/89876/.

Abstract:
This work proposes a multivariate logit model which models the influence of explanatory variables on continuous compositional response variables. This multivariate logit model generalizes an elegant method that was suggested previously by Wedderburn (1974) for the analysis of leaf blotch data in the special case of J = 2, leading to our naming this new approach as the generalized Wedderburn method. In contrast to the logratio modelling approach devised by Aitchison (1982, J. Roy Stat. Soc. B.), the multivariate logit model used under the generalized Wedderburn approach models the expectation of a compositional response variable directly and is also able to handle zeros in the data. The estimation of the parameters in the new model is carried out using the technique of generalized estimating equations (GEE). This technique relies on the specification of a working variance-covariance structure. A working variance-covariance structure which caters for the specific variability arising in compositional data is derived. The GEE estimator that is used to estimate the parameters of the multivariate logit model is shown to be invariant to the values of the correlation and dispersion parameters in the working variance-covariance structure. Due to this invariance property and the fact that the estimating equations used under the generalized Wedderburn method are linear and unbiased, the GEE estimator achieves full efficiency across a wide class of potential dispersion and correlation matrices for the compositional response variables. As for any other GEE estimator, the estimator used in the generalized Wedderburn method is also asymptotically unbiased and consistent, provided that the marginal mean model specification is correct. The theoretical results derived in this thesis are substantiated by simulation experiments, and properties of the new model are also studied empirically on some classic datasets from the literature.
10

Zulj, Valentin. "On The Jackknife Averaging of Generalized Linear Models." Thesis, Uppsala universitet, Statistiska institutionen, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-412831.

Abstract:
Frequentist model averaging has started to grow in popularity, and it is considered a good alternative to model selection. It has recently been applied favourably to generalized linear models, where it has mainly been used to aid the prediction of probabilities. The performance of averaging estimators has largely been compared to that of models selected using AIC or BIC, without much discussion of model screening. In this paper, we study the performance of model averaging in classification problems, and evaluate performance with reference to a single prediction model tuned using cross-validation. We discuss the concept of model screening and suggest two methods of constructing a candidate model set: averaging over the models that make up the LASSO regularization path, and the so-called LASSO-GLM hybrid. By means of a Monte Carlo simulation study, we conclude that model averaging does not necessarily offer any improvement in classification rates. In terms of risk, however, we see that both methods of model screening are efficient, and their errors are more stable than those achieved by the cross-validated model of comparison.
11

Li, Zuojing. "Longitudinal data analysis using generalized linear models." Thesis, University of Ottawa (Canada), 2006. http://hdl.handle.net/10393/27267.

Abstract:
In this work we examine various conditions under which the usual asymptotic results (i.e. the weak consistency, the asymptotic normality and the strong consistency) hold for the regressor parameter beta which arises in a linear model (Chapter 2), a generalized linear model (GLM) with a fully specified likelihood (Chapter 3) or as a root of the generalized estimating equation (GEE) associated with a sequence of longitudinal observations (Chapter 4). Our main references for these chapters are [12], [9] and [20], respectively. We provide detailed proofs of the results found in the above-mentioned references, and we extend the results of [9] to the case of stochastic regressors (Section 3.4). Finally, in Chapter 5, we identify a fundamental mistake appearing in the recent article [4], which examines the strong consistency of the regressor parameter beta in a GLM for which the likelihood of the density is not specified. In Section 5.2, we give a correction to the main theorem of [4], as well as some new results concerning the weak consistency and asymptotic normality of beta.
12

Hamzah, Nor Aishah. "Robust regression estimation in generalized linear models." Thesis, University of Bristol, 1995. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.294372.

13

Byrne, Evan. "Inference in Generalized Linear Models with Applications." The Ohio State University, 2019. http://rave.ohiolink.edu/etdc/view?acc_num=osu1555152640361367.

14

Min, Min. "Asymptotic normality in generalized linear mixed models." College Park, Md.: University of Maryland, 2007. http://hdl.handle.net/1903/7758.

Abstract:
Thesis (Ph. D.) -- University of Maryland, College Park, 2007.
Thesis research directed by: Dept. of Mathematics. Title from t.p. of PDF. Includes bibliographical references. Published by UMI Dissertation Services, Ann Arbor, Mich. Also available in paper.
15

Utami, Zuliana Sri. "Penalized regression methods with application to generalized linear models, generalized additive models, and smoothing." Thesis, University of Essex, 2017. http://repository.essex.ac.uk/20908/.

Abstract:
Recently, penalized regression has been used to deal with problems found in maximum likelihood estimation, such as correlated parameters and a large number of predictors. The main issue in this type of regression is how to select the optimal model. In this thesis, Schall’s algorithm is proposed for automatic selection of the penalty weight. The algorithm has two steps. First, the coefficient estimates are obtained with an arbitrary penalty weight. Second, an estimate of the penalty weight λ is calculated as the ratio of the error variance to the coefficient variance. The iteration continues from step one until the estimate of the penalty weight converges. The computational cost is low because the optimal penalty weight can be obtained within a small number of iterations. In this thesis, Schall’s algorithm is investigated for ridge regression, lasso regression and two-dimensional histogram smoothing. The proposed algorithm is applied to real and simulated data sets. In addition, a new algorithm for lasso regression is proposed. The performance of the algorithm was almost comparable in all applications. Schall’s algorithm can be an efficient algorithm for selecting the penalty weight.
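The two-step iteration described in this abstract can be sketched for ridge regression as follows. The abstract does not say how the two variances are estimated, so the estimators below, which divide by the effective dimension `ed`, are one common formulation and are assumed here; the function name `schall_ridge` and the simulated data are likewise hypothetical.

```python
import numpy as np

def schall_ridge(X, y, lam=1.0, tol=1e-8, max_iter=200):
    """Alternate between a ridge fit and an update of the penalty weight.

    The variance estimators (based on the effective dimension `ed`) are an
    assumed formulation, not taken from the thesis.
    """
    n, p = X.shape
    XtX, Xty = X.T @ X, X.T @ y
    for _ in range(max_iter):
        A = XtX + lam * np.eye(p)
        beta = np.linalg.solve(A, Xty)              # step 1: ridge estimate
        ed = np.trace(np.linalg.solve(A, XtX))      # effective dimension
        resid = y - X @ beta
        sigma2 = resid @ resid / (n - ed)           # error variance
        tau2 = beta @ beta / ed                     # coefficient variance
        lam_new = sigma2 / tau2                     # step 2: variance ratio
        if abs(lam_new - lam) <= tol * (1.0 + lam):
            return beta, lam_new
        lam = lam_new
    return beta, lam

rng = np.random.default_rng(1)
n, p = 100, 10
X = rng.normal(size=(n, p))
beta_true = 0.5 * rng.normal(size=p)                # simulated coefficients
y = X @ beta_true + rng.normal(size=n)

beta_hat, lam_hat = schall_ridge(X, y)
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
print(lam_hat)
```

Because the penalty weight is updated from quantities already computed in the ridge fit, each iteration costs little more than a single penalized fit, which is consistent with the abstract's remark about low computational cost.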
16

Carlos, Monteiro Ponce de Leon Antonio. "Optimum experimental design for model discrimination and generalized linear models." Thesis, London School of Economics and Political Science (University of London), 1993. http://etheses.lse.ac.uk/2434/.

Abstract:
The main subject of this thesis concerns the optimum design of experiments for discriminating between two rival mathematical models. In addition, optimality of designs for parameter estimation is investigated, although restricted to binary response models. Optimal design theory and generalized linear models form the background for this work. The former provides the tools for construction of the optimum designs whereas the latter provides the framework in which the methods are developed. For model discrimination, the procedures which are proposed may be applied not only to compare two regression models but also to compare two generalized linear models, as long as they belong to the same subclass. The principle of the so-called T-optimality criterion, originally introduced for discriminating between two regression models, is extended to other classes such as generalized linear models. Within each context a theorem based on the General Equivalence Theorem from optimal design theory is shown to hold, thus allowing both the construction and the checking of optimum designs. Optimum experimental design for estimating the parameters of a binary response model is the other subject of this thesis. Initially, well-known link functions such as logit, probit and complementary log-log are considered. Later, this range is widened by introducing a family of link functions which includes the logit and the complementary log-log links as particular members. One common feature of these two problems is that classical optimal designs depend on the unknown values of the model parameters. Therefore, only locally optimal designs can be obtained unless observations may be taken sequentially, in which case several methods to search for the optimum are available in the literature. As an alternative to locally and sequentially optimal experiments, Bayesian designs are introduced for both model discrimination and parameter estimation.
17

Emond, Mary Jane. "Efficient estimation in the generalized semilinear model /." Thesis, Connect to this title online; UW restricted, 1993. http://hdl.handle.net/1773/9543.

18

Petry, Sebastian. "Regularization approaches for generalized linear models and single index models." Diss., lmu, 2011. http://nbn-resolving.de/urn:nbn:de:bvb:19-143983.

19

Dunn, Peter Kenneth. "Likelihood-based inference for tweedie generalized linear models /." St. Lucia, Qld, 2001. http://www.library.uq.edu.au/pdfserve.php?image=thesisabs/absthe16472.pdf.

20

Rusch, Thomas, and Achim Zeileis. "Gaining Insight with Recursive Partitioning of Generalized Linear Models." Taylor and Francis, 2013. http://dx.doi.org/10.1080/00949655.2012.658804.

Abstract:
Recursive partitioning algorithms separate a feature space into a set of disjoint rectangles. Then, usually, a constant in every partition is fitted. While this is a simple and intuitive approach, it may still lack interpretability as to how a specific relationship between dependent and independent variables may look. Or it may be that a certain model is assumed or of interest and there are a number of candidate variables that may non-linearly give rise to different model parameter values. We present an approach that combines generalized linear models with recursive partitioning that offers enhanced interpretability of classical trees as well as providing an explorative way to assess a candidate variable's influence on a parametric model. This method conducts recursive partitioning of a generalized linear model by (1) fitting the model to the data set, (2) testing for parameter instability over a set of partitioning variables, (3) splitting the data set with respect to the variable associated with the highest instability. The outcome is a tree where each terminal node is associated with a generalized linear model. We will show the method's versatility and suitability to gain additional insight into the relationship of dependent and independent variables by two examples, modelling voting behaviour and a failure model for debt amortization, and compare it to alternative approaches.
21

Rusch, Thomas, and Achim Zeileis. "Gaining Insight With Recursive Partitioning Of Generalized Linear Models." WU Vienna University of Economics and Business, 2011. http://epub.wu.ac.at/3143/6/paperEPUB.pdf.

Abstract:
Recursive partitioning algorithms separate a feature space into a set of disjoint rectangles. Then, usually, a constant in every partition is fitted. While this is a simple and intuitive approach, it may still lack interpretability as to how a specific relationship between dependent and independent variables may look. Or it may be that a certain model is assumed or of interest and there are a number of candidate variables that may non-linearly give rise to different model parameter values. We present an approach that combines generalized linear models with recursive partitioning that offers enhanced interpretability of classical trees as well as providing an explorative way to assess a candidate variable's influence on a parametric model. This method conducts recursive partitioning of the generalized linear model by (1) fitting the model to the data set, (2) testing for parameter instability over a set of partitioning variables, (3) splitting the data set with respect to the variable associated with the highest instability. The outcome is a tree where each terminal node is associated with a generalized linear model. We will show the method's versatility and suitability to gain additional insight into the relationship of dependent and independent variables by two examples, modelling voting behaviour and a failure model for debt amortization.
Series: Research Report Series / Department of Statistics and Mathematics
22

Ulbricht, Jan [Verfasser]. "Variable Selection in Generalized Linear Models / Jan Ulbricht." München : Verlag Dr. Hut, 2010. http://d-nb.info/1008331422/34.

23

Sidumo, Bonelwa. "Generalized linear models, with applications in fisheries research." Thesis, Rhodes University, 2018. http://hdl.handle.net/10962/61102.

Abstract:
Gambusia affinis (G. affinis) is an invasive fish species found in the Sundays River Valley of the Eastern Cape, South Africa. The relative abundance and population dynamics of G. affinis were quantified in five interconnected impoundments within the Sundays River Valley. This study utilised a G. affinis data set to demonstrate various classical ANOVA models. Generalized linear models were used to standardize catch per unit effort (CPUE) estimates and to determine environmental variables which influenced the CPUE. Based on the generalized linear model results, dam age, mean temperature, Oreochromis mossambicus abundance and Glossogobius callidus abundance had a significant effect on the G. affinis CPUE. The Albany Angling Association collected data during fishing tag and release events. These data were utilized to demonstrate repeated measures designs. Mixed-effects models provided a powerful and flexible tool for analyzing clustered data such as repeated measures data and nested data; hence they have become tremendously popular as a framework for the analysis of bio-behavioral experiments. The results show that the mixed-effects methods proposed in this study are more efficient than those based on generalized linear models. These data were better modeled with mixed-effects models due to their flexibility in handling missing data.
24

Bate, Steven Mark. "Generalized linear models for large dependent data sets." Thesis, University College London (University of London), 2004. http://discovery.ucl.ac.uk/1446542/.

Abstract:
Generalized linear models (GLMs) were originally used to build regression models for independent responses. In recent years, however, effort has focused on extending the original GLM theory to enable it to be applied to data which exhibit dependence in the responses. This thesis focuses on some specific extensions of the GLM theory for dependent responses. A new hypothesis testing technique is proposed for the application of GLMs to cluster dependent data. The test is based on an adjustment to the 'independence' likelihood ratio test, which allows for the within cluster dependence. The performance of the new test, in comparison to established techniques, is explored. The application of the generalized estimating equations (GEE) methodology to model space-time data is also investigated. The approach allows for the temporal dependence via the covariates and models the spatial dependence using techniques from geostatistics. The application area of climatology has been used to motivate much of the work undertaken. A key attribute of climate data sets, in addition to exhibiting dependence both spatially and temporally, is that they are typically large in size, often running into millions of observations. Therefore, throughout the thesis, particular attention has focused on computational issues, to enable analysis to be undertaken in a feasible time frame. For example, we investigate the use of the GEE one-step estimator in situations where the application of the full algorithm is impractical. The final chapter of this thesis presents a climate case study. This involves wind speeds over northwestern Europe, which we analyse using the techniques developed.
25

Zhang, Ying. "Bayesian D-Optimal Design for Generalized Linear Models." Diss., Virginia Tech, 2006. http://hdl.handle.net/10919/30147.

Abstract:
Bayesian optimal designs have received increasing attention in recent years, especially in biomedical and clinical trials. Bayesian design procedures can utilize the available prior information on the unknown parameters so that a better design can be achieved. However, a difficulty in dealing with the Bayesian design is the lack of efficient computational methods. In this research, a hybrid computational method, which combines a rough global optimum search with a more precise local optimum search, is proposed to efficiently search for the Bayesian D-optimal designs for multi-variable generalized linear models. In particular, Poisson regression models and logistic regression models are investigated. Designs are examined for a range of prior distributions and the equivalence theorem is used to verify the design optimality. Design efficiency for various models is examined and compared with non-Bayesian designs. Bayesian D-optimal designs are found to be more efficient and robust than non-Bayesian D-optimal designs. Furthermore, the idea of the Bayesian sequential design is introduced and the Bayesian two-stage D-optimal design approach is developed for generalized linear models. With the incorporation of the first stage data information into the second stage, the two-stage design procedure can improve the design efficiency and produce more accurate and robust designs. The Bayesian two-stage D-optimal designs for Poisson and logistic regression models are evaluated based on simulation studies. The Bayesian two-stage optimal design approach is superior to the one-stage approach in terms of a design efficiency criterion.
Ph. D.
26

Winterer, Carrie Genevieve. "Predicting Twitter Time Series Using Generalized Linear Models." Case Western Reserve University School of Graduate Studies / OhioLINK, 2018. http://rave.ohiolink.edu/etdc/view?acc_num=case1528400283595732.

Full text
APA, Harvard, Vancouver, ISO, and other styles
27

Stephenson, William T. (William Thomas). "Approximate cross validation for sparse generalized linear models." Thesis, Massachusetts Institute of Technology, 2019. https://hdl.handle.net/1721.1/121742.

Full text
Abstract:
Thesis: S.M., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2019
Cataloged from PDF version of thesis.
Includes bibliographical references (pages 59-60).
Cross validation (CV) is an effective yet computationally expensive tool for assessing the out of sample error for many methods in machine learning and statistics. Previous work has shown that methods to approximate CV can be very accurate and computationally cheap, but only for low dimensional problems. In this thesis, a modification of existing methods is developed to extend the high accuracy of these techniques to high dimensional settings.
by William T. Stephenson.
APA, Harvard, Vancouver, ISO, and other styles
28

Greenaway, Mark Jonathan. "Numerically Stable Approximate Bayesian Methods for Generalized Linear Mixed Models and Linear Model Selection." Thesis, The University of Sydney, 2019. http://hdl.handle.net/2123/20233.

Full text
Abstract:
Approximate Bayesian inference methods offer fast alternatives to Markov chain Monte Carlo methods for fitting Bayesian models, sometimes with only a slight loss of accuracy. In this thesis, we consider variable selection for linear models and zero-inflated mixed models. Variable selection for linear regression models is ubiquitous in applied statistics. We use the popular g-prior (Zellner, 1986) for model selection of linear models with normal priors, where g is a prior hyperparameter. We derive exact expressions for the model selection Bayes factors in terms of special functions depending on the sample size, number of covariates and R-squared of the model. We show that these expressions are accurate, fast to evaluate, and numerically stable. An R package, blma, for Bayesian linear model averaging using these exact expressions has been released on GitHub. We extend the Particle EM method of Rockova (2017) using Particle Variational Approximation and the exact posterior marginal likelihood expressions to derive a computationally efficient algorithm for model selection on data sets with many covariates. Our algorithm performs well relative to existing algorithms, completing in 8 seconds on a model selection problem with a sample size of 600 and 7200 covariates. We also consider zero-inflated models, which have many applications in areas such as manufacturing and public health but pose numerical issues when fitted to data. We apply a variational approximation to zero-inflated Poisson mixed models with Gaussian distributed random effects using a combination of VB and the Gaussian Variational Approximation (GVA). We also incorporate a novel parameterisation of the covariance of the GVA using the Cholesky factor of the precision matrix, similar to Tan and Nott (2018), to resolve associated numerical difficulties.
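For orientation, when g is held fixed, the Bayes factor of a linear model with p covariates against the intercept-only model has a well-known closed form under Zellner's g-prior, depending only on the sample size n, p and R-squared. The sketch below evaluates it on the log scale to avoid overflow. This fixed-g case is only an illustration and is not the special-function expressions derived in the thesis; the function name and the example inputs are hypothetical.

```python
import math


def log_bayes_factor_fixed_g(n, p, r_squared, g):
    """Log Bayes factor of a p-covariate linear model against the null model.

    Uses the fixed-g closed form under Zellner's g-prior:
    BF = (1 + g)^((n - 1 - p) / 2) / (1 + g * (1 - R^2))^((n - 1) / 2).
    """
    if not 0.0 <= r_squared <= 1.0:
        raise ValueError("r_squared must lie in [0, 1]")
    return (0.5 * (n - 1 - p) * math.log1p(g)
            - 0.5 * (n - 1) * math.log1p(g * (1.0 - r_squared)))


# Hypothetical inputs: n = 600 observations with the choice g = n.
log_bf_weak = log_bayes_factor_fixed_g(n=600, p=3, r_squared=0.01, g=600)
log_bf_strong = log_bayes_factor_fixed_g(n=600, p=3, r_squared=0.40, g=600)
```

Working with logarithms keeps the computation stable even when the Bayes factor itself would overflow a floating-point number.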
APA, Harvard, Vancouver, ISO, and other styles
29

Hercz, Daniel. "Flexible modeling with generalized additive models and generalized linear mixed models: comprehensive simulation and case studies." Thesis, McGill University, 2013. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=114300.

Full text
Abstract:
This thesis compares GAMs and GLMMs in the context of modeling nonlinear curves. The study contains a comprehensive simulation and a few real-life data analyses. The simulation uses thousands of generated datasets to compare and contrast the two models (with linear models as a benchmark) in terms of fit, extent of nonlinearity, and shape of the resulting curve. The data analyses extend the results of the simulation to GLMM/GAM curves of lung function with measures of smoking as the independent variable. An additional, larger real-life data analysis with dichotomous outcomes rounds out the study and allows for more representative results.
APA, Harvard, Vancouver, ISO, and other styles
30

Feng, Zhenghui. "Estimation and selection in additive and generalized linear models." HKBU Institutional Repository, 2012. https://repository.hkbu.edu.hk/etd_ra/1435.

Full text
APA, Harvard, Vancouver, ISO, and other styles
31

Ma, Renjun. "An orthodox BLUP approach to generalized linear mixed models." Thesis, National Library of Canada = Bibliothèque nationale du Canada, 1999. http://www.collectionscanada.ca/obj/s4/f2/dsk1/tape10/PQDD_0024/NQ38934.pdf.

Full text
APA, Harvard, Vancouver, ISO, and other styles
32

Ahlgren, Marcus. "Claims Reserving using Gradient Boosting and Generalized Linear Models." Thesis, KTH, Matematisk statistik, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-229406.

Full text
Abstract:
One fundamental function of an insurance company is calculating the expected claims costs for which the insurer has to compensate its policyholders. This is the process of claims reserving, which is practised by actuaries using statistical methods. Over the last few decades, statistical learning methods have become increasingly popular due to their ability to find complex patterns in any type of data. However, they have not been widely adopted within the insurance sector, which has favoured more traditional actuarial methods. In this thesis we evaluate the capability of claims reserving with gradient boosting, a non-parametric statistical learning method that has proven successful within multiple other disciplines, which has made it very popular. The gradient boosting technique is compared with the generalized linear model (GLM), which is widely used for modelling claims. We compare the models using a claims data set provided by Länsförsäkringar AB; 80% of the data set is used to train the models and the remaining 20% to evaluate their predictive performance on data not yet seen by the models. The models were implemented in R. The results show that the GLM has a lower prediction error. Also, the gradient boosting method requires more fine-tuning (a number of hyperparameters must be adjusted manually) to handle claims data properly, while the GLM already possesses certain features that make it suitable for claims reserving without as many adjustments in the model implementation. The advantage of capturing complex dependencies in data is not fully utilized in this thesis since we only work with 6 predictor variables. It is more likely that gradient boosting can compete with the GLM when predicting more complicated claims.
APA, Harvard, Vancouver, ISO, and other styles
33

Tang, On-yee, and 鄧安怡. "Estimation for generalized linear mixed model via multiple imputations." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2005. http://hub.hku.hk/bib/B30687652.

Full text
APA, Harvard, Vancouver, ISO, and other styles
34

Yam, Ho-kwan, and 任浩君. "On a topic of generalized linear mixed models and stochastic volatility model." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2002. http://hub.hku.hk/bib/B29913342.

Full text
APA, Harvard, Vancouver, ISO, and other styles
35

Petry, Sebastian [Verfasser]. "Regularization Approaches for Generalized Linear Models and Single Index Models / Sebastian Petry." München : Verlag Dr. Hut, 2012. http://d-nb.info/1023435241/34.

Full text
APA, Harvard, Vancouver, ISO, and other styles
36

Wu, Ka-yui Karl, and 胡家銳. "On some extensions of generalized linear models with varying dispersion." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2012. http://hub.hku.hk/bib/B48199370.

Full text
Abstract:
When dealing with exponential family distributions, a constant dispersion is often assumed since it simplifies both model formulation and estimation. In contrast, heteroscedasticity is a common feature of almost every empirical data set. In this dissertation, the dispersion parameter is no longer treated as constant throughout the entire sample, but is defined as the expected deviance between the individual response $y_i$ and its expected value $\mu_i$, so that it can be expressed as a linear combination of some covariates and their coefficients. At the same time, the dispersion regression is an essential part of a double Generalized Linear Model in which mean and dispersion are modelled in two interlinked and pseudo-simultaneously estimated submodels. In other words, the deviance is a function of the response mean, which in turn depends on the dispersion. Because of this mutual dependency, the estimation algorithm alternates between the two submodels and iterates until updating one set of parameters no longer leads to significant changes in the other. If appropriate covariates are chosen, the model's goodness of fit should improve because the dispersion is estimated from external information instead of being held constant. The advantage of dispersion modelling is shown by applying it to three different types of data: a) zero-inflated data, b) non-linear time series data, and c) clinical trials data. All these data follow distributions of the exponential family, for which the application of the Generalized Linear Model is justified, but they require certain extensions of the modelling methodology. The enhanced goodness of fit obtained when the constant dispersion assumption is dropped is shown in each of these examples.
In fact, by formulating and carrying out score and Wald tests for varying dispersion, evidence of heterogeneous dispersion is found in the data sets considered. Furthermore, although model formulation, asymptotic properties and computational effort are more extensive when dealing with the double models, the benefits in terms of improved fitting results and more efficient parameter estimates appear to justify the additional effort, not only for the types of data introduced but also for empirical data analysis more generally.
Statistics and Actuarial Science
Doctoral
Doctor of Philosophy
APA, Harvard, Vancouver, ISO, and other styles
37

Ten, Eyck Patrick. "Problems in generalized linear model selection and predictive evaluation for binary outcomes." Diss., University of Iowa, 2015. https://ir.uiowa.edu/etd/6003.

Full text
Abstract:
This manuscript consists of three papers which formulate novel generalized linear model methodologies. In Chapter 1, we introduce a variant of the traditional concordance statistic that is associated with logistic regression. This adjusted c-statistic, as we call it, utilizes the differences in predicted probabilities as weights for each event/non-event observation pair. We highlight an extensive comparison of the adjusted and traditional c-statistics using simulations and apply these measures in a modeling application. In Chapter 2, we feature the development and investigation of three model selection criteria based on cross-validatory c-statistics: Model Misspecification Prediction Error, Fitting Sample Prediction Error, and Sum of Prediction Errors. We examine the properties of the corresponding selection criteria based on the cross-validatory analogues of the traditional and adjusted c-statistics via simulation and illustrate these criteria in a modeling application. In Chapter 3, we propose and investigate an alternate approach to pseudo-likelihood model selection in the generalized linear mixed model framework. After outlining the problem with the pseudo-likelihood model selection criteria found using the natural approach to generalized linear mixed modeling, we feature an alternate approach, implemented using a SAS macro, that obtains and applies the pseudo-data from the full model for fitting all candidate models. We justify the propriety of the resulting pseudo-likelihood selection criteria using simulations and implement this new method in a modeling application.
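The idea of weighting event/non-event pairs by the difference in their predicted probabilities can be sketched as follows. The abstract does not give the exact definition of the adjusted c-statistic, so the weighting used here (the absolute difference, normalized by the total weight) is an assumption chosen for illustration, and the function name is hypothetical.

```python
def c_statistics(probs, events):
    """Return (traditional, weighted) concordance for binary outcomes.

    Every event/non-event pair is scored 1 if the event has the higher
    predicted probability, 0.5 on a tie, and 0 otherwise. The weighted
    version multiplies each score by |p_event - p_nonevent|, which is an
    assumed weighting, not necessarily the one used in the thesis.
    """
    event_p = [p for p, y in zip(probs, events) if y == 1]
    nonevent_p = [p for p, y in zip(probs, events) if y == 0]
    if not event_p or not nonevent_p:
        raise ValueError("need at least one event and one non-event")

    plain_sum = weighted_sum = weight_total = 0.0
    for pe in event_p:
        for pn in nonevent_p:
            score = 1.0 if pe > pn else 0.5 if pe == pn else 0.0
            weight = abs(pe - pn)
            plain_sum += score
            weighted_sum += weight * score
            weight_total += weight

    traditional = plain_sum / (len(event_p) * len(nonevent_p))
    # If all predictions are equal there is no weight; fall back to 0.5.
    weighted = weighted_sum / weight_total if weight_total > 0 else 0.5
    return traditional, weighted


# Hypothetical predicted probabilities and outcomes (1 = event).
traditional_c, weighted_c = c_statistics([0.9, 0.45, 0.5, 0.1], [1, 1, 0, 0])
```

In this toy example the single discordant pair has nearly equal predicted probabilities, so it is penalized less under the weighted version than under the traditional one.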
APA, Harvard, Vancouver, ISO, and other styles
38

Tang, On-yee. "Estimation for generalized linear mixed model via multiple imputations." Click to view the E-thesis via HKUTO, 2005. http://sunzi.lib.hku.hk/hkuto/record/B30687652.

Full text
APA, Harvard, Vancouver, ISO, and other styles
39

Holanda, Amanda Amorim. "Modelos lineares parciais aditivos generalizados com suavização por meio de P-splines." Universidade de São Paulo, 2018. http://www.teses.usp.br/teses/disponiveis/45/45133/tde-31052018-113859/.

Full text
Abstract:
In this work we present generalized partial linear models, with one continuous explanatory variable treated nonparametrically, and generalized additive partial linear models, with at least two continuous explanatory variables treated in this way. P-splines are used to describe the relationship between the response and the continuous explanatory variables. The penalized likelihood functions, penalized score functions and penalized Fisher information matrices are derived to obtain the penalized maximum likelihood estimates by combining the backfitting (Gauss-Seidel) algorithm with the Fisher scoring iterative method for the two types of model. In addition, we present procedures for estimating the smoothing parameter as well as the effective degrees of freedom. Finally, for the purpose of illustration, the proposed models are fitted to real data sets.
APA, Harvard, Vancouver, ISO, and other styles
40

Dagalp, Rukiye Esener. "Estimators For Generalized Linear Measurement Error Models With Interaction Terms." NCSU, 2001. http://www.lib.ncsu.edu/theses/available/etd-20011019-142524.

Full text
Abstract:

The primary objectives of this research are to develop and study estimators for generalized linear measurement error models when the mean function contains error-free predictors as well as predictors measured with error and interactions between error-free and error-prone predictors. Attention is restricted to generalized linear models in canonical form with independent additive Gaussian measurement error in the error-prone predictors. Estimators appropriate for the functional (Fuller, 1987, Ch. 1) version of the measurement error model are derived and studied. The estimators are also appropriate in the structural version of the model and thus the methods developed in this research are functional in the sense of Carroll, Ruppert and Stefanski (1995, Ch. 6). The primary approach to the development of estimators in this research is the conditional-score method proposed by Stefanski and Carroll (1987) and described by Carroll et al. (1995, Ch. 6). Sufficient statistics for the unobserved predictors are obtained and the conditional distribution of the observed data given these sufficient statistics is derived. The latter admits unbiased score functions that are free of the nuisance parameters (the unobserved predictors) and are used to construct unbiased estimating equations for model parameters. Estimators for the parameters of the model of interest are also derived using the corrected approach proposed by Nakamura (1990) and Stefanski (1989). These are also functional estimators in the sense of Carroll et al. (1995, Ch. 6) that are less dependent on the exponential-family model assumptions and thus provide a benchmark against which to compare the conditional-score estimators. Large-sample distribution approximations for both the conditional-score and corrected-score estimators are derived and the performance of the estimators and the adequacy of the large-sample distribution theory are studied via Monte Carlo simulation.

APA, Harvard, Vancouver, ISO, and other styles
41

Merl, Daniel M. "Detecting patterns of natural selection using bayesian generalized linear models /." Diss., Digital Dissertations Database. Restricted to UC campuses, 2006. http://uclibs.org/PID/11984.

Full text
APA, Harvard, Vancouver, ISO, and other styles
42

Pan, Yiyang. "A robust fit for generalized partial linear partial additive models." Thesis, University of British Columbia, 2013. http://hdl.handle.net/2429/44647.

Full text
Abstract:
In regression studies, semi-parametric models provide both flexibility and interpretability. In this thesis, we focus on a robust model fitting algorithm for a family of semi-parametric models, the Generalized Partial Linear Partial Additive Models (GAPLMs), which are a hybrid of the widely used Generalized Linear Models (GLMs) and Generalized Additive Models (GAMs). The traditional model fitting algorithms are mainly based on likelihood procedures. However, the resulting fits can be severely distorted by the presence of a small portion of atypical observations (also known as "outliers"), which deviate from the assumed model. Furthermore, the traditional model diagnostic methods might also fail to detect outliers. In order to solve these problems systematically, we develop a robust model fitting algorithm which is resistant to the effect of outliers. Our method combines the backfitting algorithm and the generalized Speckman estimator to fit the "partial linear partial additive" styled models. Instead of using the likelihood-based weights and adjusted response from the generalized local scoring algorithm (GLSA), we apply the robust weights and adjusted response derived from the robust quasi-likelihood proposed by Cantoni and Ronchetti (2001). We also extend previous methods by proposing a model prediction algorithm for GAPLMs. To compare our robust method with the non-robust one given by the R function gam::gam(), which uses the backfitting algorithm and the GLSA, we report the results of a simulation study. The simulation results show that our robust fit can effectively resist the damage caused by outliers and that it performs similarly to the non-robust fit on clean datasets. Moreover, our robust algorithm is observed to be helpful in identifying outliers, by comparing the fitted values with the observed response variable. Finally, we apply our method to analyze the global phytoplankton data.
We interpret the outliers reported by our robust fit with an exploratory analysis and we see some interesting patterns of those outliers in the dataset. We believe our results can provide more information for related research.
APA, Harvard, Vancouver, ISO, and other styles
43

鄧沛權 and Pui-kuen Tang. "Bayesian analysis of errors-in-variables in generalized linear models." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 1992. http://hub.hku.hk/bib/B31232802.

Full text
APA, Harvard, Vancouver, ISO, and other styles
44

Tang, Pui-kuen. "Bayesian analysis of errors-in-variables in generalized linear models /." [Hong Kong : University of Hong Kong], 1992. http://sunzi.lib.hku.hk/hkuto/record.jsp?B1325330X.

Full text
APA, Harvard, Vancouver, ISO, and other styles
45

Yan, Huey. "Generalized Minimum Penalized Hellinger Distance Estimation and Generalized Penalized Hellinger Deviance Testing for Generalized Linear Models: The Discrete Case." DigitalCommons@USU, 2001. https://digitalcommons.usu.edu/etd/7066.

Full text
Abstract:
In this dissertation, robust and efficient alternatives to quasi-likelihood estimation and likelihood ratio tests are developed for discrete generalized linear models. The estimation method considered is a penalized minimum Hellinger distance procedure that generalizes a procedure developed by Harris and Basu for estimating parameters of a single discrete probability distribution from a random sample. A bootstrap algorithm is proposed to select the weight of the penalty term. Simulations are carried out to compare the new estimators with quasi-likelihood estimation. The robustness of the estimation procedure is demonstrated by simulation work and by Hampel's α-influence curve. Penalized minimum Hellinger deviance tests for goodness-of-fit and for testing nested linear hypotheses are proposed and simulated. A nonparametric bootstrap algorithm is proposed to obtain critical values for the testing procedure.
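To make the single-distribution case concrete, the sketch below computes a Hellinger-type distance between the empirical distribution of a count sample and a Poisson model, and finds the minimizing mean by grid search. The form of the penalty used here (a separate weight on cells with no observations) is an assumption about how such a penalty can be written, not a statement of the procedure in the dissertation; the function names and the data are hypothetical, and the regression setting is not covered.

```python
import math
from collections import Counter


def poisson_pmf(x, mean):
    """Poisson probability mass, computed on the log scale."""
    return math.exp(-mean + x * math.log(mean) - math.lgamma(x + 1))


def penalized_hellinger(sample, mean, empty_cell_weight=1.0):
    """Distance between the empirical pmf of `sample` and Poisson(mean).

    Observed cells contribute (sqrt(d_x) - sqrt(f_x))^2; cells with no
    observations contribute empty_cell_weight * f_x. A weight of 1 gives
    the ordinary sum of squared differences of square roots.
    """
    n = len(sample)
    counts = Counter(sample)
    observed_part = 0.0
    observed_mass = 0.0
    for x, count in counts.items():
        f_x = poisson_pmf(x, mean)
        observed_part += (math.sqrt(count / n) - math.sqrt(f_x)) ** 2
        observed_mass += f_x
    # Model mass on unobserved cells is 1 minus the mass on observed cells.
    return observed_part + empty_cell_weight * (1.0 - observed_mass)


def minimum_distance_mean(sample, grid, empty_cell_weight=1.0):
    """Grid-search the Poisson mean that minimizes the distance."""
    return min(grid, key=lambda m: penalized_hellinger(sample, m, empty_cell_weight))


# Hypothetical counts with one gross outlier (40).
data = [2, 3, 1, 4, 2, 3, 2, 5, 3, 2, 1, 3, 40]
grid = [0.5 + 0.05 * k for k in range(200)]  # means from 0.5 to 10.45
robust_mean = minimum_distance_mean(data, grid)
sample_mean = sum(data) / len(data)
```

The outlier pulls the sample mean upward, while the minimum-distance estimate stays close to the bulk of the data, which is the robustness property the abstract refers to.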
APA, Harvard, Vancouver, ISO, and other styles
46

Sima, Adam. "Accounting for Model Uncertainty in Linear Mixed-Effects Models." VCU Scholars Compass, 2013. http://scholarscompass.vcu.edu/etd/2950.

Full text
Abstract:
Standard statistical decision-making tools, such as inference, confidence intervals and forecasting, are contingent on the assumption that the statistical model used in the analysis is the true model. In linear mixed-effect models, ignoring model uncertainty results in an underestimation of the residual variance, contributing to hypothesis tests that demonstrate larger than nominal Type-I errors and confidence intervals with smaller than nominal coverage probabilities. A novel utilization of the generalized degrees of freedom developed by Zhang et al. (2012) is used to adjust the estimate of the residual variance for model uncertainty. Additionally, the general global linear approximation is extended to linear mixed-effect models to adjust the standard errors of the parameter estimates for model uncertainty. Both of these methods use a perturbation method for estimation, where random noise is added to the response variable and, conditional on the observed responses, the corresponding estimate is calculated. A simulation study demonstrates that when the proposed methodologies are utilized, both the variance and standard errors are inflated for model uncertainty. However, when a data-driven strategy is employed, the proposed methodologies show limited usefulness. These methods are evaluated with a trial assessing the performance of cervical traction in the treatment of cervical radiculopathy.
APA, Harvard, Vancouver, ISO, and other styles
47

Maekawa, Eduardo Shigueiti. "Estimativa do custo da colheita mecanizada de cana-de-açúcar utilizando modelos de regressão." Universidade de São Paulo, 2016. http://www.teses.usp.br/teses/disponiveis/11/11152/tde-30092016-101059/.

Full text
Abstract:
The mechanized harvesting of sugarcane is one of the most significant and costly operations of the production process, so it is important to understand the relationships involving its cost. Currently, methods to estimate these costs start from the concept of fixed and variable costs. However, considering the complexity of the harvesting process, it is necessary to evaluate techniques that relate the operating parameters to the final cost. In this context, statistical modeling by regression makes it possible to treat such relationships and predict trends. The objective of this study was to develop an empirical model to calculate the cost of mechanical harvesting of sugarcane. A generalized linear model (GLM) and a generalized linear mixed model (GLMM), both with gamma distribution, were developed using operational indicators and cost data from 20 mills in the sugar and ethanol sector. The GLMM achieved a satisfactory fit when compared to the GLM, the null model (mean) and the linear model (assuming normality). The indicators that explained the cost were: productivity (t mach⁻¹), consumption (L t⁻¹), hourmeter (h) and number of operators per harvester (nop).
APA, Harvard, Vancouver, ISO, and other styles
48

Celik, Gul. "Parameter Estimation In Generalized Partial Linear Models With Conic Quadratic Programming." Master's thesis, METU, 2010. http://etd.lib.metu.edu.tr/upload/12612531/index.pdf.

Full text
Abstract:
In statistics, regression analysis is a technique used to understand and model the relationship between a dependent variable and one or more independent variables. Multivariate Adaptive Regression Splines (MARS) is a form of regression analysis. It is a non-parametric regression technique and can be seen as an extension of linear models that automatically models non-linearities and interactions. MARS is very important in both classification and regression, with an increasing number of applications in many areas of science, economy and technology. In our study, we analyze Generalized Partial Linear Models (GPLMs), which are particular semiparametric models. GPLMs separate the input variables into two parts and additively combine a classical linear model with a nonlinear model part. In order to smooth this nonparametric part, we use Conic Multivariate Adaptive Regression Splines (CMARS), which is a modified form of MARS. MARS is very beneficial for high-dimensional problems and does not require any particular class of relationship between the regressor variables and the outcome variable of interest. This technique offers a great advantage for fitting nonlinear multivariate functions. Also, the contribution of the basis functions can be estimated by MARS, so that both the additive and interaction effects of the regressors are allowed to determine the dependent variable. There are two steps in the MARS algorithm: the forward and backward stepwise algorithms. In the first step, the model is constructed by adding basis functions until a maximum level of complexity is reached. Conversely, in the second step, the backward stepwise algorithm reduces the complexity by removing the least significant basis functions from the model. In this thesis, we suggest not using the backward stepwise algorithm; instead, we employ a Penalized Residual Sum of Squares (PRSS). We construct PRSS for MARS as a Tikhonov regularization problem.
We treat this problem using continuous optimization techniques, which we consider an important complementary technology and an alternative to the backward stepwise algorithm. In particular, we apply the elegant framework of Conic Quadratic Programming (CQP), an area of convex optimization that is very well structured, thereby resembling linear programming and therefore permitting the use of interior point methods. At the end of this study, we compare CQP with Tikhonov regularization on two different data sets, one with and one without interaction effects. Moreover, using two other data sets, we compare CMARS with two other classification methods, Infinite Kernel Learning (IKL) and Tikhonov regularization, whose results are obtained from a thesis that is in progress.
APA, Harvard, Vancouver, ISO, and other styles
49

Karlsson, Sofia. "Purchase behaviour analysis in the retail industry using Generalized Linear Models." Thesis, KTH, Matematisk statistik, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-234684.

Full text
Abstract:
This master thesis uses applied mathematical statistics to analyse purchase behaviour based on customer data of the Swedish brand Indiska. The aim of the study is to build a model that can help predict the sales quantities of different product classes and identify which factors are the most significant in the different models and, furthermore, to create an algorithm that can suggest product combinations in the purchasing process. Generalized linear models with a negative binomial distribution are applied to obtain the predicted sales quantity. Moreover, conditional probability is used in the algorithm, which results in a product recommendation engine based on the calculated conditional probability that the suggested combinations are purchased. From the findings, it can be concluded that all variables considered in the models (original price, purchase month, colour, cluster, purchase country and channel) are significant for the predicted sales quantity for each product class. Furthermore, by using conditional probability and historical sales data, an algorithm can be constructed which recommends combinations of either one or two products that can be bought together with an initial product that a customer shows interest in.
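The conditional-probability step of such a recommendation engine can be sketched in a few lines: for an initial product, estimate the probability that each other product appears in the same purchase and rank the candidates. This is a minimal sketch restricted to single-product suggestions; the function name, the basket data and the product names are invented for the example and do not come from the thesis.

```python
from collections import Counter
from itertools import permutations


def recommend(baskets, item, top_n=2):
    """Rank companion products by estimated P(companion | item).

    The estimate is (# baskets with both) / (# baskets with item).
    """
    item_count = Counter()
    pair_count = Counter()
    for basket in baskets:
        products = set(basket)
        item_count.update(products)
        pair_count.update(permutations(products, 2))

    if item_count[item] == 0:
        return []
    scores = {
        other: pair_count[(item, other)] / item_count[item]
        for (first, other) in pair_count
        if first == item
    }
    # Sort by descending probability, breaking ties alphabetically.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_n]


# Hypothetical purchase histories (product names are made up).
baskets = [
    ["cushion", "throw", "candle"],
    ["cushion", "throw"],
    ["cushion", "candle"],
    ["throw", "vase"],
    ["cushion", "throw", "vase"],
]
suggestions = recommend(baskets, "cushion")
```

Suggesting pairs of companion products, as described in the abstract, would follow the same pattern with counts over product triples instead of pairs.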
APA, Harvard, Vancouver, ISO, and other styles
50

Jiang, Jinzhu. "Feature Screening for High-Dimensional Variable Selection In Generalized Linear Models." Bowling Green State University / OhioLINK, 2021. http://rave.ohiolink.edu/etdc/view?acc_num=bgsu1626826068909307.

Full text
APA, Harvard, Vancouver, ISO, and other styles