Dissertations: „Ridge Regression“

1

Williams, Ulyana P. „On Some Ridge Regression Estimators for Logistic Regression Models“. FIU Digital Commons, 2018. https://digitalcommons.fiu.edu/etd/3667.

Annotation:

The purpose of this research is to investigate the performance of some ridge regression estimators for the logistic regression model in the presence of moderate to high correlation among the explanatory variables. As performance criteria, we use the mean square error (MSE), the mean absolute percentage error (MAPE), the magnitude of bias, and the percentage of times the ridge regression estimator produces a higher MSE than the maximum likelihood estimator. A Monte Carlo simulation study was carried out to compare the performance of the ridge regression estimators under different experimental conditions. The degree of correlation, sample size, number of independent variables, and log odds ratio were varied in the experimental design. Simulation results show that under certain conditions, the ridge regression estimators outperform the maximum likelihood estimator. Moreover, an empirical data analysis supports the main findings of this study. This thesis proposes and recommends some good ridge regression estimators of the logistic regression model for practitioners in the health, physical and social sciences.

2

Mahmood, Nozad. „Sparse Ridge Fusion For Linear Regression“. Master's thesis, University of Central Florida, 2013. http://digital.library.ucf.edu/cdm/ref/collection/ETD/id/5986.

Annotation:

For a linear regression, the traditional technique deals with the case where the number of observations n is greater than the number of predictor variables p (n > p). In the case n […]
M.S.
Masters
Statistics
Sciences
Statistical Computing

3

Younker, James. „Ridge Estimation and its Modifications for Linear Regression with Deterministic or Stochastic Predictors“. Thesis, Université d'Ottawa / University of Ottawa, 2012. http://hdl.handle.net/10393/22662.

Annotation:

A common problem in multiple regression analysis is having to engage in a bias-variance trade-off in order to maximize the performance of a model. Over the years, a number of methods have been developed to deal with this problem, each with its own strengths and weaknesses. Of these approaches, the ridge estimator is one of the most commonly used. This paper examines the properties of the ridge estimator and several alternatives in both deterministic and stochastic environments. We find the ridge estimator to be effective when the sample size is small relative to the number of predictors. However, we also identify a few cases where some of the alternative estimators can outperform the ridge estimator. Additionally, we provide examples of applications where these cases may be relevant.

4

Kuhl, Mark R. „Ridge regression signal processing applied to multisensor position fixing“. Ohio : Ohio University, 1990. http://www.ohiolink.edu/etd/view.cgi?ohiou1183651058.

5

Zaldivar, Cynthia. „On the Performance of some Poisson Ridge Regression Estimators“. FIU Digital Commons, 2018. https://digitalcommons.fiu.edu/etd/3669.

Annotation:

Multiple regression models play an important role in analyzing and making predictions about data. Prediction accuracy becomes lower when two or more explanatory variables in the model are highly correlated. One solution is to use ridge regression. The purpose of this thesis is to study the performance of available ridge regression estimators for Poisson regression models in the presence of moderately to highly correlated variables. As performance criteria, we use mean square error (MSE), mean absolute percentage error (MAPE), and percentage of times the maximum likelihood (ML) estimator produces a higher MSE than the ridge regression estimator. A Monte Carlo simulation study was conducted to compare performance of the estimators under three experimental conditions: correlation, sample size, and intercept. It is evident from simulation results that all ridge estimators performed better than the ML estimator. We proposed new estimators based on the results, which performed very well compared to the original estimators. Finally, the estimators are illustrated using data on recreational habits.

6

Wissel, Julia. „A new biased estimator for multivariate regression models with highly collinear variables“. Doctoral thesis, kostenfrei, 2009. http://www.opus-bayern.de/uni-wuerzburg/volltexte/2009/3638/.

7

Bakshi, Girish. „Comparison of ridge regression and neural networks in modeling multicollinear data“. Ohio : Ohio University, 1996. http://www.ohiolink.edu/etd/view.cgi?ohiou1178815205.

8

Li, Ying. „A Comparison Study of Principle Component Regression, Partial Least Square Regression and Ridge Regression with Application to FTIR Data“. Thesis, Uppsala University, Department of Statistics, 2010. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-127983.

Annotation:

The least squares estimator may fail when the number of explanatory variables is relatively large in comparison to the sample size or if the variables are almost collinear. In such situations, principal component regression (PCR), partial least squares regression (PLS) and ridge regression (RR) are often proposed and are widely used in practical data analysis, especially in chemometrics. They provide biased coefficient estimators with smaller variation than the least squares estimator. In this paper, a brief literature review of PCR, PLS and RR is made from a theoretical perspective. Moreover, a data set is used to examine their predictive performance. The conclusion is that for prediction PCR, PLS and RR provide similar results. Any claim as to the superiority of one of the three biased regression methods requires substantial verification.

9

Silva, Tatiane Cazarin da. „Algoritmos primais-duais de ponto fixo aplicados ao problema Ridge Regression“. reponame:Repositório Institucional da UFPR, 2016. http://hdl.handle.net/1884/43736.

Annotation:

Advisor: Prof. Dr. Ademir Alves Ribeiro
Co-advisor: Prof. Dr. Gislaine Aparecida Periçaro
Thesis (doctorate) - Universidade Federal do Paraná, Setor de Tecnologia, Programa de Pós-Graduação em Métodos Numéricos em Engenharia. Defense: Curitiba, 08/06/2016
Includes references: leaves 60-64
Area of concentration: Progressão matemática
Abstract: In this work we propose algorithms for solving a general primal-dual fixed-point formulation applied to the ridge regression problem. We study the primal formulation for regularized least squares problems, in particular with the L2 norm, known as ridge regression, and then describe convex duality for this class of problems. Our strategy was to consider the primal and dual formulations jointly and to minimize the duality gap between them. We established the primal-dual fixed-point algorithm, named SRP, and a reformulation of this method, the main contribution of the thesis, which proved more efficient and robust and is called the acc-SRP method, or accelerated version of the SRP method. The theoretical study of the algorithms was carried out through the analysis of the spectral properties of the associated iteration matrices. We proved the linear convergence of the algorithms and presented some numerical examples comparing two variants of each proposed algorithm. We also showed that our best method, acc-SRP, has excellent numerical performance in solving very ill-conditioned problems when compared to the conjugate gradient method, which makes it computationally more attractive. Keywords: primal-dual methods, ridge regression, fixed point, duality, accelerated methods.

10

Saha, Angshuman. „Application of ridge regression for improved estimation of parameters in compartmental models /“. Thesis, Connect to this title online; UW restricted, 1998. http://hdl.handle.net/1773/8945.

11

Petersson, David, and Emil Backman. „Change Point Detection and Kernel Ridge Regression for Trend Analysis on Financial Data“. Thesis, KTH, Skolan för teknikvetenskap (SCI), 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-230729.

Annotation:

The investing market can be a cold, ruthless place for the layman. To have a chance of making money in this business, one must spend countless hours on research and handle many different parameters. To reduce risk, one must spread investments across many companies operating in multiple fields and industries. In other words, this is a hard feat to manage. With modern technology, there is now great potential to handle this tedious analysis autonomously using machine learning and clever algorithms. With this approach, the number of analyses is limited only by the capacity of the computer, resulting in a number far greater than if done by hand. This study explores the possibilities of modifying and implementing efficient algorithms in the field of finance. It uses kernel methods to analyze the patterns found in financial data efficiently. By combining change point detection and nonlinear regression, the computer can classify the different trends and moods in the market. The study culminates in a tool for analyzing stock market data in a way that minimizes the influence of short spikes and drops and is instead driven by the underlying pattern, together with an additional tool for predicting future price movements.

12

CROPPER, JOHN PHILIP. „TREE-RING RESPONSE FUNCTIONS. AN EVALUATION BY MEANS OF SIMULATIONS (DENDROCHRONOLOGY RIDGE REGRESSION, MULTICOLLINEARITY)“. Diss., The University of Arizona, 1985. http://hdl.handle.net/10150/187946.

Annotation:

The problem of determining the response of tree-ring width growth to monthly climate is examined in this study. The objective is to document which of the available regression methods are best suited to deciphering the complex link between tree growth variation and climate. Tree-ring response function analysis is used to determine which instrumental climatic variables are best associated with tree-ring width variability. Ideally such a determination would be accomplished, or verified, through detailed physiological monitoring of trees in their natural environment. A statistical approach is required because such biological studies on mature trees are currently too time-consuming to perform. The use of lagged climatic data to duplicate a biological, rather than a calendar, year has resulted in an increase in the degree of intercorrelation (multicollinearity) of the independent climate variables. The presence of multicollinearity can greatly affect the sign and magnitude of estimated regression coefficients. Using series of known response, the effectiveness of five different regression methods was objectively assessed in this study. The results from each of the 2000 regressions were compared to the known regression weights and a measure of relative efficiency was computed. The results indicate that ridge regression analysis is, on average, more than four times as efficient (average relative efficiency of 4.57) as unbiased multiple linear regression at producing good coefficient estimates. The results from principal components regression are slight improvements over those from multiple linear regression, with an average relative efficiency of 1.45.

13

Gatz, Philip L. Jr. „A comparison of three prediction based methods of choosing the ridge regression parameter k“. Thesis, Virginia Tech, 1985. http://hdl.handle.net/10919/45724.

Annotation:

A solution to the regression model y = Xβ + ε is usually obtained using ordinary least squares. However, when multicollinearity exists among the regressor variables, many qualities of this solution deteriorate. These qualities include the variances, the length, the stability, and the prediction capabilities of the solution. An analysis called ridge regression introduced a solution to combat this deterioration (Hoerl and Kennard, 1970a). The method uses a solution biased by a parameter k. Many methods have been developed to determine an optimal value of k. This study chose to investigate three little-used methods of determining k: the PRESS statistic, Mallows' C_k statistic, and DF-trace. The study compared the prediction capabilities of the three methods using data that contained various levels of both collinearity and leverage. This was done by means of a Monte Carlo experiment.
Master of Science
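
As an illustration of the idea in this annotation, the following is a minimal sketch (not code from the thesis) of how a ridge parameter k could be chosen by minimizing the PRESS statistic. It assumes a model without intercept and with predictors that are not re-standardized when an observation is left out, so that the leave-one-out residual equals e_i / (1 - h_ii) with H = X (X'X + kI)^{-1} X'. The function names and the grid of candidate values are hypothetical.

```python
import numpy as np


def ridge_coefficients(X, y, k):
    """Ridge solution (X'X + kI)^{-1} X'y for a ridge parameter k >= 0."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ y)


def ridge_press(X, y, k):
    """PRESS for ridge regression via the leave-one-out shortcut.

    Uses the ridge hat matrix H = X (X'X + kI)^{-1} X' and sums the
    squared leave-one-out residuals e_i / (1 - h_ii).
    """
    p = X.shape[1]
    H = X @ np.linalg.solve(X.T @ X + k * np.eye(p), X.T)
    residuals = y - H @ y
    return float(np.sum((residuals / (1.0 - np.diag(H))) ** 2))


def choose_k_by_press(X, y, candidates):
    """Return the candidate k with the smallest PRESS (hypothetical helper)."""
    scores = [ridge_press(X, y, k) for k in candidates]
    return candidates[int(np.argmin(scores))]
```

A call such as choose_k_by_press(X, y, [0.0, 0.01, 0.1, 1.0, 10.0]) returns the grid value with the lowest PRESS; the result depends on the grid, so a finer grid or a one-dimensional optimizer may be preferable in practice.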

14

Björkström, Anders. „Regression methods in multidimensional prediction and estimation“. Doctoral thesis, Stockholm University, Department of Mathematics, 2007. http://urn.kb.se/resolve?urn=urn:nbn:se:su:diva-7025.

Annotation:

In regression with near collinear explanatory variables, the least squares predictor has large variance. Ordinary least squares regression (OLSR) often leads to unrealistic regression coefficients. Several regularized regression methods have been proposed as alternatives. Well-known are principal components regression (PCR), ridge regression (RR) and continuum regression (CR). The latter two involve a continuous metaparameter, offering additional flexibility.

For a univariate response variable, CR incorporates OLSR, PLSR, and PCR as special cases, for special values of the metaparameter. CR is also closely related to RR. However, CR can in fact yield regressors that vary discontinuously with the metaparameter. Thus, the relation between CR and RR is not always one-to-one. We develop a new class of regression methods, LSRR, essentially the same as CR but without discontinuities, and prove that any optimization principle will yield a regressor proportional to an RR, provided only that the principle implies maximizing some function of the regressor's sample correlation coefficient and its sample variance. For a multivariate response vector we demonstrate that a number of well-established regression methods are related, in that they are special cases of basically one general procedure. We try a more general method based on this procedure, with two metaparameters. In a simulation study we compare this method to ridge regression, multivariate PLSR and repeated univariate PLSR. For most types of data studied, all methods do approximately equally well. There are cases where RR and LSRR yield larger errors than the other methods, and we conclude that one-factor methods are not adequate for situations where more than one latent variable is needed to describe the data. Among those based on latent variables, none of the methods tried is superior to the others in any obvious way.

15

Gripencrantz, Sarah. „Evaluating the Use of Ridge Regression and Principal Components in Propensity Score Estimators under Multicollinearity“. Thesis, Uppsala universitet, Statistiska institutionen, 2014. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-226924.

Annotation:

Multicollinearity can be present in the propensity score model when estimating average treatment effects (ATEs). In this thesis, logistic ridge regression (LRR) and principal components logistic regression (PCLR) are evaluated as an alternative to ML estimation of the propensity score model. ATE estimators based on weighting (IPW), matching and stratification are assessed in a Monte Carlo simulation study to evaluate LRR and PCLR. Further, an empirical example of using LRR and PCLR on real data under multicollinearity is provided. Results from the simulation study reveal that under multicollinearity and in small samples, the use of LRR reduces bias in the matching estimator, compared to ML. In large samples PCLR yields lowest bias, and typically was found to have the lowest MSE in all estimators. PCLR matched ML in bias under IPW estimation and in some cases had lower bias. The stratification estimator was heavily biased compared to matching and IPW but both bias and MSE improved as PCLR was applied, and for some cases under LRR. The specification with PCLR in the empirical example was usually most sensitive as a strongly correlated covariate was included in the propensity score model.

16

Herrault, Pierre-Alexis. „Extraction de fragments forestiers et caractérisation de leurs évolutions spatio-temporelles pour évaluer l'effet de l'histoire sur la biodiversité : une approche multi-sources“. Thesis, Toulouse 2, 2015. http://www.theses.fr/2015TOU20018/document.

Annotation:

Biodiversity in landscapes depends on landscape spatial patterns but can also be influenced by landscape history. Indeed, some species do not respond immediately to a habitat disturbance but show a longer or shorter response time. Accounting for the evolution of species' habitats has therefore become an important issue in ecology in recent years, as a way to better explain current biodiversity. This thesis in geomatics (GIS) is set in this applied context of historical ecology. It deals with the automatic extraction of forest patches and the characterization of their spatiotemporal evolution since the mid-19th century, in order to model the effect of their historical trajectory on the current diversity of forest hoverflies (Diptera: Syrphidae). The study site is an agri-forestry landscape of the Coteaux de Gascogne. The proposed general approach consists of three main steps: (1) building the spatial database of forest patches from several heterogeneous data sources, (2) matching forest patches across dates and characterizing their spatiotemporal evolution, (3) statistical species-habitat modeling that integrates history as one of the factors likely to explain the observed hoverfly diversity. Several methodological contributions were made. We proposed a new geometric correction method based on kernel ridge regression to make the historical and current spatial data consistent. We also developed an approach and a tool for the automatic vectorization of forests in the original manuscript sheets (dessins-minutes) of the 19th-century carte d'État-Major. Finally, a first assessment of the impact of spatial uncertainty on the response of species-habitat models was initiated.
From an ecological viewpoint, the results revealed a significant effect of the temporal continuity of habitats on forest hoverfly diversity. The most isolated forests presented an extinction debt or a colonization credit, depending on the type of change that occurred during the last period studied (1979-2010). As it turns out, 30 years was not sufficient for forest hoverflies to reach a new equilibrium after spatial changes in their isolated habitat.

17

Pascual, Francisco L. „Essays on the optimal selection of series functions“. Connect to a 24 p. preview or request complete full text in PDF format. Access restricted to UC campuses, 2007. http://wwwlib.umi.com/cr/ucsd/fullcit?p3274811.

Annotation:

Thesis (Ph. D.)--University of California, San Diego, 2007.
Title from first page of PDF file (viewed October 4, 2007). Available via ProQuest Digital Dissertations. Vita. Includes bibliographical references.

18

Trenkler, Dietrich. „Verallgemeinerte Ridge Regression : eine Untersuchung von theoretischen Eigenschaften und der Operationalität verzerrter Schätzer im linearen Modell /“. Frankfurt a. M : Hain, 1986. http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&doc_number=015371082&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA.

19

Semprevivo, Riccardo. „Realization and Performance Characterization of a Myoelectric Control System for Robotic Hands Based on Kernel Ridge Regression“. Master's thesis, Alma Mater Studiorum - Università di Bologna, 2019.

Annotation:

In the field of human-robot interaction, research is still far from finding a solution for the stable control of hand prostheses. One of the most promising methodologies is the use of electromyographic (EMG) signals of the muscles as an interface between the human and the artificial limb. EMG is already used to control robotic systems with a small number of degrees of freedom (d.o.f.), but several problems persist for more complex controls able to regulate six or more degrees of freedom of the hand. The nonstationarity of the EMG and its nonlinear relation to the hand configuration are the main problems to be managed, because EMG signals change over time under the influence of various factors. One state-of-the-art approach to this problem is the use of incremental Ridge Regression with Random Fourier Features (iRRRFF) for the myocontrol algorithm. iRRRFF is a machine learning algorithm for nonlinear mapping that can also update the model with new data, which enables continuous adaptation to changes in the signals. In this work we implement this control in a Matlab/Simulink environment and create a standard procedure for its use. We use an acquisition scheme that acquires 8 EMG signals from the muscles around the forearm to test the performance of this type of myocontrol and, in particular, we focus on the influence of different training protocols. This was tested with two types of subjects: one expert user who had already used a similar EMG system in the past and one naïve user.
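
The general idea of incremental ridge regression on random Fourier features can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the Matlab/Simulink implementation from the thesis: it assumes an RBF kernel exp(-gamma * ||x - x'||^2) approximated by cosine features, and it updates the model by accumulating the matrices Z'Z + lam*I and Z'Y. The class name and the default values of n_features, gamma and lam are hypothetical.

```python
import numpy as np


class IncrementalRFFRidge:
    """Ridge regression on random Fourier features with additive updates."""

    def __init__(self, n_inputs, n_features=200, gamma=1.0, lam=1.0, seed=0):
        rng = np.random.default_rng(seed)
        # Random projection approximating an RBF kernel (assumed kernel choice).
        self.W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(n_inputs, n_features))
        self.b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
        self.n_features = n_features
        self.A = lam * np.eye(n_features)  # running Z'Z + lam * I
        self.B = None                      # running Z'Y, created on first update

    def _features(self, X):
        return np.sqrt(2.0 / self.n_features) * np.cos(X @ self.W + self.b)

    def partial_fit(self, X, Y):
        """Fold a new batch (X: n x n_inputs, Y: n or n x n_outputs) into the model."""
        Z = self._features(X)
        Y = np.asarray(Y, dtype=float).reshape(len(Z), -1)
        if self.B is None:
            self.B = np.zeros((self.n_features, Y.shape[1]))
        self.A += Z.T @ Z
        self.B += Z.T @ Y
        return self

    def predict(self, X):
        """Predict with the ridge weights implied by all batches seen so far."""
        weights = np.linalg.solve(self.A, self.B)
        return self._features(X) @ weights
```

Because only the accumulated matrices are stored, calling partial_fit on successive batches gives the same weights as fitting on all data at once, which is what allows the model to be adapted as new signal samples arrive. With 8 EMG channels as inputs, n_inputs would be 8; predict must be called only after at least one partial_fit.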

20

Shah, Smit. „Comparison of Some Improved Estimators for Linear Regression Model under Different Conditions“. FIU Digital Commons, 2015. http://digitalcommons.fiu.edu/etd/1853.

Annotation:

The multiple linear regression model plays a key role in statistical inference and has extensive applications in business, environmental, physical and social sciences. Multicollinearity has been a considerable problem in multiple regression analysis. When the regressor variables are multicollinear, it becomes difficult to make precise statistical inferences about the regression coefficients. The statistical methods discussed in this thesis to address this are the ridge regression, Liu, two-parameter biased and LASSO estimators. Firstly, an analytical comparison on the basis of risk was made among the ridge, Liu and LASSO estimators under the orthonormal regression model. I found that LASSO dominates the least squares, ridge and Liu estimators over a significant portion of the parameter space for large dimension. Secondly, a simulation study was conducted to compare the performance of the ridge, Liu and two-parameter biased estimators by the mean squared error criterion. I found that the two-parameter biased estimator performs better than its corresponding ridge regression estimator. Overall, the Liu estimator performs better than both the ridge and two-parameter biased estimators.

21

Zhai, Jing, Chiu-Hsieh Hsu and Z. John Daye. „Ridle for sparse regression with mandatory covariates with application to the genetic assessment of histologic grades of breast cancer“. BIOMED CENTRAL LTD, 2017. http://hdl.handle.net/10150/622811.

Annotation:

Background: Many questions in statistical genomics can be formulated in terms of variable selection of candidate biological factors for modeling a trait or quantity of interest. Often, in these applications, additional covariates describing clinical, demographic or experimental effects must be included a priori as mandatory covariates while allowing the selection of a large number of candidate or optional variables. As genomic studies routinely require mandatory covariates, it is of interest to propose principled methods of variable selection that can incorporate mandatory covariates. Methods: In this article, we propose the ridge-lasso hybrid estimator (ridle), a new penalized regression method that simultaneously estimates coefficients of mandatory covariates while allowing selection for others. The ridle provides a principled approach to mitigate the effects of multicollinearity among the mandatory covariates and possible dependency between mandatory and optional variables. We provide detailed empirical and theoretical studies to evaluate our method. In addition, we develop an efficient algorithm for the ridle. Software, based on efficient Fortran code with R-language wrappers, is publicly and freely available at https://sites.google.com/site/zhongyindaye/software. Results: The ridle is useful when mandatory predictors are known to be significant due to prior knowledge or must be kept for additional analysis. Both theoretical and comprehensive simulation studies have shown the ridle to be advantageous when mandatory covariates are correlated with the irrelevant optional predictors or are highly correlated among themselves. A microarray gene expression analysis of the histologic grades of breast cancer has identified 24 genes, of which 2 genes are selected only by the ridle among current methods and found to be associated with tumor grade.
Conclusions: In this article, we proposed the ridle as a principled sparse regression method for the selection of optional variables while incorporating mandatory ones. Results suggest that the ridle is advantageous when mandatory covariates are correlated with the irrelevant optional predictors or are highly correlated among themselves.

22

Shulga, Yelena A. „Model-based calibration of a non-invasive blood glucose monitor“. Digital WPI, 2006. https://digitalcommons.wpi.edu/etd-theses/58.

Annotation:

This project was dedicated to improving a non-invasive blood glucose monitor being developed by the VivaScan Corporation. The company has made some progress in developing the device and approached WPI for statistical assistance in improving its model so as to predict glucose levels more accurately. The main goal of this project was to improve the ability of the monitor to predict glucose values more precisely. The goal was achieved by finding and implementing the best regression model. The methods included ordinary least squares regression, partial least squares regression, robust regression, weighted least squares regression, local regression, and ridge regression. VivaScan calibration data for seven patients were analyzed in this project. For each of these patients, individual regression models were built and compared based on two factors that evaluate the model's predictive ability. It was determined that partial least squares and ridge regression are the two best methods among those considered in this work; using them gave better glucose prediction. The additional problem of data reduction to minimize data collection time was also considered.

23

Pelawa, Watagoda Lasanthi Chathurika Ranasinghe. „INFERENCE AFTER VARIABLE SELECTION“. OpenSIUC, 2017. https://opensiuc.lib.siu.edu/dissertations/1424.

Annotation:

This thesis presents inference for the multiple linear regression model Y = beta_1 x_1 + ... + beta_p x_p + e after model or variable selection, including prediction intervals for a future value of the response variable Y_f, and testing hypotheses with the bootstrap. If n is the sample size, most results are for n/p large, but prediction intervals are developed that may increase in average length slowly as p increases for fixed n if the model is sparse: k predictors have nonzero coefficients beta_i where n/k is large.

24

Binard, Carole. „Estimation de fonctions de régression : sélection d'estimateurs ridge, étude de la procédure PLS1 et applications à la modélisation de la signature génique du cancer du poumon“. Thesis, Nice, 2016. http://www.theses.fr/2016NICE4015.

Annotation:

This thesis concerns the estimation of a regression function that provides the best relationship between variables for which a number of observations are available. The first part presents a simulation study of two automatic methods for selecting the parameter of the ridge estimation procedure. From a more theoretical standpoint, we then present and compare two methods for selecting a multiparameter used in a procedure for estimating a regression function on the interval [0,1]. In the second part, we study the quality of the PLS1 estimator from a theoretical standpoint, through its quadratic risk and, more precisely, the variance term in the bias/variance decomposition of that risk. Finally, in the third part, a statistical study on real data is carried out to better understand the gene signature of cancer cells from the gene signatures of the cellular subtypes that make up the associated tumor stroma.
This thesis deals with the estimation of a regression function providing the best relationship between variables for which we have some observations. In the first part, we carry out a simulation study of two automatic selection methods for the ridge parameter. From a more theoretical point of view, we then present and compare two selection methods for a multiparameter that is used in an estimation procedure for a regression function on [0,1]. In the second part, we study the quality of the PLS1 estimator through its quadratic risk and, more precisely, the variance term in its bias/variance decomposition. In the third part, a statistical study is carried out to explain the genetic signature of cancer cells by means of the genetic signatures of the cellular subtypes that compose the associated tumor stroma.

25

Jansson, Daniel, and Nils Niklasson. „En analys av statens samhällssatsningar och dess effektivitet för att reducera brottslighet“. Thesis, KTH, Matematisk statistik, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-275665.

Annotation:

Through an analysis of the Swedish state budget, models have been developed to deepen the understanding of the effects that government expenditures have on reducing crime. This has been modeled by examining selected crime categories using the mathematical methods Ridge Regression, Lasso Regression, and Principal Component Analysis. Combined with a qualitative study of previous research on the economic aspects of crime, an analysis has been conducted. The mathematical methods indicate that it may be more effective to invest in crime prevention measures, such as increased social protection and a focus on vulnerable groups, than in more direct efforts such as increased resources for the police force. However, the result contradicts some of the accepted economic conclusions on the subject, which highlight the importance of more police officers and harsher penalties. These do, however, also mention the importance of crime prevention measures such as reducing the gaps in society, which is in line with the results of this work. The conclusion should be used with caution, as the models are based on a number of assumptions and could be improved by further analysis of these, together with more data points that would strengthen the validity of the analysis.
Through an analysis of Sweden's state budget, models have been developed to try to understand the effects that different public investments have on crime in Sweden. This has been modeled by examining selected crime categories using the mathematical methods Ridge Regression, Lasso Regression, and Principal Component Analysis. An analysis was then carried out together with a qualitative review of previous research on the economic aspects of crime. The mathematical methods indicate that it may be more effective to invest in crime prevention measures, such as increased social protection and a focus on vulnerable groups, than in more direct crime-suppression efforts such as increased resources for the police. However, the result contradicts some of the accepted economic conclusions on the subject, which highlight the importance of more police officers and harsher penalties. These also emphasize the importance of crime prevention measures such as reducing gaps in society, which is in line with the results of this work. The conclusion should nevertheless be used with caution, since the models rest on several assumptions and could be improved by further analysis of these, together with more data points that would strengthen validity.

26

Arale, Brännvall Marian. „Accelerating longitudinal spinfluctuation theory for iron at high temperature using a machine learning method“. Thesis, Linköpings universitet, Teoretisk Fysik, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-170314.

Annotation:

In the development of materials, the understanding of their properties is crucial. For magnetic materials, magnetism is an apparent property that needs to be accounted for. There are multiple factors explaining the phenomenon of magnetism, one being the effect of vibrations of the atoms on longitudinal spin fluctuations. This effect can be investigated by simulations, using density functional theory, and calculating energy landscapes. Through such simulations, the energy landscapes have been found to depend on the magnetic background and the positions of the atoms. However, when simulating a supercell of many atoms, to calculate energy landscapes for all atoms consumes many hours on the supercomputer. In this thesis, the possibility of using machine learning models to accelerate the approximation of energy landscapes is investigated. The material under investigation is body-centered cubic iron in the paramagnetic state at 1043 K. Machine learning enables statistical predictions to be made on new data based on patterns found in a previous set of data. Kernel ridge regression is used as the machine learning method. An important issue when training a machine learning model is the representation of the data in the so called descriptor (feature vector representation) or, more specific to this case, how the environment of an atom in a supercell is accounted for and represented properly. Four different descriptors are developed and compared to investigate which one yields the best result and why. Apart from comparing the descriptors, the results when using machine learning models are compared to when using other methods to approximate the energy landscapes. The machine learning models are also tested in a combined atomistic spin dynamics and ab initio molecular dynamics simulation (ASD-AIMD) where they were used to approximate energy landscapes and, from that, magnetic moment magnitudes at 1043 K. 
The results of these simulations are compared to the results from two other cases: one where the magnetic moment magnitudes are set to a constant value and one where they are set to their magnitudes at 0 K. From these investigations it is found that using machine learning methods to approximate the energy landscapes does, to a large degree, decrease the errors compared to the other approximation methods investigated. Some weaknesses of the respective descriptors were detected and if, in future work, these are accounted for, the errors have the potential of being lowered further.

27

Aghi, Nawar, and Ahmad Abdulal. „House Price Prediction“. Thesis, Högskolan Kristianstad, Fakulteten för naturvetenskap, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:hkr:diva-20945.

Annotation:

This study proposes a performance comparison between machine learning regression algorithms and an Artificial Neural Network (ANN). The regression algorithms used in this study are multiple linear regression, Least Absolute Shrinkage and Selection Operator (Lasso), Ridge, and Random Forest. Moreover, this study attempts to analyse the correlation between variables to determine the most important factors that affect house prices in Malmö, Sweden. Two datasets are used in this study, called public and local. They contain house prices from Ames, Iowa, United States and Malmö, Sweden, respectively. The accuracy of the prediction is evaluated by checking the R-squared and root mean square error scores of the trained model. The test is performed after applying the required pre-processing methods and splitting the data into two parts, one used for training and the other for testing. We have also presented a binning strategy that improved the accuracy of the models. This thesis attempts to show that Lasso gives the best score among the algorithms when the public dataset is used in training. The correlation graphs show the variables' level of dependency. In addition, the empirical results show that crime, deposit, lending, and repo rates influence house prices negatively, whereas inflation, year, and unemployment rate influence house prices positively.

28

Casagrande, Marcelo Henrique. „Comparação de métodos de estimação para problemas com colinearidade e/ou alta dimensionalidade (p > n)“. Universidade Federal de São Carlos, 2016. https://repositorio.ufscar.br/handle/ufscar/7954.

Annotation:

Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)
This paper presents a comparative study of the predictive power of four regression methods suitable for situations in which the data, arranged in the design matrix, suffer from severe multicollinearity and/or high dimensionality, wherein the number of covariates is greater than the number of observations. In this study, the methods discussed are principal component regression, partial least squares regression, ridge regression, and LASSO. The work includes simulations in which the predictive power of each technique is evaluated for different scenarios defined by the number of covariates, the sample size, and the number and magnitude of significant coefficients (effects), highlighting the main differences between the methods and allowing the creation of a guide that helps users choose a method based on any prior knowledge they may have. An application to real (non-simulated) data is also addressed.
This work presents a comparative study of the predictive power of four regression methods suited to situations in which the data, arranged in the design matrix, show serious problems of multicollinearity and/or high dimensionality, where the number of covariates is greater than the number of observations. The methods addressed are principal component regression, partial least squares regression, ridge regression, and LASSO. The work comprises simulations in which the predictive power of each technique is assessed for different scenarios defined by the number of covariates, the sample size, and the number and magnitude of significant coefficients (effects), bringing out the main differences between the methods and making it possible to create a guide so that users can choose a methodology based on whatever prior knowledge they have. An application to real (non-simulated) data is also covered.

29

GALLI, FABIAN. „Predicting PV self-consumption in villas with machine learning“. Thesis, KTH, Skolan för industriell teknik och management (ITM), 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-300433.

Annotation:

In Sweden, there is a strong and growing interest in solar power. In recent years, photovoltaic (PV) system installations have increased dramatically and a large part are distributed grid connected PV systems i.e. rooftop installations. Currently the electricity export rate is significantly lower than the import rate which has made the amount of self-consumed PV electricity a critical factor when assessing the system profitability. Self-consumption (SC) is calculated using hourly or sub-hourly timesteps and is highly dependent on the solar patterns of the location of interest, the PV system configuration and the building load. As this varies for all potential installations it is difficult to make estimations without having historical data of both load and local irradiance, which is often hard to acquire or not available. A method to predict SC using commonly available information at the planning phase is therefore preferred. There is a scarcity of documented SC data and only a few reports treating the subject of mapping or predicting SC. Therefore, this thesis is investigating the possibility of utilizing machine learning to create models able to predict the SC using the inputs: Annual load, annual PV production, tilt angle and azimuth angle of the modules, and the latitude. With the programming language Python, seven models are created using regression techniques, using real load data and simulated PV data from the south of Sweden, and evaluated using coefficient of determination (R2) and mean absolute error (MAE). The techniques are Linear Regression, Polynomial regression, Ridge Regression, Lasso regression, K-Nearest Neighbors (kNN), Random Forest, Multi-Layer Perceptron (MLP), as well as the only other SC prediction model found in the literature. A parametric analysis of the models is conducted, removing one variable at a time to assess the model’s dependence on each variable. 
The results are promising: five of the eight models achieve an R2 value above 0.9 and can be considered good for predicting SC. The best performing model, Random Forest, has an R2 of 0.985 and an MAE of 0.0148. The parametric analysis also shows that while more input data is helpful, using only annual load and PV production is sufficient to make good predictions. However, these statements about model performance hold only for the southern region of Sweden and are not applicable to areas outside the latitudes or country tested.
In Sweden, there is a strong and growing interest in solar energy. In recent years, the number of photovoltaic installations has increased dramatically, and a large share are distributed grid-connected PV systems, i.e., rooftop installations. At present, the electricity export price is considerably lower than the import price, which has made the amount of self-consumed solar electricity a critical factor when assessing system profitability. Self-consumption (SC) is calculated with timesteps of up to one hour and depends heavily on the solar irradiation pattern at the location of interest, the PV system configuration, and the building's energy demand. Because this varies across all potential installations, it is difficult to make estimates without historical data on both energy demand and local solar irradiation, which are often unavailable. A method for predicting SC from generally available information is therefore preferable. Documented SC data are scarce, and only a few reports address the mapping and prediction of SC. This thesis investigates the possibility of using machine learning to create models that can predict SC. The variables included are annual energy consumption, annual PV production, the tilt angle and azimuth angle of the modules, and latitude. Using the Python programming language, seven models are created with different regression techniques, using energy consumption data and simulated PV production data from southern Sweden. The models are evaluated using the coefficient of determination (R2) and the mean absolute error (MAE). The techniques used are linear regression, polynomial regression, Ridge regression, Lasso regression, K-nearest neighbor regression, Random Forest regression, and Multi-Layer Perceptron regression. An additional linear regression model is also created using the same methodology as in a previously published report. A parametric analysis of the models is carried out in which one variable at a time is excluded to assess the model's dependence on each individual variable.
The results are very promising, with five of the eight models examined achieving an R2 value above 0.9. The best model, Random Forest, has an R2 of 0.985 and an MAE of 0.0148. The parametric analysis also shows that, although additional input data are helpful, annual energy consumption and annual PV production are sufficient to make good predictions. It must be pointed out, however, that the model performance is reliable only for southern Sweden, where the underlying data originate, and is not applicable to areas outside the selected latitudes or country.
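The two evaluation measures named in this abstract, the coefficient of determination (R2) and the mean absolute error (MAE), can be sketched in plain Python as follows. The function names are illustrative assumptions, and the numbers in the usage note are toy values, not data from the thesis.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between observed and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot.
    Assumes y_true is not constant, so SS_tot is nonzero."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

For observed self-consumption fractions `[0.2, 0.4, 0.6, 0.8]` and predictions `[0.25, 0.35, 0.65, 0.75]`, the MAE is about 0.05 and R2 is about 0.95. Both measures are needed because R2 is scale-free while MAE is expressed in the units of the target.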

30

Schwarz, Patrick. „Prediction with Penalized Logistic Regression : An Application on COVID-19 Patient Gender based on Case Series Data“. Thesis, Karlstads universitet, Handelshögskolan (from 2013), 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:kau:diva-85642.

Annotation:

The aim of the study was to evaluate different types of logistic regression to find the optimal model for predicting the gender of hospitalized COVID-19 patients. The models were based on COVID-19 case series data from Pakistan using a set of 18 explanatory variables, of which patient age and BMI were numerical and the rest were categorical variables expressing symptoms and previous health issues. The models compared were a logistic regression using all variables, a logistic regression that used stepwise variable selection with 4 explanatory variables, a logistic Ridge regression model, a logistic Lasso regression model, and a logistic Elastic Net regression model. Based on several metrics assessing the goodness of fit of the models and on the evaluation of predictive power using the area under the ROC curve, the Elastic Net that used only the Lasso penalty had the best result and was able to predict 82.5% of the test cases correctly.
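The remark that the best Elastic Net used only the Lasso penalty is easiest to read against the penalty itself. The sketch below assumes the common parameterization in which a mixing weight `alpha` blends the L1 and squared L2 terms (alpha = 1 gives the Lasso, alpha = 0 gives Ridge); the abstract does not say which software or parameterization was used, and the function names are hypothetical.

```python
import math


def elastic_net_penalty(beta, lam, alpha):
    """lam * (alpha * L1 norm + (1 - alpha) / 2 * squared L2 norm)."""
    l1 = sum(abs(b) for b in beta)
    l2_squared = sum(b * b for b in beta)
    return lam * (alpha * l1 + 0.5 * (1.0 - alpha) * l2_squared)


def penalized_logistic_objective(X, y, intercept, beta, lam, alpha):
    """Mean negative log-likelihood plus the elastic net penalty.
    The intercept is left unpenalized; labels in y are 0 or 1."""
    total = 0.0
    for row, label in zip(X, y):
        eta = intercept + sum(b * v for b, v in zip(beta, row))
        # Numerically stable log(1 + exp(eta))
        log1p_exp = max(eta, 0.0) + math.log1p(math.exp(-abs(eta)))
        total += log1p_exp - label * eta
    return total / len(y) + elastic_net_penalty(beta, lam, alpha)
```

Under this assumption, `elastic_net_penalty(beta, lam, 1.0)` is exactly the Lasso penalty and `elastic_net_penalty(beta, lam, 0.0)` is the Ridge penalty, so a tuned Elastic Net that ends up at alpha = 1 coincides with the Lasso model.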

31

Elghriany, Ahmed F. „Investigating Correlations of Pavement Conditions with Crash Rates on In-Service U.S. Highways“. University of Akron / OhioLINK, 2016. http://rave.ohiolink.edu/etdc/view?acc_num=akron1448454032.

32

Salazar, Ruiz Enriqueta. „Desarrollo de modelos predictivos de contaminantes ambientales“. Doctoral thesis, Universitat Politècnica de València, 2008. http://hdl.handle.net/10251/2504.

Annotation:

Developing predictive mathematical models of different types of phenomena is a fundamental and useful application of data mining techniques. A good model becomes an excellent scientific tool, but it requires large volumes of available data, as well as skill and considerable time on the researcher's part to integrate the most relevant and characteristic knowledge of the phenomenon under study. In this thesis, the prediction models developed focused on environmental pollutants: the mean value of fine particulate matter (PM2.5) in breathable air, predicted 8 hours ahead, and the maximum tropospheric ozone (O3), predicted 24 hours ahead. A broad set of prediction techniques was used, starting with parametric tools as simple as persistence and multivariate linear modeling, the semi-parametric technique of ridge regression, and non-parametric tools such as artificial neural networks (ANN), including the multilayer perceptron (MLP), the square multilayer perceptron (SMLP), radial basis function (RBF) networks, and Elman networks, as well as support vector machines (SVM). The non-parametric techniques generalized the modeled phenomena best.
Salazar Ruiz, E. (2008). Desarrollo de modelos predictivos de contaminantes ambientales [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/2504

33

Wei, Zhaoyi. „Real-Time Optical Flow Sensor Design and its Application on Obstacle Detection“. Diss., 2009. http://contentdm.lib.byu.edu/ETD/image/etd2916.pdf.

34

Sawert, Marcus. „Predicting deliveries from suppliers : A comparison of predictive models“. Thesis, Mittuniversitetet, Institutionen för informationssystem och –teknologi, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:miun:diva-39314.

Annotation:

In the highly competitive environment that companies find themselves in today, it is key to have a well-functioning supply chain. For manufacturing companies, a good supply chain depends on functioning production planning. Production planning tries to fulfill demand while considering the resources available. This is complicated by uncertainties in demand, in manufacturing, and in supply. Several methods and models have been created to deal with production planning under uncertainty, but they often overlook the complexity of supply uncertainty by treating it as purely stochastic. To improve these models, a prediction based on earlier data about the supplier or item could be used to estimate when a delivery is likely to arrive. This study compared different predictive models to see which is best suited for this purpose. Historical data on earlier deliveries was gathered from a large international manufacturing company and preprocessed before being used in the models. The target value that the models were to predict was the actual delivery time from the supplier. The data was then tested with the following four regression models in Python: linear regression, ridge regression, Lasso, and Elastic Net. The results were calculated by cross-validation and presented as the mean absolute error together with its standard deviation. The results showed that the Elastic Net was the best performing model overall and that linear regression performed worst.
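The evaluation protocol described here, cross-validation reported as a mean absolute error with its standard deviation, can be sketched as below. This is a simplified illustration under stated assumptions: a single predictor, a closed-form ridge fit with an unpenalized intercept, and deterministic interleaved folds. The thesis itself does not give these details, and the function names are hypothetical.

```python
import statistics


def fit_ridge_1d(x, y, lam):
    """Ridge fit for one predictor with an unpenalized intercept.
    lam = 0 reduces to ordinary least squares."""
    x_bar = statistics.fmean(x)
    y_bar = statistics.fmean(y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    slope = sxy / (sxx + lam)
    return slope, y_bar - slope * x_bar


def cross_validated_mae(x, y, lam, k=5):
    """Return (mean, standard deviation) of the per-fold MAE over k folds."""
    n = len(x)
    fold_maes = []
    for fold in range(k):
        test_idx = [i for i in range(n) if i % k == fold]
        train_idx = [i for i in range(n) if i % k != fold]
        slope, intercept = fit_ridge_1d(
            [x[i] for i in train_idx], [y[i] for i in train_idx], lam
        )
        errors = [abs(y[i] - (intercept + slope * x[i])) for i in test_idx]
        fold_maes.append(statistics.fmean(errors))
    return statistics.fmean(fold_maes), statistics.pstdev(fold_maes)
```

Calling `cross_validated_mae(x, y, lam)` for several values of `lam` and comparing the returned means, with the standard deviations as a rough guide to fold-to-fold variability, mirrors the kind of model comparison the abstract reports.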

35

Hofmarcher, Paul, Stefan Kerbl, Bettina Grün, Michael Sigmund, and Kurt Hornik. „Model Uncertainty and Aggregated Default Probabilities: New Evidence from Austria“. WU Vienna University of Economics and Business, 2012. http://epub.wu.ac.at/3383/1/Report116.pdf.

Annotation:

Understanding the determinants of aggregated default probabilities (PDs) has attracted substantial research over the past decades. This study addresses two major difficulties in understanding the determinants of aggregate PDs: Model uncertainty and multicollinearity among the regressors. We present Bayesian Model Averaging (BMA) as a powerful tool that overcomes model uncertainty. Furthermore, we supplement BMA with ridge regression to mitigate multicollinearity. We apply our approach to an Austrian dataset. Our findings suggest that factor prices like short term interest rates and energy prices constitute major drivers of default rates, while firms' profits reduce the expected number of failures. Finally, we show that the results of our baseline model are fairly robust to the choice of the prior model size.
Series: Research Report Series / Department of Statistics and Mathematics

36

Ladejobi, Olufunmilayo Olubukola. „Testing new genetic and genomic approaches for trait mapping and prediction in wheat (Triticum aestivum) and rice (Oryza spp)“. Thesis, University of Cambridge, 2018. https://www.repository.cam.ac.uk/handle/1810/277449.

Annotation:

Advances in molecular marker technologies have led to the development of high throughput genotyping techniques such as Genotyping by Sequencing (GBS), driving the application of genomics in crop research and breeding. They have also supported the use of novel mapping approaches, including Multi-parent Advanced Generation Inter-Cross (MAGIC) populations, which have increased precision in identifying markers to inform plant breeding practices. In the first part of this thesis, a high density physical map derived from GBS was used to identify QTLs controlling key agronomic traits of wheat in a genome-wide association study (GWAS) and to demonstrate the practicability of genomic selection for predicting the trait values. The results from GBS were compared to a previous study conducted on the same association mapping panel using a less dense physical map derived from diversity arrays technology (DArT) markers. GBS detected more QTLs than DArT markers, although some of the QTLs were detected by DArT markers alone. Prediction accuracies from the two marker platforms were mostly similar and largely dependent on trait genetic architecture. The second part of this thesis focused on MAGIC populations, which incorporate diversity and novel allelic combinations from several generations of recombination. Pedigrees representing a wild rice MAGIC population were used to model MAGIC populations by simulation to assess the level of recombination and creation of novel haplotypes. The wild rice species are an important reservoir of beneficial genes that have been variously introgressed into rice varieties using bi-parental population approaches. The level of recombination was found to be highly dependent on the number of crosses made and on the resulting population size. Creating MAGIC populations requires adequate planning in order to make a sufficient number of crosses that capture optimal haplotype diversity.
The third part of the thesis considers models that have been proposed for genomic prediction. The ridge regression best linear unbiased prediction (RR-BLUP) is based on the assumption that all genotyped molecular markers make equal contributions to the variation of a phenotype. Information from underlying candidate molecular markers is, however, of greater significance and can be used to improve the accuracy of prediction. Here, an existing Differentially Penalized Regression (DiPR) model was used, which relies on modifications to a standard RR-BLUP package and allows two or more marker sets from different platforms to be independently weighted. The DiPR model performed better than single or combined marker sets for predicting most of the traits, both in a MAGIC population and in an association mapping panel. Overall, the work presented in this thesis shows that while these techniques have great promise, they should be carefully evaluated before introduction into breeding programmes.
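The contrast between one shared penalty and independently weighted marker sets can be illustrated with a small fixed-effects sketch. It solves the penalized normal equations (X'X + diag(penalties)) b = X'y with one penalty per marker; this is only an analogy to the idea behind DiPR, not the mixed-model RR-BLUP machinery or the DiPR package itself, and all names below are hypothetical.

```python
def solve_linear_system(A, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x


def weighted_ridge(X, y, penalties):
    """Ridge estimate with a separate penalty per column (marker).
    Equal penalties give ordinary ridge; strictly positive penalties
    keep the system invertible."""
    p = len(X[0])
    gram = [[sum(row[j] * row[k] for row in X) for k in range(p)] for j in range(p)]
    for j in range(p):
        gram[j][j] += penalties[j]
    xty = [sum(row[j] * yi for row, yi in zip(X, y)) for j in range(p)]
    return solve_linear_system(gram, xty)
```

With two orthogonal markers and `y = [4.0, 4.0]`, equal penalties `[1.0, 1.0]` shrink both effects to 2.0, whereas penalties `[1.0, 3.0]` give 2.0 and 1.0: the marker with the smaller penalty, standing in for a trusted candidate marker set, retains more of its estimated effect.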

37

Linton, Thomas. „Forecasting hourly electricity consumption for sets of households using machine learning algorithms“. Thesis, KTH, Skolan för informations- och kommunikationsteknik (ICT), 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-186592.

Annotation:

To address inefficiency, waste, and the negative consequences of electricity generation, companies and government entities are looking to behavioural change among residential consumers. To drive behavioural change, consumers need better feedback about their electricity consumption. A monthly or quarterly bill provides the consumer with almost no useful information about the relationship between their behaviours and their electricity consumption. Smart meters are now widely deployed in developed countries and are capable of providing electricity consumption readings at an hourly resolution, but this data is mostly used as a basis for billing and not as a tool to assist the consumer in reducing their consumption. One component required to deliver innovative feedback mechanisms is the capability to forecast hourly electricity consumption at the household scale. The work presented in this thesis is an evaluation of the effectiveness of a selection of kernel-based machine learning methods at forecasting the hourly aggregate electricity consumption for different sized sets of households. The work demonstrates that k-Nearest Neighbour Regression and Gaussian Process Regression are the most accurate methods within the constraints of the problem considered. In addition to accuracy, the advantages and disadvantages of each machine learning method are evaluated, and a simple comparison of each algorithm's computational performance is made.
To address inefficiency, waste, and the negative consequences of electricity generation, companies and authorities want to see behavioural change among household consumers. To bring about behavioural change, consumers need better feedback on their electricity consumption. The current feedback in a monthly or quarterly bill gives the consumer almost no useful information about how their behaviours relate to their consumption. Smart meters are now found everywhere in developed countries and can provide a wealth of information about residential consumption, but this data is used mainly as a basis for billing and not as a tool to help consumers reduce their consumption. One component required to deliver innovative feedback mechanisms is the ability to forecast electricity consumption at the household scale. The work presented in this thesis is an evaluation of the accuracy of a selection of kernel-based machine learning methods for forecasting the aggregate consumption of differently sized sets of households. The work shows that "k-Nearest Neighbour Regression" and "Gaussian Process Regression" are the most accurate methods within the constraints of the problem. In addition to accuracy, the advantages, disadvantages, and performance of each machine learning method are evaluated.

38

Solomon, Mary Joanna. „Multivariate Analysis of Korean Pop Music Audio Features“. Bowling Green State University / OhioLINK, 2021. http://rave.ohiolink.edu/etdc/view?acc_num=bgsu1617105874719868.

39

Nakamura, Karina Gernhardt. „Multicolinearidade em modelos de regressão logística“. Universidade de São Paulo, 2013. http://www.teses.usp.br/teses/disponiveis/45/45133/tde-28052013-222241/.

Annotation:

This work studies the effects of multicollinearity in logistic regression models and presents biased estimators intended to minimise those effects. First, the logistic regression model and its fitting process (yielding the maximum likelihood estimator) are presented, together with tests of the significance of the parameters and techniques for assessing goodness of fit. Next, the effects of multicollinearity on parameter estimation and inference are evaluated, along with techniques for diagnosing it. To mitigate the problem, two alternatives to the maximum likelihood estimator are presented: the ridge estimator and the principal component estimator. The performance of the three estimators is then compared in a simulation study and in an application to a real data set. The main conclusion is that, in the presence of multicollinearity, the alternative estimators achieved a better fit than the maximum likelihood estimator, besides reducing the effects of multicollinearity.
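
The comparison between the maximum likelihood and ridge estimators described in this abstract can be illustrated with a minimal sketch. It is not the thesis's code: the function name `ridge_logistic`, the Newton-Raphson solver, the absence of an intercept and the penalty value used in the usage note are all assumptions made for illustration.

```python
import numpy as np


def ridge_logistic(X, y, k, n_iter=50, tol=1e-10):
    """Fit logistic regression with an L2 (ridge) penalty k by Newton-Raphson.

    Maximises loglik(beta) - (k / 2) * ||beta||^2; k = 0 gives the ordinary
    maximum likelihood estimator. No intercept is fitted (an assumption).
    """
    n_params = X.shape[1]
    beta = np.zeros(n_params)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))      # fitted probabilities
        w = p * (1.0 - p)                          # IRLS weights
        grad = X.T @ (y - p) - k * beta            # penalised score
        hess = X.T @ (X * w[:, None]) + k * np.eye(n_params)
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta
```

With two strongly correlated predictors, calling `ridge_logistic(X, y, k=0.0)` and `ridge_logistic(X, y, k=5.0)` on the same data should show the shrinkage: the penalised coefficient vector has a smaller norm than the maximum likelihood one.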

40

Sheppard, Therese. „Extending covariance structure analysis for multivariate and functional data“. Thesis, University of Manchester, 2010. https://www.research.manchester.ac.uk/portal/en/theses/extending-covariance-structure-analysis-for-multivariate-and-functional-data(e2ad7f12-3783-48cf-b83c-0ca26ef77633).html.

Annotation:

For multivariate data, when testing homogeneity of covariance matrices arising from two or more groups, Bartlett's (1937) modified likelihood ratio test statistic is appropriate to use under the null hypothesis of equal covariance matrices, where the null distribution of the test statistic is based on the restrictive assumption of normality. Zhang and Boos (1992) provide a pooled bootstrap approach when the data cannot be assumed to be normally distributed. We give three alternative bootstrap techniques for testing homogeneity of covariance matrices when it is inappropriate to pool the data into one single population, as in the pooled bootstrap procedure, and the data are not normally distributed. We further show that our alternative bootstrap methodology can be extended to testing Flury's (1988) hierarchy of covariance structure models. Where deviations from normality exist, we show, by simulation, that the normal theory log-likelihood ratio test statistic is less viable compared with our bootstrap methodology. For functional data, Ramsay and Silverman (2005) and Lee et al. (2002) together provide four computational techniques for functional principal component analysis (PCA) followed by covariance structure estimation. When the smoothing method for smoothing individual profiles is based on using least squares cubic B-splines or regression splines, we find that the ensuing covariance matrix estimate suffers from loss of dimensionality. We show that ridge regression can be used to resolve this problem, but only for the discretisation and numerical quadrature approaches to estimation, and that choice of a suitable ridge parameter is not arbitrary. We further show the unsuitability of regression splines when deciding on the optimal degree of smoothing to apply to individual profiles. To gain insight into smoothing parameter choice for functional data, we compare kernel and spline approaches to smoothing individual profiles in a nonparametric regression context. 
Our simulation results justify a kernel approach using a new criterion based on predicted squared error. We also show by simulation that, when taking account of correlation, a kernel approach using a generalized cross validatory type criterion performs well. These data-based methods for selecting the smoothing parameter are illustrated prior to a functional PCA on a real data set.

41

Bodily, John M. „An Optical Flow Implementation Comparison Study“. Diss., 2009. http://contentdm.lib.byu.edu/ETD/image/etd2818.pdf.

42

Gama, Lorenna Eleamen da Silva. „Equações monoespecíficas de incremento em área basal de Handroanthus serratifolius (Vahl) S.O.Grose (ipê amarelo) e Handroanthus impetiginosus (Mart. ex DC.) Mattos (ipê roxo) da floresta tropical pluvial do Acre“. Universidade Federal de Santa Maria, 2017. http://repositorio.ufsm.br/handle/1/13302.

Annotation:

In natural forests, forest structure and the growth rate of the species are rarely considered as criteria for management; when they are, management is usually based on groups of species with distinct characteristics. Given this, this work was developed to advance knowledge of the growth rate of the timber species exploited in the east of the state of Acre, in order to contribute to the sustainable exploitation of these forests. To this end, the growth of Handroanthus impetiginosus (Mart. ex DC.) Mattos (ipê roxo) and Handroanthus serratifolius (Vahl) S.O.Grose (ipê amarelo) was modelled from data obtained by growth ring analysis. In the regression model, covariates associated with crown size and morphometry, competitive status, health and the liana load in the crown were investigated as descriptors of the basal area increment rate. The study was carried out with Amazon rainforest trees measured in the municipality of Porto Acre, Acre state, in a private area under sustainable forest management approved by the Instituto do Meio Ambiente do Acre (IMAC, the Acre Institute of Environment). With a Pressler borer, four increment cores 0.5 mm in diameter and approximately 10 cm in length were collected at dbh height, following the cardinal points. The cores were extracted from sample trees of H. impetiginosus (n = 30) and H. serratifolius (n = 35) over a diameter range of 13.5 cm to 88.1 cm dbh, totalling 260 increment cores. The width of the growth rings was measured along the radii in the bark-to-pith direction on each increment core, using a digitizing table with an attached magnifying glass and the TSAP-Win™ Scientific software. From the ring widths, the dbh dimensions and the increment rates for the period from 2011 to 2014 were reconstructed. The regression model was fitted by Ordinary Least Squares, assuming a normal distribution, and by Generalized Linear Models, assuming a Gamma distribution and a logarithmic link function. The selection of covariates considered their correlation with periodic annual basal area growth. The selected model included the logarithm of the diameter (lnd), height (h), the height/diameter ratio (h/d) and the Hegyi competition index (IC). The multicollinearity among the covariates was corrected by the Ridge Regression procedure. Based on statistical criteria and residual evaluation, the growth model fitted with the addition of the constant K = 0.024 to the coefficients of the model proved suitable to describe the variation in periodic annual increment in basal area (IPAg).

43

Moller, Jurgen Johann. „The implementation of noise addition partial least squares“. Thesis, Stellenbosch : University of Stellenbosch, 2009. http://hdl.handle.net/10019.1/3362.

Annotation:

Thesis (MComm (Statistics and Actuarial Science))--University of Stellenbosch, 2009.
When determining the chemical composition of a specimen, traditional laboratory techniques are often both expensive and time-consuming. It is therefore preferable to employ more cost-effective spectroscopic techniques such as near infrared (NIR). Traditionally, the calibration problem has been solved by means of multiple linear regression to specify the model between X and Y. Traditional regression techniques, however, quickly fail when using spectroscopic data, as the number of wavelengths can easily be several hundred, often exceeding the number of chemical samples. This scenario, together with the high level of collinearity between wavelengths, will necessarily lead to singularity problems when calculating the regression coefficients. Ways of dealing with the collinearity problem include principal component regression (PCR), ridge regression (RR) and PLS regression. Both PCR and RR require a significant amount of computation when the number of variables is large. PLS overcomes the collinearity problem in a similar way to PCR, by modelling both the chemical and spectral data as functions of common latent variables. The quality of the employed reference method greatly impacts the coefficients of the regression model and, therefore, the quality of its predictions. With both X and Y subject to random error, the quality of the predictions of Y will be reduced with an increase in the level of noise. Previously conducted research focussed mainly on the effects of noise in X. This paper focuses on a method proposed by Dardenne and Fernández Pierna, called Noise Addition Partial Least Squares (NAPLS), that attempts to deal with the problem of poor reference values. Some aspects of the theory behind PCR, PLS and model selection are discussed. This is then followed by a discussion of the NAPLS algorithm. Both PLS and NAPLS are implemented on various datasets that arise in practice, in order to determine cases where NAPLS will be beneficial over conventional PLS. 
For each dataset, specific attention is given to the analysis of outliers, influential values and the linearity between X and Y, using graphical techniques. Lastly, the performance of the NAPLS algorithm is evaluated for various

44

Brusamento, Donato. „Improving pattern recognition based myocontrol of prosthetic hands via user-in-the-loop“. Master's thesis, Alma Mater Studiorum - Università di Bologna, 2019.

Annotation:

Control of prosthetic hands based on electromyography (EMG) has the potential to restore motor functions to patients who have undergone an amputation, markedly improving their quality of life. However, open problems remain in achieving control that is both stable and rich in movements, among them the limb position effect. The thesis concentrates on reducing this source of instability, proposing a modified version of the Ridge Regression with Random Fourier Features algorithm, made incremental and enriched with feedback to the user. This approach is then validated through an experiment on 12 intact subjects, to verify the increase in performance, and through a further pilot study on an amputee subject, following the adaptation of the software to a prosthetic hand under development.
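
To make the idea of an incremental Ridge Regression with Random Fourier Features concrete, here is a minimal sketch. It is not the thesis's implementation: the class name `IncrementalRFFRidge`, the RBF kernel being approximated, and all default hyperparameters are assumptions, and the user-feedback component is not modelled.

```python
import numpy as np


class IncrementalRFFRidge:
    """Ridge regression on random Fourier features, updated batch by batch."""

    def __init__(self, n_inputs, n_features=200, gamma=1.0, lam=1.0, seed=0):
        rng = np.random.default_rng(seed)
        # Random projection approximating the RBF kernel exp(-gamma * ||x - y||^2).
        self.W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(n_inputs, n_features))
        self.b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
        self.scale = np.sqrt(2.0 / n_features)
        # Sufficient statistics: A = Z^T Z + lam * I and c = Z^T y.
        self.A = lam * np.eye(n_features)
        self.c = np.zeros(n_features)

    def _features(self, X):
        return self.scale * np.cos(np.atleast_2d(X) @ self.W + self.b)

    def partial_fit(self, X, y):
        """Fold a new batch of samples into the sufficient statistics."""
        Z = self._features(X)
        self.A += Z.T @ Z
        self.c += Z.T @ np.asarray(y, dtype=float)
        return self

    def predict(self, X):
        weights = np.linalg.solve(self.A, self.c)
        return self._features(X) @ weights
```

Because only the accumulated matrices `A` and `c` are stored, fitting the data in several batches yields the same model as fitting it all at once, which is what makes this formulation convenient for updating a model while the user is interacting with it.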

45

Boruvka, Audrey. „Data-driven estimation for Aalen's additive risk model“. Thesis, Kingston, Ont. : [s.n.], 2007. http://hdl.handle.net/1974/489.

46

Bertoli, Claudia Damo. „Modelos e metodologias para estimação dos efeitos genéticos fixos em uma população multirracial Angus x Nelore“. Biblioteca Digital de Teses e Dissertações da UFRGS, 2015. http://hdl.handle.net/10183/128116.

Annotation:

The objectives of this study were to estimate the fixed genetic effects acting on a synthetic population and to test different models and methodologies in this estimation process. The fixed genetic effects tested were the direct and maternal additive breed effects and the direct and maternal non-additive effects of heterosis, epistatic loss and complementarity. The models tested included these effects both separately and jointly. The ridge regression and least squares regression methodologies were compared, as were two different methods for determining the ridge parameter. A synthetic beef cattle population involving the Angus and Nellore breeds in several breed combinations was used, with 294,045 records at weaning and 148,443 records at yearling. The following traits were studied: weight gain from birth to weaning (WG) and phenotypic scores of conformation (WC), precocity (WP) and muscling (WM) collected at weaning; weight gain from weaning to yearling (PG), phenotypic scores of conformation (PC), precocity (PP) and muscling (PM), and scrotal circumference (SC) collected at yearling. In most analyses, the estimated fixed genetic effects were statistically significant. The complete model, including all fixed genetic effects, was the most suitable under both methodologies. In the estimation by least squares regression, the most parsimonious model was the one that included only additive breed effects and non-additive heterosis (dominance) effects; in the estimation by ridge regression, the most parsimonious model additionally included the non-additive epistatic loss effects. For models that included only additive breed and non-additive heterosis effects, the two methodologies proved equivalent. With the inclusion of non-additive epistatic loss and/or complementarity effects, however, ridge regression was preferable until the data reached a certain volume and structure, with most of the breed composition classes represented in the sample; from that point on, least squares and ridge regression were equivalent. Of the methods for determining the ridge parameter, the most suitable was the one that identifies the smallest value producing variance inflation factors below 10 for all estimated regressors.
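
The rule for choosing the ridge parameter described at the end of this abstract (the smallest value for which every variance inflation factor falls below 10) can be sketched as a simple grid search. This is an illustrative sketch, not the thesis's procedure: the function names, the grid step, the upper bound `k_max` and the use of the usual ridge VIF formula on the correlation matrix R, diag((R + kI)^-1 R (R + kI)^-1), are assumptions.

```python
import numpy as np


def ridge_vifs(X, k):
    """Variance inflation factors of the ridge estimator for ridge parameter k.

    Uses the correlation matrix R of the regressors; k = 0 gives the
    ordinary least squares VIFs, diag(R^-1).
    """
    R = np.corrcoef(X, rowvar=False)
    inv = np.linalg.inv(R + k * np.eye(R.shape[0]))
    return np.diag(inv @ R @ inv)


def smallest_k_with_vifs_below(X, threshold=10.0, step=0.001, k_max=1.0):
    """Return the smallest k on the grid 0, step, 2*step, ... with all VIFs < threshold."""
    k = 0.0
    while k <= k_max:
        if np.all(ridge_vifs(X, k) < threshold):
            return k
        k = round(k + step, 10)  # rounding avoids accumulating float error
    raise ValueError("no ridge parameter up to k_max brings all VIFs below the threshold")
```

For regressors that are not collinear the search returns 0.0, that is, ordinary least squares is left unchanged; the parameter only becomes positive when at least one VIF starts above the threshold.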

47

Bratu, Claudia. „Machine Learning of Crystal Formation Energies with Novel Structural Descriptors“. Thesis, Linköpings universitet, Teoretisk Fysik, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-143203.

Annotation:

To assist technological advancement, it is important to continue the search for new materials. The stability of a crystal structure is closely connected to its formation energy. By calculating the formation energies of theoretical crystal structures it is possible to find new stable materials. However, the number of possible structures is so large that traditional methods relying on quantum mechanics, such as Density Functional Theory (DFT), require too much computational time to be viable in such a project. One proposed alternative to such calculations is machine learning. Machine learning is an umbrella term for algorithms that can use information gained from one set of data to predict properties of new, similar data. Feature vector representations (descriptors) are used to present data in an appropriate manner to the machine. Thus far, no combination of machine learning method and feature vector representation has been established as general and accurate enough to be of practical use for accelerating the phase diagram calculations necessary for predicting material stability. It is important that the method predicts all types of structures equally well, regardless of stability, composition, or geometrical structure. In this thesis, the performances of different feature vector representations were compared to each other. The machine learning method used was primarily Kernel Ridge Regression, implemented in Python. The training and validation were performed on two different datasets and subsets of these. The representation which consistently yielded the lowest cross-validated error was a representation using the Voronoi tessellation of the structure by Ward et al. [Phys. Rev. B 96, 024104 (2017)]. Next best was an experimental representation called SLATM, presented by Huang and von Lilienfeld [arXiv:1707.04146], which is partially based on the Radial Distribution Function. 
The Voronoi representation achieved an MAE of 0.16 eV/atom at a training set size of 3534 for one of the sets, and 0.28 eV/atom at a training set size of 10086 for the other set. The effect of separating linear and non-linear energy contributions was evaluated using the sinusoidal and Coulomb representations. The result was that separating these reduced the error for small training set sizes, but the effect diminishes as the training set size increases. The results of this thesis indicate that further work is still required for machine learning to be used effectively in the search for new materials.
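
The evaluation procedure named in this abstract, Kernel Ridge Regression scored by a cross-validated mean absolute error (MAE), can be sketched in a few lines of NumPy. This is a generic illustration, not the thesis's code: the RBF kernel, the function names and the hyperparameters `lam` and `gamma` are assumptions, and the rows of `X` stand in for whatever descriptor vectors are being compared.

```python
import numpy as np


def rbf_kernel(A, B, gamma):
    """Gaussian kernel matrix exp(-gamma * ||a - b||^2) between rows of A and B."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


def krr_fit(X, y, lam, gamma):
    """Solve (K + lam * I) alpha = y for the dual coefficients alpha."""
    K = rbf_kernel(X, X, gamma)
    return np.linalg.solve(K + lam * np.eye(len(X)), y)


def krr_predict(X_train, alpha, X_new, gamma):
    return rbf_kernel(X_new, X_train, gamma) @ alpha


def cross_validated_mae(X, y, lam, gamma, n_folds=5, seed=0):
    """Mean absolute error over held-out folds of a shuffled K-fold split."""
    idx = np.random.default_rng(seed).permutation(len(X))
    errors = []
    for fold in np.array_split(idx, n_folds):
        train = np.setdiff1d(idx, fold)
        alpha = krr_fit(X[train], y[train], lam, gamma)
        pred = krr_predict(X[train], alpha, X[fold], gamma)
        errors.append(np.abs(pred - y[fold]))
    return float(np.mean(np.concatenate(errors)))
```

Comparing two descriptor sets then amounts to calling `cross_validated_mae` on each feature matrix with the same targets and folds and preferring the one with the lower error.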

48

Dumora, Christophe. „Estimation de paramètres clés liés à la gestion d'un réseau de distribution d'eau potable : Méthode d'inférence sur les noeuds d'un graphe“. Thesis, Bordeaux, 2020. http://www.theses.fr/2020BORD0325.

Annotation:

The rise of data generated by sensors and operational tools for the management of drinking water distribution networks (WDN) makes these systems more and more complex and, in general, events harder to grasp. The history of data on the quality of the distributed water, crossed with knowledge of network assets, contextual data and temporal parameters, leads to the study of a system that is complex because of its volume and the interactions between these data of various types, which may vary in time and space. Mathematical graphs make it possible to bring all this diversity together and provide a complete representation of WDNs and of the events that may arise in them or influence their proper functioning. The graph theory associated with these mathematical graphs allows a structural and spectral analysis of the networks so constituted, in order to answer concrete business problems and improve existing internal processes. These graphs are then used to address the problem of inference on the nodes of a very large graph from the partial observation of data on a small number of nodes. A graph optimisation approach is used to construct a numerical flow variable at every node of the graph (and therefore at any point of the physical network) using flow algorithms and data from the network flowmeters. Then a kernel prediction approach based on a penalised Ridge-type estimator, which raises problems of spectral analysis of a large sparse matrix, allows a signal observed on a certain number of nodes to be inferred at any point of a WDN.

49

Lindblom, Ellen, and Isabelle Almquist. „Data-Driven Predictions of Heating Energy Savings in Residential Buildings“. Thesis, Uppsala universitet, Byggteknik, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-387395.

Annotation:

Along with the increasing use of intermittent electricity sources, such as wind and solar, comes a growing demand for user flexibility. This has paved the way for a new market of services that provide electricity customers with energy saving solutions. These include a variety of techniques ranging from sophisticated control of the customers' home equipment to information on how to adjust their consumption behavior in order to save energy. This master's thesis contributes further to this field by investigating an additional incentive: predictions of future energy savings related to indoor temperature. Five different machine learning models have been tuned and used to predict monthly heating energy consumption for a given set of homes. The model tuning process and performance evaluation were performed using 10-fold cross validation. The best performing model was then used to predict how much heating energy each individual household could save by decreasing their indoor temperature by 1°C during the heating season. The highest prediction accuracy (of about 78%) is achieved with support vector regression (SVR), closely followed by neural networks (NN). The simpler regression models that have been implemented are, however, not far behind. According to the SVR model, the average household is expected to lower their heating energy consumption by approximately 3% if the indoor temperature is decreased by 1°C.

50

Rahman, Md Abdur. „Statistical and Machine Learning for assessment of Traumatic Brain Injury Severity and Patient Outcomes“. Thesis, Högskolan Dalarna, Institutionen för information och teknik, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:du-37710.

Annotation:

Traumatic brain injury (TBI) is a leading cause of death in all age groups and a major concern for society. However, TBI diagnostics and the prediction of patient outcomes are still lacking in medical science. In this thesis, I used a subset of TBIcare data from Turku University Hospital in Finland to classify severity, patient outcomes, and CT (computerized tomography) findings as positive/negative. The dataset was derived from the comprehensive metabolic profiling of serum samples from TBI patients. The study included 96 TBI patients, of whom 7 were diagnosed as severe (sTBI=7), 10 as moderate (moTBI=10), and 79 as mild (mTBI=79). Among them, there were 85 good recoveries (Good_Recovery=85) and 11 bad recoveries (Bad_Recovery=11), as well as 49 CT positive (CT. Positive=49) and 47 CT negative (CT. Negative=47). There was a total of 455 metabolites (features), excluding three response variables. Feature selection techniques were applied to retain the most important features while discarding the rest. Subsequently, four methods were used for classification: Ridge regression, Lasso regression, Neural network, and Deep learning. Ridge regression yielded the best results for binary classifications such as patient outcomes and CT positive/negative. The accuracy of CT positive/negative was 74% (AUC of 0.74), while the accuracy of patient outcomes was 91% (AUC of 0.91). For severity classification (multi-class classification), neural networks performed well, with a total accuracy of 90%. Despite the limited number of data points, the overall result was satisfactory.
