
Dissertations / Theses on the topic 'Statistic and probability'

Consult the top 50 dissertations / theses for your research on the topic 'Statistic and probability.'

Next to every source in the list of references there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse dissertations / theses on a wide variety of disciplines and organise your bibliography correctly.

1

Forgo, Vincent Z. Mr. "A Distribution of the First Order Statistic When the Sample Size is Random." Digital Commons @ East Tennessee State University, 2017. https://dc.etsu.edu/etd/3181.

Full text
Abstract:
Statistical distributions, also known as probability distributions, are used to model random experiments. Probability distributions are described by probability density functions (pdf) and cumulative distribution functions (cdf). They are widely used in engineering, actuarial science, computer science, the biological sciences, physics, and other fields. Statistics are used to draw conclusions about a population through probability models. Sample statistics such as the minimum, first quartile, median, third quartile, and maximum, together referred to as the five-number summary, are examples of order statistics. The minimum and maximum observations are important in extreme value theory. This paper focuses on the probability distribution of the minimum observation, also known as the first order statistic, when the sample size is random.
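As a toy illustration of the idea (not code from the thesis), suppose the random sample size N is geometric and the observations are standard exponential. Then the survival function of the first order statistic is the probability generating function of N evaluated at e^(-x). A minimal simulation sketch, using the fact that the minimum of n i.i.d. Exp(1) variables is Exp(n):

```python
import numpy as np

rng = np.random.default_rng(0)
p, x = 0.3, 1.0                 # geometric parameter, evaluation point
n_sim = 100_000

# N ~ Geometric(p) on {1, 2, ...}; the minimum of N i.i.d. Exp(1)
# observations is itself Exp(N), so it can be sampled directly.
N = rng.geometric(p, size=n_sim)
mins = rng.exponential(size=n_sim) / N

# P(min > x) = E[(e^{-x})^N], the pgf of N evaluated at t = e^{-x}
t = np.exp(-x)
theory = p * t / (1 - (1 - p) * t)
empirical = (mins > x).mean()
```

With these (arbitrary) parameters the empirical tail probability agrees with the pgf formula up to Monte Carlo error.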
APA, Harvard, Vancouver, ISO, and other styles
2

Odei, James Beguah. "Statistical Modeling, Exploration, and Visualization of Snow Water Equivalent Data." DigitalCommons@USU, 2014. https://digitalcommons.usu.edu/etd/3871.

Full text
Abstract:
Due to a continual increase in the demand for water as well as an ongoing regional drought, there is an imminent need to monitor and forecast water resources in the Western United States. In particular, water resources in the Intermountain West rely heavily on snow water storage. Improved seasonal forecasts of snowpack, and consideration of new techniques, would therefore allow water resources to be managed more effectively throughout the entire water-year. Many available models used for forecasting snow water equivalent (SWE) measurements require delicate calibration. In contrast to the physical SWE models most commonly used for forecasting, we offer a statistical model. We present a data-based statistical model that characterizes seasonal snow water equivalent in terms of a nested time series, with the large scale focusing on the inter-annual periodicity of dominant signals and the small scale accommodating seasonal noise and autocorrelation. This model provides a framework for independently estimating the temporal dynamics of SWE at the various snow telemetry (SNOTEL) sites. We use SNOTEL data from ten stations in Utah over 34 water-years to implement and validate this model. This dissertation has three main goals: (i) developing a new statistical model to forecast SWE; (ii) bridging existing R packages into a new R package to visualize and explore spatial and spatio-temporal SWE data; and (iii) applying the newly developed R package to SWE data from Utah SNOTEL sites and the Upper Sheep Creek site in Idaho as case studies.
APA, Harvard, Vancouver, ISO, and other styles
3

Galijasevic, Amar, and Josef Tegbaru. "Can IPO first day returns be predicted? A multiple linear regression analysis." Thesis, KTH, Matematisk statistik, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-254293.

Full text
Abstract:
During the last three years the Swedish stock market has shown a strong upward movement from the lows of 2016. At the same time, IPO activity has been high and many of the offerings have had a positive return during the first day of trading. The goal of this study is to analyze whether any particular IPO-specific data correlates with the first-day return and whether it can be used to predict the first-day return of future IPOs. If any regressors were shown to correlate with the first-day return, the goal was also to find a subset of regressors with even higher predictability, and to identify which regressors show the strongest correlation with a large positive return. The method used is a multiple linear regression on IPO data from the period 2017-2018. The results imply that none of the chosen regressors show any significant correlation with the first-day return. An IPO is a complicated process that may be difficult to simplify and quantify in a regression model, and further studies are needed to determine whether other qualitative factors correlate with the first-day return.
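The core technique here, multiple linear regression fitted by ordinary least squares, can be sketched on synthetic data (the feature names and coefficients below are hypothetical, not the study's IPO data set):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
# hypothetical IPO features, e.g. offer size, sector dummy, pre-listing market return
X = rng.normal(size=(n, 3))
beta_true = np.array([0.0, 0.0, 0.5])          # only the third feature matters
y = X @ beta_true + rng.normal(scale=1.0, size=n)   # simulated first-day return

X1 = np.column_stack([np.ones(n), X])          # prepend an intercept column
beta_hat, *_ = np.linalg.lstsq(X1, y, rcond=None)

resid = y - X1 @ beta_hat
r2 = 1 - resid.var() / y.var()                 # coefficient of determination
```

In a study like this one, the significance of each fitted coefficient would then be assessed with t-tests before any regressor is kept.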
APA, Harvard, Vancouver, ISO, and other styles
4

Echavarria, Gregory Maria Angelica. "Predictive Data-Derived Bayesian Statistic-Transport Model and Simulator of Sunken Oil Mass." Scholarly Repository, 2010. http://scholarlyrepository.miami.edu/oa_dissertations/471.

Full text
Abstract:
Sunken oil is difficult to locate because remote sensing techniques cannot as yet provide views of sunken oil over large areas. Moreover, the oil may re-suspend and sink with changes in salinity, sediment load, and temperature, making deterministic fate models difficult to deploy and calibrate when even the presence of sunken oil is difficult to assess. For these reasons, together with the expense of field data collection, there is a need for a statistical technique integrating limited data collection with stochastic transport modeling. Predictive Bayesian modeling techniques have been developed and demonstrated for exploiting limited information for decision support in many other applications. These techniques are brought here to a multi-modal Lagrangian modeling framework, representing a near-real-time approach to locating and tracking sunken oil, driven by intrinsic physical properties of field data collected following a spill after oil has begun collecting on a relatively flat bay bottom. Methods include (1) development of the conceptual predictive Bayesian model and multi-modal Gaussian computational approach based on theory and a literature review; (2) development of an object-oriented programming and combinatorial structure capable of managing data, integration, and computation over an uncertain and highly dimensional parameter space; (3) creation of a new bi-dimensional version of the method of images to account for curved shoreline boundaries; (4) confirmation of the model's capability for locating sunken oil patches using available (partial) real field data, and for temporal projections near curved boundaries using simulated field data; and (5) development of a stand-alone open-source computer application with a graphical user interface capable of calibrating instantaneous oil spill scenarios, producing sets of maps of relative probability profiles at different prediction times and user-selected geographic areas and resolutions, and performing post-processing tasks typical of basic GIS software. The result is a predictive Bayesian multi-modal Gaussian model, SOSim (Sunken Oil Simulator) Version 1.0rc1, operational for use with limited, randomly sampled, available subjective and numeric data on sunken oil concentrations and locations in relatively flat-bottomed bays. The SOSim model represents a new approach, coupling a Lagrangian modeling technique with predictive Bayesian capability for computing unconditional probabilities of mass as a function of space and time. The approach addresses the current need to rapidly deploy modeling capability without readily accessible information on ocean bottom currents. Contributions include (1) the development of apparently the first pollutant transport model for computing unconditional relative probabilities of pollutant location as a function of time based on limited available field data alone; (2) development of a numerical method for computing concentration profiles subject to curved, continuous or discontinuous boundary conditions; (3) development of combinatorial algorithms to compute unconditional multimodal Gaussian probabilities not amenable to analytical or Markov-chain Monte Carlo integration due to high dimensionality; and (4) development of software modules, including a core module containing the developed Bayesian functions, a wrapping graphical user interface, a processing and operating interface, and the necessary programming components, leading to an open-source, stand-alone, executable computer application (SOSim - Sunken Oil Simulator). Extensions and refinements are recommended, including the capability to accept available information on bathymetry and possibly bottom currents as Bayesian prior information, the capability to model continuous oil releases, and the extension to tracking of suspended oil (3-D).
APA, Harvard, Vancouver, ISO, and other styles
5

Ministr, Martin. "Matematický model výskytu závad na vybraných stanicích montážní linky motorů." Master's thesis, Vysoké učení technické v Brně. Fakulta strojního inženýrství, 2008. http://www.nusl.cz/ntk/nusl-228263.

Full text
Abstract:
Quality and defectiveness of production are important factors in all sectors of the engineering industry. Many effective tools exist for improving them, and descriptive statistics is one of them. Descriptive statistics offers many applicable tools, but this work focuses primarily on describing production using time-series plots, testing statistical hypotheses, testing the mutual dependencies of particular defects, and drawing control charts. It is a project that connects practice with theory.
APA, Harvard, Vancouver, ISO, and other styles
6

Philippe, Anne. "Contribution à la théorie des lois de référence et aux méthodes de Monte Carlo." Rouen, 1997. http://www.theses.fr/1997ROUES005.

Full text
Abstract:
This thesis consists of two distinct parts: the first belongs to Bayesian statistics and the second to Monte Carlo methods. In the first part, we study the contribution of non-informative reference priors. Via these priors and Bayesian confidence regions, we obtain a solution to the classical Fieller-Creasy problem posed by the calibration model. A second problem is the estimation of quadratic functions of a normal mean, which leads to surprising complications in both Bayesian and frequentist inference. We evaluate the properties of these priors, and in particular their coverage properties, for this model. The second part of the thesis is devoted to the estimation of integrals by Monte Carlo methods. We introduce a Monte Carlo estimator based on the properties of Riemann sums and show that it has better convergence properties than the classical approaches. We show that importance sampling generalizes to our estimator and produces an estimator that is optimal in terms of variance reduction. We extend our estimator to Markov chain Monte Carlo methods, and we also establish a convergence-control criterion for the Markov chains produced by MCMC algorithms. The simulation step that appears in Monte Carlo methods is addressed in our study of truncated gamma distributions, where we derive acceptance-rejection algorithms that dominate the classical approaches. The various results obtained are illustrated with numerous simulations.
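The Riemann-sum Monte Carlo estimator can be sketched for a one-dimensional integral over [0,1] (a toy example under my own choice of integrand, not the thesis's construction): sort the uniform sample and weight each function value by the gap to the next order statistic, rather than averaging the function values.

```python
import numpy as np

rng = np.random.default_rng(2)
f = lambda x: x ** 2                    # target: integral over [0,1] is 1/3
n = 5000
u = np.sort(rng.uniform(size=n))

classic = f(u).mean()                   # standard sample-mean Monte Carlo estimator

# Riemann-sum estimator: sum of f at each order statistic times the
# spacing to the next one (the last point is paired with the endpoint 1).
gaps = np.diff(np.concatenate([u, [1.0]]))
riemann = np.sum(gaps * f(u))
```

For smooth integrands the Riemann estimator's error shrinks faster than the classical n^(-1/2) Monte Carlo rate, which is the property the thesis exploits.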
APA, Harvard, Vancouver, ISO, and other styles
7

Morais, Sílvia Cristina Dorneles de. "“EXCEL: uma alternativa para o ensino de probabilidade e estatística”." Universidade Federal de Goiás, 2016. http://repositorio.bc.ufg.br/tede/handle/tede/6381.

Full text
Abstract:
Submitted by Cássia Santos (cassia.bcufg@gmail.com) on 2016-10-11T10:31:41Z No. of bitstreams: 2 Dissertação - Silvia Cristina Dorneles de Morais - 2016.pdf: 2108011 bytes, checksum: 4f95066b3d4fd9191a716470ccfbdfb4 (MD5) license_rdf: 0 bytes, checksum: d41d8cd98f00b204e9800998ecf8427e (MD5)<br>Approved for entry into archive by Luciana Ferreira (lucgeral@gmail.com) on 2016-10-11T11:40:43Z (GMT) No. of bitstreams: 2 Dissertação - Silvia Cristina Dorneles de Morais - 2016.pdf: 2108011 bytes, checksum: 4f95066b3d4fd9191a716470ccfbdfb4 (MD5) license_rdf: 0 bytes, checksum: d41d8cd98f00b204e9800998ecf8427e (MD5)<br>Made available in DSpace on 2016-10-11T11:40:43Z (GMT). No. of bitstreams: 2 Dissertação - Silvia Cristina Dorneles de Morais - 2016.pdf: 2108011 bytes, checksum: 4f95066b3d4fd9191a716470ccfbdfb4 (MD5) license_rdf: 0 bytes, checksum: d41d8cd98f00b204e9800998ecf8427e (MD5) Previous issue date: 2016-09-30<br>Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - CAPES<br>In that work was realized a short analysis of challenges of Maths' education in the High School, in special probability and statistics, that refer to adequation of new technologys availables, distinguishing the difficults found to insert computationals resources in the classroom's every day. We emphasize the using electronics spreadsheets, in that case, Excel like a tool good to learn probability and statistic, searching to balance work's teacher to objectives presents in the Nationals Curriculums Parameters (NCP) and in the new National Common Base Curricular (NCBC), which is in the final process of elaboration. By a research with IFG's teachers, we notice that, although having access to informatics laboratories adequated to education, almost teachers do not use them. 
Before this situation noticed, we propose some simple activities created with the objective to connective theory from didatics books with the use of electronics spreadsheets, trying to contribute for critical formation of students.<br>Nesse trabalho foi realizada uma breve análise dos desafios do ensino da matemática no Ensino Médio, em especial da probabilidade e estatística, no que se refere à adequação às novas tecnologias disponíveis, salientando as dificuldades encontradas ao inserir recursos computacionais no cotidiano da sala de aula. Enfatizamos a utilização de planilhas eletrônicas, nesse caso do Excel, como uma ferramenta benéfica ao ensino de probabilidade e estatística na busca de adequar o trabalho docente aos objetivos propostos nos Parâmetros Curriculares Nacionais (PCN's) e na nova Base Nacional Curricular Comum (BNCC), que se encontra em fase final de elaboração. Através de um questionário aplicado junto a professores do IFG, constatamos que, mesmo tendo acesso a laboratórios de informática adequados ao ensino, a maioria dos professores não faz uso dos mesmos. Diante da situação vivenciada propomos algumas atividades simples, elaboradas com o objetivo de aliar a teoria estudada nos livros didáticos e a utilização de planilhas eletrônicas, procurando assim contribuir para uma formação crítica dos alunos.
APA, Harvard, Vancouver, ISO, and other styles
8

Lu, Yinghua. "Empirical Likelihood Inference for the Accelerated Failure Time Model via Kendall Estimating Equation." Digital Archive @ GSU, 2010. http://digitalarchive.gsu.edu/math_theses/76.

Full text
Abstract:
In this thesis, we study two methods for inference on the parameters of the accelerated failure time model with right-censored data. One is the Wald-type method, which involves parameter estimation. The other is the empirical likelihood method, which is based on the asymptotic distribution of the likelihood ratio. We employ a monotone censored-data version of the Kendall estimating equation and construct confidence intervals with both methods. In simulation studies, we compare the empirical likelihood (EL) and the Wald-type procedure in terms of coverage accuracy and average length of confidence intervals, and conclude that the empirical likelihood method performs better. We also compare the EL for Kendall's rank regression estimator with the EL for other well-known estimators and find advantages of the EL for the Kendall estimator in small samples. Finally, real clinical trial data are used for illustration.
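For intuition about the second method, in a much simpler setting than the censored AFT model, the empirical likelihood ratio for a population mean can be computed by solving a one-dimensional Lagrange-multiplier equation; Wilks-type theory then compares −2 log R(μ) to a χ²₁ quantile to form a confidence interval. A sketch, assuming the hypothesized mean lies strictly inside the sample range:

```python
import numpy as np
from scipy.optimize import brentq

rng = np.random.default_rng(3)
x = rng.exponential(size=100)           # illustrative, uncensored data

def el_log_ratio(mu, x):
    """-2 log empirical likelihood ratio for the mean, at hypothesized mu."""
    d = x - mu
    # Lagrange condition: sum of d_i / (1 + lam * d_i) = 0
    g = lambda lam: np.sum(d / (1 + lam * d))
    # the implied weights 1/(1 + lam*d_i) must stay positive,
    # which brackets lam between -1/max(d) and -1/min(d)
    lo = -1 / d.max() + 1e-8
    hi = -1 / d.min() - 1e-8
    lam = brentq(g, lo, hi)
    return 2 * np.sum(np.log1p(lam * d))
```

At the sample mean the ratio is zero, and it grows as the hypothesized mean moves away, which is what makes the statistic usable for interval construction.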
APA, Harvard, Vancouver, ISO, and other styles
9

Le, cousin Jean-Maxime. "Asymptotique des feux rares dans le modèle des feux de forêts." Thesis, Paris Est, 2015. http://www.theses.fr/2015PESC1018/document.

Full text
Abstract:
The aim of this work is to study two different forest-fire processes defined on Z. In Chapter 2, we study the one-dimensional forest-fire process with non-instantaneous propagation. In this model, each site has three possible states: vacant, occupied, or burning. Vacant sites become occupied at rate 1. At each site, ignition (by lightning) occurs at rate λ. When a site is ignited, a fire starts and propagates to its neighbors at rate π. We study the asymptotic behavior of this process as λ → 0 and π → ∞ and show that there are three possible classes of scaling limits, according to the regime in which λ → 0 and π → ∞. In Chapter 3, we study formally and briefly the one-dimensional forest-fire process in random media. Here, each site has only two possible states: vacant or occupied. Consider a parameter λ > 0, a probability distribution ν on (0, ∞), and an i.i.d. sequence (κi)i∈Z of random variables with law ν. A vacant site i becomes occupied at rate κi. At each site, ignition (by lightning) occurs at rate λ; when a site is ignited, the fire destroys the corresponding component of occupied sites. We study the asymptotic behavior of this process as λ → 0. Under some quite reasonable assumptions on the law ν, we expect that the process converges, with a correct normalization, to a limiting forest-fire model, and that there are three possible classes of scaling limits.
APA, Harvard, Vancouver, ISO, and other styles
10

Bouadoumou, Maxime K. "Jackknife Empirical Likelihood for the Accelerated Failure Time Model with Censored Data." Digital Archive @ GSU, 2011. http://digitalarchive.gsu.edu/math_theses/112.

Full text
Abstract:
Kendall and Gehan estimating functions are used to estimate the regression parameter in the accelerated failure time (AFT) model with censored observations. The accelerated failure time model is the preferred survival analysis method because it maintains a consistent association between the covariate and the survival time. The jackknife empirical likelihood method is used because it overcomes computational difficulty by circumventing the construction of the nonlinear constraint: it turns the statistic of interest into a sample mean based on jackknife pseudo-values. A U-statistic approach is used to construct confidence intervals for the regression parameter. We conduct a simulation study to compare the Wald-type procedure, the empirical likelihood, and the jackknife empirical likelihood in terms of coverage probability and average length of confidence intervals. The jackknife empirical likelihood method performs better and overcomes the under-coverage problem of the Wald-type method. Real data are also used to illustrate the proposed methods.
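The jackknife pseudo-value construction can be illustrated on a simple nonlinear statistic rather than the AFT estimating functions: for a statistic T, the pseudo-values p_i = nT − (n−1)T₋ᵢ behave approximately like an i.i.d. sample whose mean is the bias-corrected estimate, so empirical likelihood for a mean can then be applied to them. A minimal sketch using the plug-in variance (my choice of example, not the thesis's):

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.normal(size=50)
n = len(x)

def theta(a):
    return a.var()                      # plug-in (biased) variance estimator

t_full = theta(x)
# leave-one-out recomputations of the statistic
loo = np.array([theta(np.delete(x, i)) for i in range(n)])
pseudo = n * t_full - (n - 1) * loo     # jackknife pseudo-values

jack_est = pseudo.mean()                # bias-corrected estimate
```

For the plug-in variance the jackknife correction is exact: the mean of the pseudo-values equals the unbiased sample variance.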
APA, Harvard, Vancouver, ISO, and other styles
11

Pereira, Mailson Matos. "Oficinas de Probabilidade e Estatística: Uma proposta de intervenção no ensino e aprendizagem de Matemática." Universidade Estadual da Paraíba, 2017. http://tede.bc.uepb.edu.br/tede/jspui/handle/tede/2763.

Full text
Abstract:
The main objective of this work is to present the results of a differentiated approach to probability and statistics in secondary education. To this end, five mathematics workshops were held with 3rd-year students of the Arruda Câmara State School of Elementary and Secondary Education in the city of Pombal-PB. The purpose of the workshops was to develop in the students a new outlook on mathematics, enabling them to better understand probabilistic and statistical concepts, to construct and read graphs and tables, and to associate and use statistics and probability in their everyday social lives. The work presents a description of the workshops, their application, and the results obtained. Finally, we present the class profile and the students' impressions of the impact of the work on their learning.
APA, Harvard, Vancouver, ISO, and other styles
12

Nordvall, Lagerås Andreas. "Markov Chains, Renewal, Branching and Coalescent Processes : Four Topics in Probability Theory." Doctoral thesis, Stockholm University, Department of Mathematics, 2007. http://urn.kb.se/resolve?urn=urn:nbn:se:su:diva-6637.

Full text
Abstract:
This thesis consists of four papers. In paper 1, we prove central limit theorems for Markov chains under (local) contraction conditions. As a corollary we obtain a central limit theorem for Markov chains associated with iterated function systems with contractive maps and place-dependent Dini-continuous probabilities. In paper 2, properties of inverse subordinators are investigated, in particular similarities with renewal processes. The main tool is a theorem on processes that are both renewal and Cox processes. In paper 3, distributional properties of supercritical and especially immortal branching processes are derived. The marginal distributions of immortal branching processes are found to be compound geometric. In paper 4, a description of a dynamic population model is presented, such that samples from the population have genealogies as given by a Lambda-coalescent with mutations. Depending on whether the sample is grouped according to litters or families, the sampling distribution is either regenerative or non-regenerative.
APA, Harvard, Vancouver, ISO, and other styles
13

Sheffet, Malka, and Ronit Bassan-Cincinatus. "Probability in Mathematics: Facing Probability in Everyday Life." Saechsische Landesbibliothek- Staats- und Universitaetsbibliothek Dresden, 2012. http://nbn-resolving.de/urn:nbn:de:bsz:14-qucosa-83075.

Full text
APA, Harvard, Vancouver, ISO, and other styles
14

Kart, Özlem. "A Historical Survey of the Development of Classical Probability Theory." Thesis, Uppsala universitet, Analys och sannolikhetsteori, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-359774.

Full text
APA, Harvard, Vancouver, ISO, and other styles
15

Lesser, Elizabeth Rochelle. "A New Right Tailed Test of the Ratio of Variances." UNF Digital Commons, 2016. http://digitalcommons.unf.edu/etd/719.

Full text
Abstract:
It is important to be able to compare variances efficiently and accurately, regardless of the parent populations. This study proposes a new right-tailed test for the ratio of two variances using the Edgeworth expansion. To study the Type I error rate and power, simulations were performed on the new test with various combinations of symmetric and skewed distributions. The test is found to have better-controlled Type I error rates than the existing tests, and it also has sufficient power. The newly derived test therefore provides a good robust alternative to the existing methods.
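The classical benchmark that such a test competes with is the right-tailed F test, which is exact under normality but fragile for skewed parent populations. A sketch of that baseline on illustrative data (not the study's simulations):

```python
import numpy as np
from scipy.stats import f

rng = np.random.default_rng(5)
x = rng.normal(scale=2.0, size=100)     # larger-variance sample
y = rng.normal(scale=1.0, size=100)

# Right-tailed test of H0: var_x / var_y <= 1 against H1: var_x / var_y > 1.
# Under H0 (and normality) the ratio of sample variances is F-distributed.
stat = x.var(ddof=1) / y.var(ddof=1)
p_value = f.sf(stat, len(x) - 1, len(y) - 1)   # right-tail probability
```

With the true variance ratio equal to 4, the test rejects comfortably here; under skewed data the F test's Type I error can drift, which is the motivation for the robust alternative.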
APA, Harvard, Vancouver, ISO, and other styles
16

Hild, Andreas. "ESTIMATING AND EVALUATING THE PROBABILITY OF DEFAULT – A MACHINE LEARNING APPROACH." Thesis, Uppsala universitet, Statistiska institutionen, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-447385.

Full text
Abstract:
In this thesis, we analyse and evaluate classification models for panel credit-risk data. Variables are selected based on results from recursive feature elimination as well as economic reasoning, and the probability of default is estimated. We employ several machine learning and statistical techniques and assess the performance of each model based on AUC, Brier score, and the absolute mean difference between the predicted and the actual outcome, carried out with four-fold cross-validation and extensive hyperparameter optimization. The LightGBM model had the best performance, and many machine learning models outperformed traditional models like logistic regression. Hence, the results of this thesis show that machine learning models such as gradient boosting models, neural networks and voting models have the capacity to challenge traditional statistical methods such as logistic regression within credit-risk modelling.
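The two scores used above have simple concrete definitions. For a toy vector of default labels and predicted default probabilities (hypothetical numbers), the Brier score is the mean squared error of the probabilities, and the AUC is the probability that a randomly chosen defaulter is ranked above a randomly chosen non-defaulter:

```python
import numpy as np

y = np.array([0, 0, 1, 1, 0, 1])               # 1 = default
p = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7])  # predicted default probabilities

# Brier score: mean squared difference between probability and outcome
brier = np.mean((p - y) ** 2)

# AUC as a rank statistic: fraction of (positive, negative) pairs in which
# the positive receives the higher score, ties counted as half.
pos, neg = p[y == 1], p[y == 0]
diff = pos[:, None] - neg[None, :]
auc = (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size
```

Lower Brier scores and higher AUC values are better; in practice one would compute both on held-out folds, as the thesis does with four-fold cross-validation.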
APA, Harvard, Vancouver, ISO, and other styles
17

Passos, Homailson Lopes. "Planejamento de experimentos no ensino da estatística e probabilidade nas séries finais do ensino fundamental II." Universidade de São Paulo, 2018. http://www.teses.usp.br/teses/disponiveis/97/97138/tde-04122018-145513/.

Full text
Abstract:
This work presents a proposal for the teaching of statistics and probability in the final years of elementary school. Its objective is to show that the methodology adopted here enables the acquisition of statistical and probabilistic concepts, as well as the development of personal and interpersonal skills. It is a project with a didactic sequence grounded in design of experiments and supported by active learning. In the didactic sequence, the students carried out an experiment with paper airplanes in which they had to answer, in practice, the question "What changes can be made to a paper airplane model so that it stays longer in the air?". To attest to the effectiveness of the didactic sequence, a Proficiency Test in Statistics and Probability (PTSP) was developed and validated using psychometrics. The characteristics of the test were analyzed through Classical Test Theory and Item Response Theory. The research subjects were 391 students from public and private schools in the Vale do Paraíba region, State of São Paulo; 17 of them participated in the project and the other 374 assisted in the test validation. The results of this research showed that the use of design of experiments favored the learning of statistics and probability and also developed other competences. Regarding the validation of the PTSP, it could be concluded that the psychometric methods used have potential and should be explored further. As final products, this research offers the developed methodology and the validated proficiency test to teachers and researchers.
APA, Harvard, Vancouver, ISO, and other styles
18

Olofsson, Isak. "@TheRealDonaldTrump’s tweets correlation with stock market volatility." Thesis, KTH, Matematisk statistik, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-275683.

Full text
Abstract:
The purpose of this study is to analyze whether any tweet-specific data posted by Donald Trump correlates with the volatility of the stock market. If any features of President Trump's tweets show correlation with volatility, the goal is to find a subset of regressors with predictability as high as possible. The content of the tweets is used as the basis for the regressors. The method used is multiple linear regression on tweet and volatility data ranging from 2010 to 2020. As a measure of volatility, the Cboe VIX has been used, and the regressors in the model focus on the content of tweets posted by Trump, using TF-IDF to evaluate that content. The results of the study imply that the chosen regressors display a small but significant correlation, with an adjusted R2 = 0.4501, between Trump's tweets and market volatility. The findings include 78 words that correlate with stock market volatility when they appear in President Trump's tweets. The stock market is a large and complex system with many unknowns, which complicates the process of simplifying and quantifying data from a single source into a regression model with high predictability.
APA, Harvard, Vancouver, ISO, and other styles
19

Lindell, Andreas. "Theoretical and Practical Applications of Probability : Excursions in Brownian Motion, Risk Capital Stress Testing, and Hedging of Power Derivatives." Doctoral thesis, Stockholm : Department of Mathematics, Stockholm university, 2009. http://urn.kb.se/resolve?urn=urn:nbn:se:su:diva-8570.

Full text
APA, Harvard, Vancouver, ISO, and other styles
20

Romeiro, Renata Guimarães 1987. "Modelo de regressão Birnbaum-Saunders bivariado." [s.n.], 2014. http://repositorio.unicamp.br/jspui/handle/REPOSIP/307091.

Full text
Abstract:
The Birnbaum-Saunders regression model of Rieck and Nedelman (1991) has been extensively discussed by various authors, with applications in survival and reliability studies. In this work, a bivariate Birnbaum-Saunders regression model is developed through the use of the Sinh-Normal distribution proposed by Rieck (1989).
This bivariate regression model can be used to analyze correlated log-lifetimes of two units, and its marginals correspond to univariate Birnbaum-Saunders regression models. Some of its properties are discussed, including moment estimation, maximum likelihood estimation and the observed Fisher information matrix. Hypothesis testing is performed using the asymptotic normality of the maximum likelihood estimators. Influence diagnostic methods are developed for this model based on Cook's (1986) approach. Finally, the results of a simulation study as well as an application to a real data set are presented.
APA, Harvard, Vancouver, ISO, and other styles
21

Guinaudeau, Alexandre. "Estimating the probability of event occurrence." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-246338.

Full text
Abstract:
In complex systems, anomalous behaviors can occur intermittently and stochastically, making it hard to distinguish real errors from spurious ones. These errors are often hard to troubleshoot and require close attention, but troubleshooting each occurrence is time-consuming and not always an option. In this thesis, we define two different models to estimate the underlying probability of occurrence of an error: one based on binary segmentation and null-hypothesis testing, and the other based on hidden Markov models. Given a threshold level of confidence, these models are tuned to trigger alerts when a change is detected with sufficiently high probability. We generated events drawn from Bernoulli distributions emulating these anomalous behaviors to benchmark the two candidate models. Both models have the same sensitivity, δp ≈ 10%, and delay, δt ≈ 100 observations, in detecting change points. However, they do not generalize in the same way to broader problems and therefore provide two complementary solutions.
APA, Harvard, Vancouver, ISO, and other styles
22

Eriksson, Anders. "Essays on Gaussian Probability Laws with Stochastic Means and Variances : With Applications to Financial Economics." Doctoral thesis, Uppsala University, Department of Information Science, 2005. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-5777.

Full text
Abstract:
This work consists of four articles concerning Gaussian probability laws with stochastic means and variances. The first paper introduces a new way of approximating the probability distribution of a function of random variables, using a Gaussian probability law with stochastic mean and variance. In the second paper, an extension of the Generalized Hyperbolic class of probability distributions is presented. The third paper introduces, using a Gaussian probability law with stochastic mean and variance, a GARCH-type stochastic process with skewed innovations. In the fourth paper, a Lévy process with second-order stochastic volatility is presented; option pricing under such a process is also considered.
APA, Harvard, Vancouver, ISO, and other styles
23

Berglund, Filip. "Asymptotics of beta-Hermite Ensembles." Thesis, Linköpings universitet, Matematisk statistik, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-171096.

Full text
Abstract:
In this thesis we present results about some eigenvalue statistics of the beta-Hermite ensembles in the classical cases corresponding to beta = 1, 2, 4, that is, the Gaussian orthogonal ensemble (consisting of real symmetric matrices), the Gaussian unitary ensemble (consisting of complex Hermitian matrices) and the Gaussian symplectic ensemble (consisting of quaternionic self-dual matrices), respectively. We also look at the less explored general beta-Hermite ensembles (consisting of real tridiagonal symmetric matrices). Specifically, we look at the empirical distribution function and two different scalings of the largest eigenvalue.
The results we present for these statistics are the convergence of the empirical distribution function to the semicircle law, the convergence of the scaled largest eigenvalue to the Tracy-Widom distributions and, under a different scaling, the convergence of the largest eigenvalue to 1. We also use simulations to illustrate these results. For the Gaussian unitary ensemble, we present an expression for its level density. To aid in understanding the Gaussian symplectic ensemble, we present properties of the eigenvalues of quaternionic matrices. Finally, we prove a theorem about the symmetry of the order statistic of the eigenvalues of the beta-Hermite ensembles.
APA, Harvard, Vancouver, ISO, and other styles
24

Murphy, Sean. "Some topics in spatial probability and statistics." Thesis, University of Bath, 1989. https://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.280810.

Full text
APA, Harvard, Vancouver, ISO, and other styles
25

RYSZ, TERI. "METACOGNITION IN LEARNING ELEMENTARY PROBABILITY AND STATISTICS." University of Cincinnati / OhioLINK, 2004. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1099248340.

Full text
APA, Harvard, Vancouver, ISO, and other styles
26

Albing, Malin. "Process capability analysis with focus on indices for one-sided specification limits." Licentiate thesis, Luleå : Luleå University of Technology, 2006. http://epubl.ltu.se/1402-1757/2006/71/.

Full text
APA, Harvard, Vancouver, ISO, and other styles
27

Engström, Alva, and Filippa Frithz. "Measuring the impact of strategic and tactic allocation for managed futures portfolios." Thesis, KTH, Matematisk statistik, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-252303.

Full text
Abstract:
The optimal asset allocation is an ever-current matter for investment managers. This thesis investigates the impact of risk parity and target volatility on the Sharpe ratio of a portfolio consisting of futures contracts on equity indices and bonds during the period 2000-2018. In addition, it examines at which level (instrument, asset class or total portfolio) a momentum strategy has the largest effect. This is done by applying design of experiments. The final results show that risk parity and target volatility improve the Sharpe ratio compared to a classic 60/40 capital allocation. Furthermore, utilising momentum strategies is most beneficial at the asset-class level, i.e. allocating between equity index and bond futures.
APA, Harvard, Vancouver, ISO, and other styles
28

Cui, Titing. "Short term traffic speed prediction on a large road network." Thesis, KTH, Matematisk statistik, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-252345.

Full text
Abstract:
Traffic flow speed prediction is an important element in the application of intelligent transportation systems (ITS). Timely and accurate traffic flow speed prediction can be used to support the control, management and improvement of traffic conditions. In this project, we investigate short-term traffic flow speed prediction on a large highway network. To eliminate vagueness, we first give a formal mathematical definition of the traffic flow speed prediction problem on a road network. In the last decades, traffic flow prediction research has advanced from theoretically well-established parametric methods to nonparametric data-driven algorithms such as deep neural networks. In this research, we give a detailed review of the state-of-the-art prediction models in the literature. However, we find that the road networks in most of the literature are rather small, usually hundreds of road segments.
The highway network in our project is much larger, consisting of more than eighty thousand road segments, which makes it almost impossible to use the models in the literature directly. Therefore, we employ time series clustering to divide the road network into disjoint regions. After that, several prediction models, including historical average (HA), univariate and vector autoregressive integrated moving average (ARIMA) models, support vector regression (SVR), Gaussian process regression (GPR), stacked autoencoders (SAEs) and long short-term memory neural networks (LSTM), are selected for prediction on each region. We give a performance analysis of the selected models at the end of the thesis.
APA, Harvard, Vancouver, ISO, and other styles
29

Huang, Xin. "A study on the application of machine learning algorithms in stochastic optimal control." Thesis, KTH, Matematisk statistik, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-252541.

Full text
Abstract:
By observing a similarity between the goal of stochastic optimal control, to minimize an expected cost functional, and the aim of machine learning, to minimize an expected loss function, a method of applying machine learning algorithms to approximate the optimal control function is established and implemented via neural approximation. Based on a discretization framework, a recursive formula is derived for the gradient of the approximated cost functional with respect to the parameters of the neural network.
For a well-known Linear-Quadratic-Gaussian control problem, the neural network approximation obtained with the stochastic gradient descent algorithm manages to reproduce the shape of the theoretical optimal control function, and different types of machine learning optimization algorithms yield quite similar accuracy in terms of their associated empirical value functions. Furthermore, it is shown that the accuracy and stability of the machine learning approximation can be improved by increasing the size of the minibatch and applying a finer discretization scheme. These results suggest the effectiveness and appropriateness of applying machine learning algorithms to stochastic optimal control.
APA, Harvard, Vancouver, ISO, and other styles
30

Bofeldt, Josefine, and Sara Joon. "Pricing of a balance sheet option limited by a minimum solvency boundary." Thesis, KTH, Matematisk statistik, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-252542.

Full text
Abstract:
Pension companies are required by law to remain above a certain solvency level. The main purpose of this thesis is to determine the cost of remaining above a lower solvency level for different pension companies. This is modelled by an option with a balance sheet as the underlying asset. The balance sheet is assumed to consist of bonds, stocks, liabilities and own funds. Both liabilities and bonds are modelled using forward rates. The data used in this thesis are historical stock prices and forward rates. Several potential models for the stock and forward rate processes are considered, for example the Bates model, the Libor market model and a discrete model based on normal log-normal mixture random variables, which have different properties and distributions.
The discrete normal log-normal mixture model is concluded to be best suited for stocks and bonds, i.e. the assets, and for liabilities. The price of the balance sheet option is determined using quasi-Monte Carlo simulation. The price is determined in relation to the initial value of the own funds for different portfolios with different initial solvency levels and different lower solvency bounds. The price as a function of the lower solvency bound appears to be exponential and varies with the portfolio, the initial solvency level and the lower solvency bound. The price converges with sufficient accuracy. It is concluded that remaining above a lower solvency level results in a significant cost for the pension company. A suggested further improvement is to validate the constructed model against other models.
APA, Harvard, Vancouver, ISO, and other styles
31

Mattsson, Johan. "Constructing Residential Price Property Indices Using Robust and Shrinkage Regression Modelling." Thesis, KTH, Matematisk statistik, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-252555.

Full text
Abstract:
This thesis intends to construct and compare multiple Residential Price Property Indices (RPPI) with the aim of expressing the price development of houses in Stockholm county from January 2013 to September 2018. The index method used is the hedonic time dummy variable method. Different methods of imputing missing data are applied, and new variables are derived from the available data in order to develop various regression models. Observations judged not to be part of the index's target population are excluded to improve the quality of the training data. The indices are computed by fitting the final model with OLS regression (as a benchmark), Huber regression, Tukey regression, Ridge regression and least-angle regression. Lastly, the obtained indices are assessed by analyzing different measures of performance when included in Booli's valuation engine.
The main result of this thesis is that a specific regression model is produced and that Huber regression is found to slightly outperform the other methods.
APA, Harvard, Vancouver, ISO, and other styles
32

Strandberg, Rickard, and Johan Låås. "A comparison between Neural networks, Lasso regularized Logistic regression, and Gradient boosted trees in modeling binary sales." Thesis, KTH, Optimeringslära och systemteori, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-252556.

Full text
Abstract:
The primary purpose of this thesis is to predict whether or not a customer will make a purchase from a specific item category. The historical data is provided by the Nordic online IT retailer Dustin. The secondary purpose is to evaluate how well a fully connected feed-forward neural network performs on this task compared to Lasso-regularized logistic regression and gradient boosted trees (XGBoost). This thesis finds XGBoost to be superior to the other two methods in terms of both prediction accuracy and speed.
APA, Harvard, Vancouver, ISO, and other styles
33

Jennerot, Mikaela. "Modeling Customer Behavior of Non-Maturity Deposits." Thesis, KTH, Matematisk statistik, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-252567.

Full text
Abstract:
The modeling of non-maturity deposits has become a highly relevant subject in the financial sector, since these instruments constitute a significant portion of banks' funding. A non-maturity deposit may look relatively simple; however, it has features that complicate the handling of these products. The purpose of this thesis is to build a model based on the identification, integration and significance level of factors that influence customer behavior related to non-maturity deposits. Moreover, a mathematical approach based on a selection of these factors is taken with the aim of analyzing client behavior related to these products. The developed model uses simple linear regression and multiple linear regression with dummy variables to model long-term behavior.
In contrast to the statistical methods that banks typically apply in this context, this thesis can contribute to the modeling of non-maturity deposits by highlighting customer behavior. Although the evaluation of the mathematical approach indicates that the model might not be appropriate for use in practice, it may give rise to ideas for alternative methods of handling non-maturity deposits.
APA, Harvard, Vancouver, ISO, and other styles
34

Maupin, Thomas. "Can Bitcoin, and other cryptocurrencies, be modeled effectively with a Markov-Switching approach?" Thesis, KTH, Matematisk statistik, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-252569.

Full text
Abstract:
This research is an attempt at deepening the understanding of hyped cryptocurrencies. A deductive approach is used in which we attempt to estimate the linear dependencies of cryptocurrencies with four different time series models. Investigating the linear dependencies of univariate time series offers the reader an understanding of how previous cryptocurrency prices affect future prices, while the linear interdependencies in a multivariate scenario show how, and if, the cryptocurrency market is correlated. The dataset consists of the prices between January 1, 2016 and March 31, 2019 of four cryptocurrency rivals: Bitcoin, Ethereum, Ripple and Litecoin. The modeling is performed using autoregression, fitting on 80% of the data; the models are then used to forecast the last 20% of the data in order to test their accuracy. Four types of models are used in this thesis, denoted AR(p), MSAR(p), VAR(p) and MSVAR(p): AR(p) is an autoregressive model of order p; MSAR(p) is a Markov-Switching autoregressive model of order p; VAR(p) is the multivariate counterpart of the AR(p) model, also known as the vector autoregressive model of order p; and MSVAR(p) is a Markov-Switching vector autoregressive model of order p. As cryptocurrencies are said to be very volatile, we hope that the Markov-Switching approach will help classify the level of volatility into different regimes, and we anticipate that the time series fitted for each regime will offer greater accuracy than the regular AR(p) and VAR(p) models. Using scale-dependent error estimators, the thesis concludes that the Markov-Switching approach does in fact improve the efficiency of the chosen time series models for our cryptocurrencies.
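The fit-on-80%, forecast-on-20% procedure described above can be sketched for the simplest member of the model family, an AR(1) fitted by least squares. This is an illustration only, not the thesis's code: the series below is synthetic, and all parameter values are invented.

```python
import random

def fit_ar1(series):
    """Least-squares fit of x_t = c + phi * x_{t-1} + e_t."""
    x = series[:-1]
    y = series[1:]
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var = sum((a - mx) ** 2 for a in x)
    phi = cov / var
    c = my - phi * mx
    return c, phi

def one_step_forecasts(series, c, phi):
    """Rolling one-step-ahead forecasts over a hold-out segment."""
    return [c + phi * x for x in series[:-1]]

# Synthetic AR(1) series standing in for a cryptocurrency price process.
random.seed(1)
x = [0.0]
for _ in range(999):
    x.append(0.1 + 0.8 * x[-1] + random.gauss(0, 1))

split = int(0.8 * len(x))        # fit on the first 80% ...
c, phi = fit_ar1(x[:split])
test = x[split - 1:]             # ... forecast the remaining 20%
preds = one_step_forecasts(test, c, phi)
errors = [t - p for t, p in zip(test[1:], preds)]
mse = sum(e * e for e in errors) / len(errors)
```

The same split-fit-forecast loop generalizes to the MSAR/VAR/MSVAR variants, with regime-dependent coefficients in the Markov-Switching case.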
APA, Harvard, Vancouver, ISO, and other styles
35

Blazevic, Darko, and Fredrik Marcusson. "Volatility Evaluation Using Conditional Heteroscedasticity Models on Bitcoin, Ethereum and Ripple." Thesis, KTH, Matematisk statistik, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-252570.

Full text
Abstract:
This study examines and compares the in-sample fit and out-of-sample forecast volatility of four different conditional heteroscedasticity models, namely ARCH, GARCH, EGARCH and GJR-GARCH, applied to Bitcoin, Ethereum and Ripple. The models are fitted over the period from 2016-01-01 to 2019-01-01 and then used to obtain one-day rolling forecasts during the period from 2018-01-01 to 2019-01-01. The study investigates three themes: the structure of the modelling framework, the complexity of the models, and the relation between a good in-sample fit and a good out-of-sample forecast. AIC and BIC are used to evaluate the in-sample fit, while MSE, MAE and R2LOG are used as loss functions when evaluating the out-of-sample forecast against the chosen Parkinson volatility proxy. The results show that a heavier-tailed reference distribution than the normal distribution generally improves the in-sample fit, while no such generality is found for the out-of-sample forecast. Furthermore, it is shown that GARCH-type models clearly outperform ARCH models in both in-sample fit and out-of-sample forecast. For Ethereum, the best-fitted models also result in the best out-of-sample forecast for all loss functions, while for Bitcoin none of the best-fitted models result in the best out-of-sample forecast. Finally, for Ripple, no generality between in-sample fit and out-of-sample forecast is found.
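The out-of-sample evaluation pairs a volatility proxy with loss functions. A minimal sketch follows, using the common definition of the Parkinson proxy, sigma^2 = ln(H/L)^2 / (4 ln 2), and one common form of the R2LOG loss; the price figures and the forecast values are invented, and the exact loss definitions used in the thesis may differ:

```python
import math

def parkinson_proxy(high, low):
    """Daily Parkinson variance proxy from intraday high/low prices."""
    return math.log(high / low) ** 2 / (4 * math.log(2))

def mse(proxy, forecast):
    return sum((p - f) ** 2 for p, f in zip(proxy, forecast)) / len(proxy)

def mae(proxy, forecast):
    return sum(abs(p - f) for p, f in zip(proxy, forecast)) / len(proxy)

def r2log(proxy, forecast):
    """R2LOG loss: mean squared log ratio of proxy to forecast."""
    return sum(math.log(p / f) ** 2 for p, f in zip(proxy, forecast)) / len(proxy)

# Toy example: two days of high/low prices and a constant variance forecast.
highs = [105.0, 110.0]
lows = [100.0, 104.0]
proxy = [parkinson_proxy(h, l) for h, l in zip(highs, lows)]
forecast = [0.001, 0.001]
losses = (mse(proxy, forecast), mae(proxy, forecast), r2log(proxy, forecast))
```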
APA, Harvard, Vancouver, ISO, and other styles
36

Hedblom, Edvin, and Rasmus Åkerblom. "Debt recovery prediction in securitized non-performing loans using machine learning." Thesis, KTH, Matematisk statistik, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-252311.

Full text
Abstract:
Credit scoring using machine learning has been gaining attention within the research field in recent decades and it is widely used in the financial sector today. Studies covering binary credit scoring of securitized non-performing loans are however very scarce. This paper is using random forest and artificial neural networks to predict debt recovery for such portfolios. As a performance benchmark, logistic regression is used. Due to the nature of high imbalance between the classes, the performance is evaluated mainly on the area under both the receiver operating characteristic curve and the precision-recall curve. This paper shows that random forest, artificial neural networks and logistic regression have similar performance. They all indicate an overall satisfactory ability to predict debt recovery and hold potential to be implemented in day-to-day business related to non-performing loans.
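The abstract's preference for ROC and precision-recall areas over plain accuracy reflects the class imbalance. A toy calculation (the labels are invented, not the thesis data) shows why accuracy alone is misleading on imbalanced classes:

```python
def accuracy(y_true, y_pred):
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)

def recall(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if tp + fn else 0.0

def precision(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fp) if tp + fp else 0.0

# 95 non-recovering loans (0) and 5 recovering ones (1): heavy imbalance.
y_true = [0] * 95 + [1] * 5
trivial = [0] * 100                    # always predicts "no recovery"
useful = [0] * 95 + [1, 1, 1, 0, 0]    # catches 3 of the 5 recoveries
```

The trivial classifier scores 95% accuracy while recovering nothing, which is why threshold-free curve areas are the primary metrics here.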
APA, Harvard, Vancouver, ISO, and other styles
37

Foa', Alessandro. "Object Detection in Object Tracking System for Mobile Robot Application." Thesis, KTH, Matematisk statistik, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-252561.

Full text
Abstract:
This thesis work takes place at the Emerging Technologies department of Volvo Construction Equipment (CE), in the context of a larger project involving several students. The focus is a mobile robot built by Volvo for testing AI features such as decision making, natural language processing, speech recognition and object detection. This thesis focuses on the latter. During the last 5 years, researchers have built very powerful deep learning object detectors in terms of accuracy and speed. This has been possible thanks to the remarkable development of Convolutional Neural Networks as feature extractors for image classification. The purpose of the report is to give a broad view of the state-of-the-art literature on object detection, in order to choose the best detector for the robot application Volvo CE is working on, considering that the robot's real-time performance is a priority goal of the project. After comparing the different methods, YOLOv3 seems to be the best choice. This framework will be implemented in Python and integrated with an object tracking system that returns the 3D position of the objects of interest. The whole system is evaluated in terms of the speed and precision of the resulting object detections.
APA, Harvard, Vancouver, ISO, and other styles
38

Saive, Yannick. "DirCNN: Rotation Invariant Geometric Deep Learning." Thesis, KTH, Matematisk statistik, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-252573.

Full text
Abstract:
Recently, geometric deep learning has introduced a new way for machine learning algorithms to tackle point cloud data in its raw form. Pioneers like PointNet, and the many architectures building on its success, recognize the importance of invariance to initial data transformations, such as shifting, scaling and rotating the point cloud in 3D space. Just as we want image-classifying machine learning models to classify an upside-down dog as a dog, we want geometric deep learning models to succeed on transformed data. As such, many models employ an initial data transform, learned as part of the neural network, to map the point cloud into a global canonical space. I see weaknesses in this approach, as such transforms are not guaranteed to be completely invariant to input transformations, only approximately so. To combat this, I propose using local deterministic transformations that do not need to be learned. The novel layer of this project builds upon Edge Convolutions and is thus dubbed DirEdgeConv, with the directional invariance in mind. This layer is slightly altered to introduce another layer by the name of DirSplineConv. These layers are assembled into a variety of models, which are then benchmarked on the same tasks as their predecessors to allow a fair comparison. The results are not quite as good as state-of-the-art results, but are still respectable. It is also my belief that the results can be improved by tuning the learning rate and its scheduling. Another experiment, in which ablation is performed on the novel layers, shows that the layers' main concept indeed improves the overall results.
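This is not the DirEdgeConv layer itself, but the core idea, deterministic features that are exactly rotation invariant rather than a learned, approximately canonicalizing transform, can be illustrated with the sorted pairwise distances of a toy point cloud:

```python
import math

def rotate_z(p, theta):
    """Rotate a 3D point about the z-axis by angle theta."""
    x, y, z = p
    return (x * math.cos(theta) - y * math.sin(theta),
            x * math.sin(theta) + y * math.cos(theta),
            z)

def pairwise_dists(cloud):
    """Sorted pairwise distances: a deterministic, exactly
    rotation-invariant descriptor of the point cloud."""
    return sorted(math.dist(a, b)
                  for i, a in enumerate(cloud) for b in cloud[i + 1:])

cloud = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 1.0)]
rotated = [rotate_z(p, 0.7) for p in cloud]
d0 = pairwise_dists(cloud)
d1 = pairwise_dists(rotated)
```

Unlike a learned canonical-space transform, the invariance here holds by construction, up to floating-point error, for any rotation angle.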
APA, Harvard, Vancouver, ISO, and other styles
39

Brynolfsson, Borg Andreas. "Non-Contractual Churn Prediction with Limited User Information." Thesis, KTH, Matematisk statistik, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-252343.

Full text
Abstract:
This report compares the effectiveness of three statistical methods for predicting defecting viewers in SVT's video on demand (VOD) services: logistic regression, random forests, and long short-term memory recurrent neural networks (LSTMs). In particular, the report investigates whether or not sequential data consisting of users' weekly watch histories can be used with LSTMs to achieve better predictive performance than the two other methods. The study found that the best LSTM models did outperform the other methods in terms of precision, recall, F-measure and AUC – but not accuracy. Logistic regression and random forests offered comparable performance results. The models are however subject to several notable limitations, so further research is advised.
APA, Harvard, Vancouver, ISO, and other styles
40

Nam, Hyunjin. "Predicting Diabetes Using Tree-based Methods." Thesis, Uppsala universitet, Statistiska institutionen, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-385358.

Full text
APA, Harvard, Vancouver, ISO, and other styles
41

Bergquist, Emanuel, and Gustav Thunström. "Propensity score matchning för estimering av en marginell kausal effekt med matchat fall-kontrolldata." Thesis, Umeå universitet, Statistik, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-160784.

Full text
Abstract:
After a case-control study has been conducted, it may be of interest to carry out a secondary analysis that studies the effect of the case-control outcome on some other variable in the population. In such cases, the case-control outcome is treated as a treatment in the secondary analysis, and its effect on a new outcome is examined. In observational studies based on case-control data, systematic differences often exist between the case and control groups. If these differences in background variables affect both the treatment and the outcome, they will bias the estimate of the causal effect. One way to control for these background variables is to match on the propensity score. This thesis consists of a simulation study in which the causal effect of the treatment on the outcome for the treated is estimated using propensity score matching in a secondary analysis of matched case-control data. The aim is to examine the properties of the matching estimator when the individuals' propensity scores are estimated with a weighted logistic regression model, compared to when they are estimated with a logistic regression model without weights. In the weighted logistic regression model, the true prevalence of the treatment in the population and its subgroups is known and included in the model, which makes the propensity score estimates unbiased. In the logistic regression model without weights, the true prevalence is not included, and the propensity score estimates will not be unbiased. The properties compared are bias, standard deviation and MSE. The results showed no reduction in MSE when the treatment prevalence in the population was included in the estimation of the observations' propensity scores. The estimator that did not include the treatment prevalence resulted in lower bias and MSE, but higher standard deviation. The bias of both estimators approached zero as the sample size increased.
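A minimal sketch of the matching step follows: greedy 1:1 nearest-neighbour matching on already-estimated propensity scores, followed by the matched-pair effect estimate for the treated. The unit IDs, scores and outcomes are invented for illustration and do not reproduce the thesis's simulation design:

```python
def match_treated_to_controls(treated, controls):
    """Greedy 1:1 nearest-neighbour matching on propensity score
    (with replacement). Each item is (unit_id, propensity_score)."""
    pairs = []
    for unit_id, score in treated:
        best = min(controls, key=lambda c: abs(c[1] - score))
        pairs.append((unit_id, best[0]))
    return pairs

def att_estimate(outcomes, pairs):
    """Average treatment effect on the treated from matched pairs."""
    diffs = [outcomes[t] - outcomes[c] for t, c in pairs]
    return sum(diffs) / len(diffs)

treated = [("t1", 0.62), ("t2", 0.35)]
controls = [("c1", 0.60), ("c2", 0.30), ("c3", 0.90)]
outcomes = {"t1": 3.0, "t2": 2.0, "c1": 1.0, "c2": 1.5, "c3": 4.0}
pairs = match_treated_to_controls(treated, controls)
att = att_estimate(outcomes, pairs)
```

The quality of this estimate hinges on the propensity scores fed in, which is exactly what the weighted versus unweighted logistic regression comparison above is about.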
APA, Harvard, Vancouver, ISO, and other styles
42

Wallmark, Joakim. "Selection bias when estimating average treatment effects in the M and butterfly structures." Thesis, Umeå universitet, Statistik, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-160792.

Full text
Abstract:
Due to a phenomenon known as selection bias, the estimator of the average treatment effect (ATE) of a treatment variable on some outcome may be biased. Selection bias, caused by the exclusion of possible units from the studied data, is a major obstacle to valid statistical and causal inference. It is hard to detect in experimental or observational studies and is introduced when conditioning a sample on a common collider of the treatment and response variables. A certain type of selection bias known as M-bias occurs when conditioning on a pretreatment variable that is part of a particular variable structure, the M structure. In this structure, the collider has no direct causal association with the treatment and outcome variables, but is indirectly associated with both through ancestors. In this thesis, scenarios where potential M-bias arises were examined in a simulation study. The percentage of bias relative to the true ATE was estimated for each scenario. A continuous collider variable was used, and samples were conditioned to only include units with values on the collider above a certain cutoff value. The cutoff value was varied to explore the relationship between the collider and the resulting bias. A variation of the M structure known as the butterfly structure was also studied in a similar fashion. The butterfly structure is known to result in confounding bias when the collider is not adjusted for, but selection bias when adjustment is done. The results show that selection bias is relatively small compared to bias originating from confounding in the butterfly structure. Increasing the cutoff level in this structure substantially decreases the overall bias of the ATE in almost all of the explored scenarios. The bias was smaller in the M structure than in the butterfly structure in close to all scenarios. For the M structure, the bias was generally smaller for higher cutoff values and insubstantial in some scenarios. This occurred because, in most of the studied scenarios, a large proportion of the variance of the collider was explained by binary ancestors of the collider. When these ancestors are the primary causes of the collider, increasing the cutoff to a high enough value effectively adjusts for the ancestors. Adjusting for these ancestors will in turn d-separate the treatment and the outcome, which results in an unbiased estimator of the ATE. When conducting studies in practice, the possibility of selection bias should be taken into consideration. Even though this type of bias is usually small even when the causal effects between the involved variables are strong, it can still be significant, and an unbiased estimator cannot be taken for granted in the presence of sample selection.
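The M structure can be simulated directly. In the linear-Gaussian sketch below (my own toy parametrization, not one of the thesis's scenarios), the treatment has no causal effect on the outcome, yet conditioning the sample on the collider exceeding a cutoff induces a spurious negative association:

```python
import random

random.seed(0)
n = 200_000

def cov(xs, ys):
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

T, Y, A = [], [], []
for _ in range(n):
    u1 = random.gauss(0, 1)              # ancestor of treatment and collider
    u2 = random.gauss(0, 1)              # ancestor of outcome and collider
    t = u1 + random.gauss(0, 1)          # treatment: no causal effect on Y
    y = u2 + random.gauss(0, 1)          # outcome
    a = u1 + u2 + random.gauss(0, 1)     # collider in the M structure
    T.append(t); Y.append(y); A.append(a)

cov_full = cov(T, Y)                     # unconditioned: near zero
sel = [i for i in range(n) if A[i] > 0.0]   # condition on the collider
cov_sel = cov([T[i] for i in sel], [Y[i] for i in sel])  # biased: negative
```

In the full sample T and Y are uncorrelated, as the graph implies; in the selected sample the latent ancestors become dependent, dragging a spurious covariance of roughly -0.21 into the T-Y relation.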
APA, Harvard, Vancouver, ISO, and other styles
43

Lidman, Julia, and Roma Goussakov. "Analys av bortfall i Betula-studien : Faktorer som påverkar bortfall i en studie om åldrande & minne." Thesis, Umeå universitet, Statistik, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-160795.

Full text
Abstract:
The longitudinal Betula study examines the relationship between memory and ageing. Using the available data, this study investigates whether there are differences between the individuals who dropped out and those who returned, and which characteristics may have influenced the dropout. The data consist of a sample of 176 participants from Betula's fifth test occasion, of whom 70 dropped out (did not participate at the sixth test occasion) while 106 returned. Little's MCAR test provided evidence to reject the hypothesis that data from the sixth occasion were missing completely at random. This indicated that the dropout was influenced by factors either in the observed data or in unobserved data. The classification methods logistic regression and random forest were then used to examine which factors in the observed data may have influenced the outcome. The results of the logistic regression model showed that men had higher odds of dropping out of the study than women, and that higher scores on tests of episodic memory and processing speed lowered the odds. The random forest analysis indicated that those who dropped out and those who returned differed in hippocampal grey matter volume and in four cognitive tests. Two of the tests, measuring episodic memory and processing speed, were important in both models. Scores on block design and verbal fluency were only important in the random forest model.
APA, Harvard, Vancouver, ISO, and other styles
44

Edlund, Jessica. "Can the effect of income on survival after stroke be explained by access to secondary prevention? : A mediation analysis on data from the Swedish stroke register." Thesis, Umeå universitet, Statistik, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-160797.

Full text
Abstract:
In Sweden, research has shown that socially underprivileged groups have poorer access to stroke care, both in the acute stage and in secondary prevention after stroke, and are more likely to have adverse outcomes. The aim of this thesis is to study the causal mechanisms behind the association between low income and death after having a stroke. More specifically, to what extent is the effect of income on death mediated through treatment according to guidelines? To answer this, mediation analysis has been applied to data from Riksstroke, the Swedish stroke register. The results of a mediation analysis rely on confounding assumptions that cannot be verified using observed data, and it is important to quantify the effects of violations. Sensitivity analysis has therefore been applied to investigate how sensitive the results are to unobserved confounding. The results show that a small part of the effect of having low income on the probability of death 29 days to 1 year after stroke is mediated by treatment according to guidelines. This effect is significantly positive for the study population. The same result was shown for patients with a high risk of dying after stroke. However, there was no evidence of a mediated effect for patients with a low risk of dying after stroke. The sensitivity analyses indicate that the estimated effects for the population are non-significant or reversed for certain levels of unobserved confounding. This must be considered when interpreting the results.
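The mediated-effect idea can be sketched in a simplified linear setting; the thesis's outcome is binary and its estimators differ, and all coefficients below are invented. The sketch uses the product-of-coefficients decomposition, with the mediator's coefficient obtained by residualizing both mediator and outcome on the exposure:

```python
import random

def ols_slope(x, y):
    """Simple OLS slope of y on x."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = sum((a - mx) ** 2 for a in x)
    return num / den

def residuals(x, y):
    """Residuals of y after regressing on x."""
    s = ols_slope(x, y)
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    c = my - s * mx
    return [yy - (c + s * xx) for xx, yy in zip(x, y)]

random.seed(2)
n = 50_000
X = [random.gauss(0, 1) for _ in range(n)]               # exposure (e.g. low income)
M = [0.5 * x + random.gauss(0, 1) for x in X]            # mediator (guideline treatment)
Y = [0.3 * m + 0.2 * x + random.gauss(0, 1) for m, x in zip(M, X)]

a = ols_slope(X, M)                       # exposure -> mediator
b = ols_slope(residuals(X, M), residuals(X, Y))   # mediator -> outcome, given exposure
total = ols_slope(X, Y)                   # total effect
indirect = a * b                          # mediated (indirect) effect
prop_mediated = indirect / total
```

Here the true indirect effect is 0.5 * 0.3 = 0.15 of a total 0.35, so roughly 43% of the effect is mediated; the thesis's sensitivity analysis asks how such numbers move under unobserved confounding.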
APA, Harvard, Vancouver, ISO, and other styles
45

Lövgren, Andreas, and Joakim Strandberg. "Jämförande av risk för omoperation mellan två operationsmetoder vid ljumskbråck : En tillämpning av överlevnadsanalys." Thesis, Umeå universitet, Statistik, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-160809.

Full text
Abstract:
When men undergo primary surgery for inguinal hernia, the standard method is open mesh repair. If a reoperation is later needed, open mesh cannot be used again; laparoscopic (keyhole) methods are then typically used instead. Some patients, however, receive laparoscopic surgery as their primary operation. About one tenth of all hernia operations in Sweden are reoperations. Using data from the Swedish Hernia Register, this study investigates whether the risk of reoperation differs between the two surgical methods. Survival analysis is used, with the hazard ratio for surgical method as the quantity of interest. A Cox proportional hazards model was estimated and the proportional hazards assumption was checked. Since the assumption did not hold for the Cox PH model, an extended Cox model with two heaviside functions was estimated instead. The study finds that laparoscopic surgery has 3.3 times the hazard of open mesh during the first 460 days after the primary operation, and 1.4 times the hazard of open mesh after 460 days.
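The reported heaviside-style result (hazard ratio 3.3 before day 460, 1.4 after) can be turned into survival curves under piecewise-constant hazards. The baseline hazard value below is an arbitrary assumption for illustration, not an estimate from the register data:

```python
import math

BREAK = 460.0  # days; the change-point reported in the study

def hazard(t, base, method):
    """Piecewise-constant hazard; laparoscopic hazard is base * HR(t)."""
    if method == "open":
        return base
    hr = 3.3 if t < BREAK else 1.4   # hazard ratios from the extended Cox model
    return base * hr

def survival(t, base, method, step=1.0):
    """S(t) = exp(-cumulative hazard), via left-Riemann integration
    (exact here, since the hazard is constant between integer days)."""
    cum = 0.0
    u = 0.0
    while u < t:
        cum += hazard(u, base, method) * step
        u += step
    return math.exp(-cum)

s_open = survival(900, 0.0005, "open")
s_lap = survival(900, 0.0005, "laparoscopic")
```

With any positive baseline, the laparoscopic curve lies below the open-mesh curve, and the gap grows fastest during the first 460 days where the hazard ratio is largest.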
APA, Harvard, Vancouver, ISO, and other styles
46

Öhman, Oscar. "Rating corrumption within insurance companies using Bayesian network classifiers." Thesis, Umeå universitet, Statistik, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-160810.

Full text
Abstract:
Bayesian network (BN) classifiers are a type of probabilistic model. The learning process consists of two steps: structure learning and parameter learning. Four BN classifiers are learned: two different Naive Bayes classifiers (NB), one Tree Augmented Naive Bayes classifier (TAN) and one Forest Augmented Naive Bayes classifier (FAN). The tree structure of the TAN is found with Chow and Liu's algorithm, which is based on conditional mutual information; this variant is known as CL-TAN. The NB classifiers utilize two different parameter learning techniques: generative learning and discriminative learning. Generative learning uses maximum likelihood estimation (MLE) to optimize the parameters, while discriminative learning uses conditional likelihood estimation (CLE). The latter is more appropriate given the target at hand, while the former is less complicated. These models are created in order to find the model best suited for predicting/rating the corruption levels of different insurance companies, given their features. Multi-class area under the receiver operating characteristic (ROC) curve (AUC), as well as accuracy, is used to compare the predictive performance of the models. We observe that the classifiers learned by generative parameter learning performed remarkably well, even outperforming the NB classifier with discriminative parameter learning. Unfortunately, this might imply an optimization issue when learning the parameters discriminatively. Another unexpected result was that the CL-TAN classifier had the highest multi-class AUC, even though FAN is supposed to be an upgrade of CL-TAN. Further, the generatively learned NB performed about as well as the other two generative classifiers, which was also unexpected.
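Generative parameter learning for the NB classifier amounts to counting: class priors and per-feature conditionals are maximum likelihood estimates from frequencies. The sketch below adds Laplace smoothing, a common refinement the abstract does not mention, and the feature values and labels are invented:

```python
import math
from collections import Counter, defaultdict

def train_nb(X, y, alpha=1.0):
    """Generative Naive Bayes for discrete features: MLE from counts,
    with Laplace smoothing alpha. Returns a predict function."""
    class_counts = Counter(y)
    n = len(y)
    n_features = len(X[0])
    counts = defaultdict(Counter)            # (class, feature index) -> value counts
    values = [set() for _ in range(n_features)]
    for xs, c in zip(X, y):
        for j, v in enumerate(xs):
            counts[(c, j)][v] += 1
            values[j].add(v)

    def log_prob(xs, c):
        lp = math.log(class_counts[c] / n)   # class prior (MLE)
        for j, v in enumerate(xs):           # conditionally independent features
            num = counts[(c, j)][v] + alpha
            den = class_counts[c] + alpha * len(values[j])
            lp += math.log(num / den)
        return lp

    def predict(xs):
        return max(class_counts, key=lambda c: log_prob(xs, c))

    return predict

# Invented training data: two discrete features and a corruption label.
X = [("low", "yes"), ("low", "yes"), ("high", "no"), ("high", "yes"), ("high", "no")]
y = ["clean", "clean", "corrupt", "corrupt", "corrupt"]
predict = train_nb(X, y)
```

TAN and FAN relax the conditional-independence assumption by adding tree- and forest-shaped edges between features, but the counting-based parameter estimation is the same in spirit.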
APA, Harvard, Vancouver, ISO, and other styles
47

Warnqvist, Anna. "TREATMENT EXPECTATIONS AND THEIRIMPLICATIONS FOR LUMBAR FUSION SURGERY ON CHRONIC BACK PAIN." Thesis, Uppsala universitet, Statistiska institutionen, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-385506.

Full text
APA, Harvard, Vancouver, ISO, and other styles
48

Motzi, Edward. "En enkel modell i utslagningsturneringar." Thesis, Uppsala universitet, Analys och sannolikhetsteori, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-388122.

Full text
APA, Harvard, Vancouver, ISO, and other styles
49

Norrman, Michaela, and Lina Hahlin. "Hur tänker Instagram? : En statistisk analys av två Instagramflöden." Thesis, Uppsala universitet, Tillämpad matematik och statistik, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-388141.

Full text
APA, Harvard, Vancouver, ISO, and other styles
50

Örneholm, Filip. "Anomaly Detection in Seasonal ARIMA Models." Thesis, Uppsala universitet, Tillämpad matematik och statistik, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-388503.

Full text
APA, Harvard, Vancouver, ISO, and other styles