Dissertations / Theses on the topic 'High-Dimensional Regression'
Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles
Consult the top 50 dissertations / theses for your research on the topic 'High-Dimensional Regression.'
Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.
You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.
Browse dissertations / theses on a wide variety of disciplines and organise your bibliography correctly.
Fang, Zhou. "Reweighting methods in high dimensional regression." Thesis, University of Oxford, 2012. http://ora.ox.ac.uk/objects/uuid:26f8541a-9e2d-466a-84aa-e6850c4baba9.
Meier, Lukas Dieter. "High-dimensional regression problems with special structure /." Zürich : ETH, 2008. http://e-collection.ethbib.ethz.ch/show?type=diss&nr=18129.
Hashem, Hussein Abdulahman. "Regularized and robust regression methods for high dimensional data." Thesis, Brunel University, 2014. http://bura.brunel.ac.uk/handle/2438/9197.
Aldahmani, Saeed. "High-dimensional linear regression problems via graphical models." Thesis, University of Essex, 2017. http://repository.essex.ac.uk/19207/.
Wang, Tao. "Variable selection and dimension reduction in high-dimensional regression." HKBU Institutional Repository, 2013. http://repository.hkbu.edu.hk/etd_ra/1544.
Lee, Wai Hong. "Variable selection for high dimensional transformation model." HKBU Institutional Repository, 2010. http://repository.hkbu.edu.hk/etd_ra/1161.
Chen, Xiaohui. "Lasso-type sparse regression and high-dimensional Gaussian graphical models." Thesis, University of British Columbia, 2012. http://hdl.handle.net/2429/42271.
Chen, Chi. "Variable selection in high dimensional semi-varying coefficient models." HKBU Institutional Repository, 2013. https://repository.hkbu.edu.hk/etd_oa/11.
Breheny, Patrick John. "Regularized methods for high-dimensional and bi-level variable selection." Iowa City : University of Iowa, 2009. http://ir.uiowa.edu/etd/325.
Villegas Santamaría, Mauricio. "Contributions to High-Dimensional Pattern Recognition." Doctoral thesis, Universitat Politècnica de València, 2011. http://hdl.handle.net/10251/10939.
Full textVillegas Santamaría, M. (2011). Contributions to High-Dimensional Pattern Recognition [Tesis doctoral no publicada]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/10939
Palancia
Luo, Weiqi. "Spatial/temporal modelling of crop disease data using high-dimensional regression." Thesis, University of Leeds, 2008. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.493292.
Herath, Herath Mudiyanselage Wiranthe Bandara. "TENSOR REGRESSION AND TENSOR TIME SERIES ANALYSES FOR HIGH DIMENSIONAL DATA." OpenSIUC, 2019. https://opensiuc.lib.siu.edu/theses/2585.
Zhang, Yuankun. "(Ultra-)High Dimensional Partially Linear Single Index Models for Quantile Regression." University of Cincinnati / OhioLINK, 2018. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1535703962712806.
Andersson, Niklas. "Regression-Based Monte Carlo For Pricing High-Dimensional American-Style Options." Thesis, Umeå universitet, Institutionen för fysik, 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-119013.
Full textPrissättning av olika finansiella derivat är en viktig del av den finansiella sektorn. För vissa derivat existerar en sluten lösning, men prissättningen av derivat med hög dimensionalitet och av amerikansk stil är fortfarande ett utmanande problem. Detta projekt fokuserar på derivatet som kallas option och särskilt prissättningen av amerikanska korg optioner, dvs optioner som både kan avslutas i förtid och som bygger på flera underliggande tillgångar. För problem med hög dimensionalitet, vilket definitivt är fallet för optioner av amerikansk stil, är Monte Carlo metoder fördelaktiga. I detta examensarbete har därför regressions baserad Monte Carlo använts för att bestämma avslutningsstrategier för optionen. Den välkända minsta kvadrat Monte Carlo (LSM) algoritmen av Longstaff och Schwartz (2001) har implementerats och jämförts med Robust Regression Monte Carlo (RRM) av C.Jonen (2011). Skillnaden mellan metoderna är att robust regression används istället för minsta kvadratmetoden för att beräkna fortsättningsvärden för optioner av amerikansk stil. Eftersom robust regression är mer stabil mot avvikande värden påstår C.Jonen att denna metod ger bättre skattingar av optionspriset. Det var svårt att jämföra teknikerna utan tillvägagångssättet med dualitet av Andersen och Broadie (2004) därför lades denna metod till. De numeriska testerna indikerar då att avslutningsstrategin som bestämts med RRM producerar en högre undre gräns och en snävare övre gräns jämfört med LSM. Skillnaden mellan övre och undre gränsen kunde vara upp till 4 gånger mindre med RRM. Importance sampling och Quasi Monte Carlo har också använts för att reducera variansen i skattningen av optionspriset och för att påskynda konvergenshastigheten.
Xie, Fang. "ℓ1 penalized methods in high-dimensional regressions and its theoretical properties." Thesis, University of Macau, 2018. http://umaclib3.umac.mo/record=b3952485.
Lo, Shin-Lian. "High-dimensional classification and attribute-based forecasting." Diss., Georgia Institute of Technology, 2010. http://hdl.handle.net/1853/37193.
Ratnasingam, Suthakaran. "Sequential Change-point Detection in Linear Regression and Linear Quantile Regression Models Under High Dimensionality." Bowling Green State University / OhioLINK, 2020. http://rave.ohiolink.edu/etdc/view?acc_num=bgsu159050606401363.
Pettersson, Anders. "High-Dimensional Classification Models with Applications to Email Targeting." Thesis, KTH, Matematisk statistik, 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-168203.
Full textFöretag kan använda e-mejl för att på ett enkelt sätt sprida viktig information, göra reklam för nya produkter eller erbjudanden och mycket mer, men för många e-mejl kan göra att kunder slutar intressera sig för innehållet, genererar badwill och omöjliggöra framtida kommunikation. Att kunna urskilja vilka kunder som är intresserade av det specifika innehållet skulle vara en möjlighet att signifikant förbättra ett företags användning av e-mejl som kommunikationskanal. Denna studie fokuserar på att urskilja kunder med hjälp av statistisk inlärning applicerad på historisk data tillhandahållen av musikstreaming-företaget Spotify. En binärklassificeringsmodell valdes, där responsvariabeln beskrev huruvida kunden öppnade e-mejlet eller inte. Två olika metoder användes för att försöka identifiera de kunder som troligtvis skulle öppna e-mejlen, logistisk regression, både med och utan regularisering, samt random forest klassificerare, tack vare deras förmåga att hantera högdimensionella data. Metoderna blev sedan utvärderade på både ett träningsset och ett testset, med hjälp av flera olika statistiska valideringsmetoder så som korsvalidering och ROC kurvor. Modellerna studerades under både scenarios med stora stickprov och högdimensionella data. Där scenarion med högdimensionella data representeras av att antalet observationer, N, är av liknande storlek som antalet förklarande variabler, p, och scenarion med stora stickprov representeras av att N ≫ p. Lasso-baserad variabelselektion utfördes för båda dessa scenarion för att studera informationsvärdet av förklaringsvariablerna. Denna studie visar att det är möjligt att signifikant förbättra öppningsfrekvensen av e-mejl genom att selektera kunder, även när man endast använder små mängder av data. Resultaten visar att en enorm ökning i antalet träningsobservationer endast kommer förbättra modellernas förmåga att urskilja kunder marginellt.
Gusnanto, Arief. "Regression on high-dimensional predictor space : with application in chemometrics and microarray data /." Stockholm, 2004. http://diss.kib.ki.se/2004/91-7140-153-9/.
Yi, Congrui. "Penalized methods and algorithms for high-dimensional regression in the presence of heterogeneity." Diss., University of Iowa, 2016. https://ir.uiowa.edu/etd/2299.
Vasquez, Monica M. "Penalized Regression Methods in the Study of Serum Biomarkers for Overweight and Obesity." Diss., The University of Arizona, 2017. http://hdl.handle.net/10150/625637.
Wang, Fan. "Penalised regression for high-dimensional data : an empirical investigation and improvements via ensemble learning." Thesis, University of Cambridge, 2019. https://www.repository.cam.ac.uk/handle/1810/289419.
Ohlsson, Henrik. "Regression on Manifolds with Implications for System Identification." Licentiate thesis, Linköping University, Automatic Control, 2008. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-15467.
Full textThe trend today is to use many inexpensive sensors instead of a few expensive ones, since the same accuracy can generally be obtained by fusing several dependent measurements. It also follows that the robustness against failing sensors is improved. As a result, the need for high-dimensional regression techniques is increasing.
As measurements are dependent, the regressors will be constrained to some manifold. There is then a representation of the regressors, of the same dimension as the manifold, containing all predictive information. Since the manifold is commonly unknown, this representation has to be estimated using data. For this, manifold learning can be utilized. Having found a representation of the manifold constrained regressors, this low-dimensional representation can be used in an ordinary regression algorithm to find a prediction of the output. This has further been developed in the Weight Determination by Manifold Regularization (WDMR) approach.
In most regression problems, prior information can improve prediction results. This is also true for high-dimensional regression problems. Research on including physical prior knowledge in high-dimensional regression, i.e., gray-box high-dimensional regression, has been rather limited, however. We explore the possibilities of including prior knowledge in high-dimensional manifold-constrained regression by means of regularization. The result will be called gray-box WDMR. In gray-box WDMR we have the possibility to restrict ourselves to predictions which are physically plausible. This is done by incorporating dynamical models for how the regressors evolve on the manifold.
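The two-step scheme described here, learning a low-dimensional representation of manifold-constrained regressors and then running an ordinary regression on it, can be sketched with PCA standing in for manifold learning. This is a deliberately linear toy case (so PCA recovers the latent coordinate exactly, which a curved manifold would not allow), not an implementation of WDMR:

```python
import numpy as np

rng = np.random.default_rng(3)
n, d = 300, 10
t = rng.uniform(-1.0, 1.0, n)              # latent coordinate on the manifold
a = rng.standard_normal(d)
# High-dimensional regressors lying near a one-dimensional (here linear) manifold,
# as when many dependent sensors measure the same underlying quantity.
x = np.outer(t, a) + 0.01 * rng.standard_normal((n, d))
y = np.sin(np.pi * t)                      # output depends only on the latent coordinate

# Step 1: estimate the low-dimensional representation from data.
x_c = x - x.mean(axis=0)
_, _, vt = np.linalg.svd(x_c, full_matrices=False)
score = x_c @ vt[0]                        # recovered 1-d coordinate (up to sign/scale)

# Step 2: ordinary regression on the learned coordinate.
coefs = np.polyfit(score, y, 5)
r2 = 1.0 - np.var(y - np.polyval(coefs, score)) / np.var(y)
```

On a genuinely curved manifold, step 1 would be replaced by a nonlinear manifold learning method, which is where WDMR's regularization enters.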
Minnier, Jessica. "Inference and Prediction for High Dimensional Data via Penalized Regression and Kernel Machine Methods." Thesis, Harvard University, 2012. http://dissertations.umi.com/gsas.harvard:10327.
Wang, Guoshen. "Analysis of Additive Risk Model with High Dimensional Covariates Using Correlation Principal Component Regression." Digital Archive @ GSU, 2008. http://digitalarchive.gsu.edu/math_theses/51.
Sarac, Ferdi. "Development of unsupervised feature selection methods for high dimensional biomedical data in regression domain." Thesis, Northumbria University, 2017. http://nrl.northumbria.ac.uk/36260/.
Gu, Chao. "Advancing Bechhofer's Ranking Procedures to High-dimensional Variable Selection." Bowling Green State University / OhioLINK, 2021. http://rave.ohiolink.edu/etdc/view?acc_num=bgsu1626653022254095.
Breheny, Patrick John. "Regularized methods for high-dimensional and bi-level variable selection." Diss., University of Iowa, 2009. https://ir.uiowa.edu/etd/325.
Ren, Sheng. "New Methods of Variable Selection and Inference on High Dimensional Data." University of Cincinnati / OhioLINK, 2017. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1511883302569683.
Zuber, Verena. "A Multivariate Framework for Variable Selection and Identification of Biomarkers in High-Dimensional Omics Data." Doctoral thesis, Universitätsbibliothek Leipzig, 2012. http://nbn-resolving.de/urn:nbn:de:bsz:15-qucosa-101223.
Miller, Ryan. "Marginal false discovery rate approaches to inference on penalized regression models." Diss., University of Iowa, 2018. https://ir.uiowa.edu/etd/6474.
Mahmood, Nozad. "Sparse Ridge Fusion For Linear Regression." Master's thesis, University of Central Florida, 2013. http://digital.library.ucf.edu/cdm/ref/collection/ETD/id/5986.
Full textM.S.
Masters
Statistics
Sciences
Statistical Computing
Liu, Li. "Grouped variable selection in high dimensional partially linear additive Cox model." Diss., University of Iowa, 2010. https://ir.uiowa.edu/etd/847.
Seetharaman, Indu. "Consistent bi-level variable selection via composite group bridge penalized regression." Kansas State University, 2013. http://hdl.handle.net/2097/15980.
Full textDepartment of Statistics
Kun Chen
We study composite group bridge penalized regression methods for conducting bi-level variable selection in high-dimensional linear regression models with a diverging number of predictors. The proposed method combines the ideas of bridge regression (Huang et al., 2008a) and group bridge regression (Huang et al., 2009) to achieve variable selection consistency at both the individual and group levels simultaneously, i.e., the important groups and the important individual variables within each group can both be correctly identified with probability approaching one as the sample size increases to infinity. The method takes full advantage of the prior grouping information, and the established bi-level oracle properties ensure that the method is immune to possible group misidentification. A related adaptive group bridge estimator, which uses adaptive penalization for improving bi-level selection, is also investigated. Simulation studies show that the proposed methods have superior performance in comparison to many existing methods.
Massias, Mathurin. "Sparse high dimensional regression in the presence of colored heteroscedastic noise : application to M/EEG source imaging." Electronic Thesis or Diss., Université Paris-Saclay (ComUE), 2019. http://www.theses.fr/2019SACLT053.
Understanding the functioning of the brain under normal and pathological conditions is one of the challenges of the 21st century. In the last decades, neuroimaging has radically affected clinical and cognitive neurosciences. Amongst neuroimaging techniques, magneto- and electroencephalography (M/EEG) stand out for two reasons: their non-invasiveness, and their excellent time resolution. Reconstructing the neural activity from the recordings of magnetic field and electric potentials is the so-called bio-magnetic inverse problem. Because of the limited number of sensors, this inverse problem is severely ill-posed, and additional constraints must be imposed in order to solve it. A popular approach, considered in this manuscript, is to assume spatial sparsity of the solution: only a few brain regions are involved in a short and specific cognitive task. Solutions exhibiting such a neurophysiologically plausible sparsity pattern can be obtained through L21-penalized regression approaches. However, this regularization requires solving time-consuming, high-dimensional, and non-smooth optimization problems with iterative (block) proximal gradient solvers. Additionally, M/EEG recordings are usually corrupted by strong non-white noise, which breaks the classical statistical assumptions of inverse problems. To circumvent this, it is customary to whiten the data as a preprocessing step, and to average multiple repetitions of the same experiment to increase the signal-to-noise ratio. Averaging measurements has the drawback of removing brain responses which are not phase-locked, i.e. which do not happen at a fixed latency after the stimulus presentation onset.
In this work, we first propose speed improvements of iterative solvers used for the L21-regularized bio-magnetic inverse problem. Typical improvements, screening and working sets, exploit the sparsity of the solution: by identifying inactive brain sources, they reduce the dimensionality of the optimization problem. We introduce a new working set policy, derived from the state-of-the-art Gap Safe screening rules. In this framework, we also propose duality improvements, yielding a tighter control of optimality and improving feature identification techniques. This dual construction extrapolates on an asymptotic vector autoregressive regularity of the dual iterates, which we connect to manifold identification of proximal algorithms. Beyond the L21-regularized bio-magnetic inverse problem, the proposed methods apply to the whole class of sparse generalized linear models. Second, we introduce new concomitant estimators for multitask regression. Along with the neural source estimation, concomitant estimators jointly estimate the noise covariance matrix. We design them to handle non-white Gaussian noise, and to exploit the multiple-repetition nature of M/EEG experiments. Instead of averaging the observations, our proposed method, CLaR, uses them all for a better estimation of the noise. The underlying optimization problem is jointly convex in the regression coefficients and the noise variable, with a "smooth + proximable" composite structure. It is therefore solvable via standard alternate minimization, for which we apply the improvements detailed in the first part. We provide a theoretical analysis of our objective function, linking it to the smoothing of Schatten norms. We demonstrate the benefits of the proposed approach for source localization on real M/EEG datasets. Our improved solvers and refined modeling of the noise pave the way for a faster and more statistically efficient processing of M/EEG recordings, allowing for interactive data analysis and scaling approaches to larger and larger M/EEG datasets.
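The L21-penalized multitask ("group sparse") regression at the heart of this abstract can be illustrated with a plain proximal gradient (ISTA) solver. This is a minimal numpy sketch on a synthetic row-sparse problem, without the screening rules, working sets, or dual improvements the thesis develops; the data sizes and regularization strength are illustrative:

```python
import numpy as np

def prox_l21(b, thresh):
    """Row-wise soft-thresholding: the proximal operator of thresh * ||.||_{2,1}."""
    norms = np.linalg.norm(b, axis=1, keepdims=True)
    return np.maximum(0.0, 1.0 - thresh / np.maximum(norms, 1e-12)) * b

def ista_multitask(x, y, lam, iters=500):
    """Proximal gradient for 0.5 * ||Y - XB||_F^2 + lam * sum_j ||B_j,:||_2."""
    p, q = x.shape[1], y.shape[1]
    step = 1.0 / np.linalg.norm(x, 2) ** 2     # 1 / Lipschitz constant of the gradient
    b = np.zeros((p, q))
    for _ in range(iters):
        grad = x.T @ (x @ b - y)
        b = prox_l21(b - step * grad, step * lam)
    return b

# Toy problem: 30 candidate "sources", only 3 truly active across 5 tasks.
rng = np.random.default_rng(0)
n, p, q = 100, 30, 5
x = rng.standard_normal((n, p))
b_true = np.zeros((p, q))
b_true[:3] = rng.standard_normal((3, q))
y = x @ b_true + 0.1 * rng.standard_normal((n, q))
b_hat = ista_multitask(x, y, lam=5.0)
active = np.linalg.norm(b_hat, axis=1) > 1e-8
```

The L2,1 prox zeroes out entire rows, which is what encodes "only a few active sources": a source is either kept across all tasks or discarded entirely. Screening and working sets accelerate exactly this kind of solver by ignoring rows that are provably zero.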
Margevicius, Seunghee P. "Modeling of High-Dimensional Clinical Longitudinal Oxygenation Data from Retinopathy of Prematurity." Case Western Reserve University School of Graduate Studies / OhioLINK, 2018. http://rave.ohiolink.edu/etdc/view?acc_num=case1523022165691473.
Liley, Albert James. "Statistical co-analysis of high-dimensional association studies." Thesis, University of Cambridge, 2017. https://www.repository.cam.ac.uk/handle/1810/270628.
Swinson, Michael D. "Statistical Modeling of High-Dimensional Nonlinear Systems: A Projection Pursuit Solution." Diss., Georgia Institute of Technology, 2005. http://etd.gatech.edu/theses/available/etd-11232005-204333/.
Full textShapiro, Alexander, Committee Member ; Vidakovic, Brani, Committee Member ; Ume, Charles, Committee Member ; Sadegh, Nader, Committee Chair ; Liang, Steven, Committee Member. Vita.
Jonen, Christian [Verfasser], Rüdiger [Akademischer Betreuer] Seydel, and Caren [Akademischer Betreuer] Tischendorf. "Efficient Pricing of High-Dimensional American-Style Derivatives : A Robust Regression Monte Carlo Method / Christian Jonen. Gutachter: Rüdiger Seydel ; Caren Tischendorf." Köln : Universitäts- und Stadtbibliothek Köln, 2011. http://d-nb.info/103811179X/34.
Hermann, Philipp [Verfasser], and Hajo [Akademischer Betreuer] Holzmann. "High-dimensional, robust, heteroscedastic variable selection with the adaptive LASSO, and applications to random coefficient regression / Philipp Hermann ; Betreuer: Hajo Holzmann." Marburg : Philipps-Universität Marburg, 2021. http://d-nb.info/1236692187/34.
Jiang, Wei. "Statistical inference with incomplete and high-dimensional data - modeling polytraumatized patients." Thesis, université Paris-Saclay, 2020. http://www.theses.fr/2020UPASM013.
The problem of missing data has existed since the beginning of data analysis, as missing values are related to the process of obtaining and preparing data. In applications of modern statistics and machine learning, where the collection of data is becoming increasingly complex and where multiple sources of information are combined, large databases often have an extraordinarily high number of missing values. These data therefore present important methodological and technical challenges for analysis: from visualization to modeling, including estimation, variable selection, predictive capabilities, and implementation. Moreover, although high-dimensional data with missing values are considered a common difficulty in statistical analysis today, only a few solutions are available. The objective of this thesis is to provide new methodologies for performing statistical inference with missing data, and in particular with high-dimensional data. The most important contribution is to provide a comprehensive framework for dealing with missing values, from estimation to model selection, based on likelihood approaches. The proposed method does not rely on a specific pattern of missingness, and allows a good balance between quality of inference and computational efficiency. The contribution of the thesis consists of three parts. In Chapter 2, we focus on performing logistic regression with missing values in a joint modeling framework, using a stochastic approximation of the EM algorithm. We discuss parameter estimation, variable selection, and prediction for incomplete new observations. Through extensive simulations, we show that the estimators are unbiased and have good confidence interval coverage properties, outperforming the popular imputation-based approach. The method is then applied to pre-hospital data to predict the risk of hemorrhagic shock, in collaboration with medical partners, the Traumabase group of Paris hospitals.
Indeed, the proposed model improves the prediction of bleeding risk compared to the prediction made by physicians. In Chapters 3 and 4, we focus on model selection issues for high-dimensional incomplete data, aimed in particular at controlling false discoveries. For linear models, the adaptive Bayesian version of SLOPE (ABSLOPE) we propose in Chapter 3 addresses these issues by embedding the sorted l1 regularization within a Bayesian spike-and-slab framework. Alternatively, in Chapter 4, aiming at more general models beyond linear regression, we consider these questions in a model-X framework, where the conditional distribution of the response as a function of the covariates is not specified. To do so, we combine the knockoff methodology and multiple imputation. Through extensive simulations, we demonstrate satisfactory performance in terms of power, FDR, and estimation bias for a wide range of scenarios. In an application to the medical data set, we build a model to predict patient platelet levels from pre-hospital and hospital data. Finally, we provide two open-source software packages with tutorials, in order to help decision making in the medical field and users facing missing values.
Klau, Simon [Verfasser], and Anne-Laure [Akademischer Betreuer] Boulesteix. "Addressing the challenges of uncertainty in regression models for high dimensional and heterogeneous data from observational studies / Simon Klau ; Betreuer: Anne-Laure Boulesteix." München : Universitätsbibliothek der Ludwig-Maximilians-Universität, 2020. http://d-nb.info/1220631884/34.
Huynh, Bao Tuyen. "Estimation and feature selection in high-dimensional mixtures-of-experts models." Thesis, Normandie, 2019. http://www.theses.fr/2019NORMC237.
This thesis deals with the problem of modeling and estimation of high-dimensional MoE models, towards effective density estimation, prediction, and clustering of such heterogeneous and high-dimensional data. We propose new strategies based on regularized maximum-likelihood estimation (MLE) of MoE models to overcome the limitations of standard methods, including MLE with Expectation-Maximization (EM) algorithms, and to simultaneously perform feature selection so that sparse models are encouraged in such a high-dimensional setting. We first introduce a mixture-of-experts parameter estimation and variable selection methodology, based on l1 (lasso) regularization and the EM framework, for regression and clustering suited to high-dimensional contexts. Then, we extend the method to regularized mixture-of-experts models for discrete data, including classification. We develop efficient algorithms to maximize the proposed l1-penalized observed-data log-likelihood function. Our proposed strategies enjoy efficient monotone maximization of the optimized criterion and, unlike previous approaches, do not rely on approximations of the penalty functions, avoid matrix inversion, and exploit the efficiency of the coordinate ascent algorithm, particularly within the proximal Newton-based approach.
Escalante, Bañuelos Alberto Nicolás [Verfasser], Laurenz [Gutachter] Wiskott, and Rolf [Gutachter] Würtz. "Extensions of hierarchical slow feature analysis for efficient classsification and regression on high-dimensional data / Alberto Nicolás Escalante Bañuelos ; Gutachter: Laurenz Wiskott, Rolf Würtz." Bochum : Ruhr-Universität Bochum, 2017. http://d-nb.info/1140223186/34.
Full textLannsjö, Fredrik. "Forecasting the Business Cycle using Partial Least Squares." Thesis, KTH, Matematisk statistik, 2014. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-151378.
Partial least squares is both a regression method and a tool for variable selection that is especially suitable for models based on a large number of (possibly correlated) variables. While it is a well-established modeling technique in chemometrics, this thesis adapts PLS to financial data in order to predict the movements of the business cycle, represented by the OECD Composite Leading Indicator. High-dimensional data are used, and a model with automated variable selection via a genetic algorithm is developed to forecast different economic regions, with good results in out-of-sample tests.
Kim, Byung-Jun. "Semiparametric and Nonparametric Methods for Complex Data." Diss., Virginia Tech, 2020. http://hdl.handle.net/10919/99155.
Full textDoctor of Philosophy
A variety of complex data types have emerged in many research fields, such as epidemiology, genomics, and analytical chemistry, with the development of science, technologies, and design schemes over the past few decades. For example, in epidemiology, the matched case-crossover study design is used to investigate the association between clustered binary disease outcomes and a measurement error in a covariate within a certain period by stratifying subjects' conditions. In genomics, highly correlated and high-dimensional (HCHD) data are required to identify important genes and their interaction effects on diseases. In analytical chemistry, multiple time series data are generated to recognize complex patterns among multiple classes. Due to this great diversity, we encounter three problems in analyzing the following three types of data: (1) matched case-crossover data, (2) HCHD data, and (3) time series data. We contribute to the development of statistical methods to deal with such complex data. First, under the matched study, we discuss an approach to hypothesis testing that effectively determines the association between observed factors and the risk of the disease of interest. Because, in practice, we do not know the specific form of the association, it can be challenging to set a specific alternative hypothesis. To reflect reality, we consider the possibility that some observations are measured with errors. By accounting for these measurement errors, we develop a testing procedure under the matched case-crossover framework. This testing procedure has the flexibility to make inferences under various hypothesis settings. Second, we consider data where the number of variables is very large compared to the sample size, and the variables are correlated with each other. In this case, our goal is to identify important variables for the outcome among a large number of variables and to build their network.
For example, identifying a few genes among the whole genome associated with diabetes can be used to develop biomarkers. With our proposed approach in the second project, we can identify differentially expressed and important genes and their network structure while accounting for the outcome. Lastly, we consider the scenario of changing patterns of interest over time, with application to gas chromatography. We propose an efficient detection method to effectively distinguish the patterns of multi-level subjects in time-trend analysis. Our proposed method can provide valuable information for an efficient search for distinguishable patterns, reducing the burden of examining all observations in the data.
Gentry, Amanda E. "Penalized mixed-effects ordinal response models for high-dimensional genomic data in twins and families." VCU Scholars Compass, 2018. https://scholarscompass.vcu.edu/etd/5575.
Full textSjödin, Hällstrand Andreas. "Bilinear Gaussian Radial Basis Function Networks for classification of repeated measurements." Thesis, Linköpings universitet, Matematisk statistik, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-170850.
Full textCourtois, Émeline. "Score de propension en grande dimension et régression pénalisée pour la détection automatisée de signaux en pharmacovigilance Propensity Score-Based Approaches in High Dimension for Pharmacovigilance Signal Detection: an Empirical Comparison on the French Spontaneous Reporting Database New adaptive lasso approaches for variable selection in automated pharmacovigilance signal detection." Thesis, université Paris-Saclay, 2020. http://www.theses.fr/2020UPASR009.
Post-marketing pharmacovigilance aims to detect adverse effects of marketed drugs as early as possible. It relies on large databases of individual case safety reports of adverse events suspected to be drug-induced. Several automated signal detection tools have been developed to mine these large amounts of data in order to highlight suspicious adverse event-drug combinations. Classical signal detection methods are based on disproportionality analyses of counts aggregating patients' reports. Recently, multiple-regression-based methods have been proposed to account for multiple drug exposures. In Chapter 2, we propose a signal detection method based on the high-dimensional propensity score (HDPS). An empirical study, conducted on the French pharmacovigilance database with a reference signal set pertaining to drug-induced liver injury (DILIrank), is carried out to compare the performance of this method (in 12 modalities) to methods based on lasso-penalized regressions. In this work, the influence of the score estimation method is minimal, unlike the score integration method. In particular, HDPS weighting with matching weights shows good performance, comparable to that of lasso-based methods. In Chapter 3, we propose a method based on a lasso extension, the adaptive lasso, which introduces a specific penalty for each variable through adaptive weights. We propose two new weights adapted to spontaneous reporting data, as well as the use of the BIC for the choice of the penalty term. An extensive simulation study is performed to compare the performance of our proposals with other implementations of the adaptive lasso, a disproportionality method, lasso-based methods, and HDPS-based methods. The proposed methods show overall better results in terms of false discoveries and sensitivity than competing methods. An empirical study similar to the one conducted in Chapter 2 completes the evaluation.
All the evaluated methods are implemented in the R package "adapt4pv", available on CRAN. Alongside methodological developments in spontaneous reporting, there has been a growing interest in the use of medico-administrative databases for signal detection in pharmacovigilance, where methodological research efforts remain to be developed. In Chapter 4, we explore detection strategies exploiting spontaneous reports and the national health insurance permanent sample (Echantillon Généraliste des Bénéficiaires, EGB). We first evaluate the performance of detection on the EGB using DILIrank. Then, we consider detection conducted on spontaneous reports based on an adaptive lasso integrating, through weights, information on the drug exposure of a control group measured in the EGB. In both cases, the contribution of medico-administrative data is difficult to evaluate because of the relatively small size of the EGB.
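The adaptive lasso that recurs in this abstract is, in essence, a lasso whose per-variable penalties are set from an initial estimate, so that variables that look unimportant at first are penalized more heavily. Below is a minimal coordinate descent sketch on synthetic Gaussian data; the ridge-based weights, λ, and all problem sizes are illustrative choices, not the weights proposed in the thesis:

```python
import numpy as np

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def adaptive_lasso(x, y, lam, weights, iters=200):
    """Coordinate descent for 0.5/n * ||y - Xb||^2 + lam * sum_j w_j |b_j|."""
    n, p = x.shape
    b = np.zeros(p)
    col_sq = (x ** 2).sum(axis=0) / n
    r = y.copy()                            # current residual y - Xb
    for _ in range(iters):
        for j in range(p):
            r += x[:, j] * b[j]             # partial residual excluding feature j
            rho = x[:, j] @ r / n
            b[j] = soft_threshold(rho, lam * weights[j]) / col_sq[j]
            r -= x[:, j] * b[j]
    return b

# Synthetic data: 4 true signals among 40 candidate exposures.
rng = np.random.default_rng(2)
n, p = 200, 40
x = rng.standard_normal((n, p))
b_true = np.zeros(p)
b_true[:4] = [2.0, -1.5, 1.0, 0.8]
y = x @ b_true + 0.5 * rng.standard_normal(n)
# Adaptive weights from an initial ridge fit: small initial coefficient => heavy penalty.
ridge = np.linalg.solve(x.T @ x + np.eye(p), x.T @ y)
weights = 1.0 / np.maximum(np.abs(ridge), 1e-6)
b_hat = adaptive_lasso(x, y, lam=0.05, weights=weights)
```

In glmnet terms this per-variable weighting corresponds to the `penalty.factor` argument; the methodological contribution of such theses lies in how the weights and penalty level are chosen for the data at hand, here spontaneous reports.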
Ternes, Nils. "Identification de biomarqueurs prédictifs de la survie et de l'effet du traitement dans un contexte de données de grande dimension." Thesis, Université Paris-Saclay (ComUE), 2016. http://www.theses.fr/2016SACLS278/document.
With the recent revolution in genomics and in stratified medicine, the development of molecular signatures is becoming more and more important for predicting the prognosis (prognostic biomarkers) and the treatment effect (predictive biomarkers) for each patient. However, the large quantity of information has rendered false positives more and more frequent in biomedical research. The high-dimensional setting (i.e. number of biomarkers ≫ sample size) leads to several statistical challenges, such as the identifiability of the models, the instability of the selected coefficients, and the multiple testing issue. The aim of this thesis was to propose and evaluate statistical methods for the identification of these biomarkers and for the individual predicted survival probability of new patients, in the context of the Cox regression model. For variable selection in a high-dimensional setting, the lasso penalty is commonly used. In the prognostic setting, an empirical extension of the lasso penalty has been proposed to be more stringent on the estimation of the tuning parameter λ in order to select fewer false positives. In the predictive setting, focus has been given to biomarker-by-treatment interactions in the setting of a randomized clinical trial. Twelve approaches have been proposed for selecting these interactions, such as lasso (standard, adaptive, grouped, or ridge+lasso), boosting, dimension reduction of the main effects, and a model incorporating arm-specific biomarker effects. Finally, several strategies were studied to obtain an individual survival prediction with a corresponding confidence interval for a future patient from a penalized regression model, while limiting the potential overfit. The performance of the approaches was evaluated through simulation studies combining null and alternative scenarios. The methods were also illustrated on several data sets containing gene expression data in breast cancer.