Dissertations / Theses on the topic 'High dimensional data'
Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles
Consult the top 50 dissertations / theses for your research on the topic 'High dimensional data.'
Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.
You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.
Browse dissertations / theses from a wide variety of disciplines and organise your bibliography correctly.
Wauters, John. "Independence Screening in High-Dimensional Data." Thesis, The University of Arizona, 2016. http://hdl.handle.net/10150/623083.
Zeugner, Stefan. "Macroeconometrics with high-dimensional data." Doctoral thesis, Université Libre de Bruxelles, 2012. http://hdl.handle.net/2013/ULB-DIPOT:oai:dipot.ulb.ac.be:2013/209640.
CHAPTER 1:
The default g-priors predominant in Bayesian Model Averaging tend to over-concentrate posterior mass on a tiny set of models - a feature we call the 'supermodel effect'. To address it, we propose a 'hyper-g' prior specification, whose data-dependent shrinkage adapts posterior model distributions to data quality. We demonstrate the asymptotic consistency of the hyper-g prior and its interpretation as a goodness-of-fit indicator. Moreover, we highlight the similarities between hyper-g and 'Empirical Bayes' priors, and introduce closed-form expressions essential to computational feasibility. The robustness of the hyper-g prior is demonstrated via simulation analysis, and by comparing four vintages of economic growth data.
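For context, the following is a hedged sketch of the prior family this chapter builds on, using the standard forms from the g-prior literature (Liang et al., 2008); the thesis's exact specification may differ in detail:

    % Zellner's g-prior on the regression coefficients beta:
    \beta \mid g, \sigma^2 \;\sim\; \mathcal{N}\!\left(0,\; g\,\sigma^{2}\,(X'X)^{-1}\right)
    % The hyper-g prior replaces a fixed g with a hyperprior on g; for a
    % tuning constant a > 2,
    p(g) \;=\; \frac{a-2}{2}\,(1+g)^{-a/2}, \qquad g > 0,
    % equivalently, the shrinkage factor g/(1+g) is Beta(1, a/2 - 1) distributed,
    % so the amount of shrinkage adapts to the data rather than being fixed.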
CHAPTER 2:
Ciccone and Jarocinski (2010) show that inference in Bayesian Model Averaging (BMA) can be highly sensitive to small data perturbations. In particular, they demonstrate that the importance attributed to potential growth determinants varies tremendously across different revisions of international income data. They conclude that 'agnostic' priors appear too sensitive for this strand of growth empirics. In response, we show that the instability found owes much to a specific BMA set-up: first, comparing the same countries across data revisions improves robustness; second, much of the remaining variation can be reduced by applying an equally 'agnostic' but flexible prior.
CHAPTER 3:
This chapter explores the link between the leverage of the US financial sector, of households and of non-financial businesses, and real activity. We document that leverage is negatively correlated with the future growth of real activity, and positively linked to the conditional volatility of future real activity and of equity returns.
The joint information in sectoral leverage series is more relevant for predicting future real activity than the information contained in any individual leverage series. Using in-sample regressions and out-of-sample forecasts, we show that the predictive power of leverage is roughly comparable to that of the macro and financial predictors commonly used by forecasters.
Leverage information would not have allowed forecasters to predict the 'Great Recession' of 2008-2009 any better than conventional macro/financial predictors.
CHAPTER 4:
Model averaging has proven popular for inference with many potential predictors in small samples. However, it is frequently criticized for a lack of robustness with respect to prediction and inference. This chapter explores the reasons for such robustness problems and proposes to address them by transforming the subset of potential 'control' predictors into principal components in suitable datasets. A simulation analysis shows that this approach yields robustness advantages over both standard model averaging and principal component-augmented regression. Moreover, we devise a prior framework that extends model averaging to uncertainty over the set of principal components, and show that it offers considerable improvements in the robustness of estimates and of inference about the importance of covariates. Finally, we empirically benchmark our approach against popular model averaging and PC-based techniques in evaluating financial indicators as alternatives to established macroeconomic predictors of real economic activity.
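The core preprocessing step described here, compressing a large block of 'control' predictors into a few principal components before inference on the focus regressors, can be sketched as follows. This is an illustrative reconstruction with made-up variable names, and a single least-squares fit stands in for the chapter's model-averaging layer:

    import numpy as np

    def pc_controls(Z, n_comp):
        """First n_comp principal-component scores of the control matrix Z."""
        Zc = Z - Z.mean(axis=0)
        U, s, Vt = np.linalg.svd(Zc, full_matrices=False)
        return Zc @ Vt[:n_comp].T

    rng = np.random.default_rng(0)
    n, p_focus, p_ctrl = 200, 3, 40
    X = rng.normal(size=(n, p_focus))                 # focus predictors of interest
    Z = rng.normal(size=(n, p_ctrl))                  # large block of controls
    y = X @ np.array([1.0, 0.5, 0.0]) + 0.3 * Z[:, 0] + rng.normal(size=n)

    design = np.column_stack([np.ones(n), X, pc_controls(Z, 5)])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    print(beta[1:1 + p_focus])                        # focus coefficients, controls compressed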
Doctorate in Economics and Management Sciences
Boulesteix, Anne-Laure. "Dimension reduction and Classification with High-Dimensional Microarray Data." Diss., lmu, 2005. http://nbn-resolving.de/urn:nbn:de:bvb:19-28017.
Samko, Oksana. "Low dimension hierarchical subspace modelling of high dimensional data." Thesis, Cardiff University, 2009. http://orca.cf.ac.uk/54883/.
Ruan, Lingyan. "Statistical analysis of high dimensional data." Diss., Georgia Institute of Technology, 2010. http://hdl.handle.net/1853/37135.
Shen, Xilin. "Multiscale analysis of high dimensional data." Connect to online resource, 2007. http://gateway.proquest.com/openurl?url_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation&res_dat=xri:pqdiss&rft_dat=xri:pqdiss:3284443.
Wang, Wanjie. "Clustering Problems for High Dimensional Data." Research Showcase @ CMU, 2014. http://repository.cmu.edu/dissertations/443.
Csikós, Mónika. "Efficient Approximations of High-Dimensional Data." Thesis, Université Gustave Eiffel, 2022. http://www.theses.fr/2022UEFL2004.
In this thesis, we study approximations of set systems (X, S), where X is a base set and S consists of subsets of X called ranges. Given a finite set system, our goal is to construct a small subset of X such that each range is 'well-approximated'. In particular, for a given parameter epsilon in (0,1), we say that a subset A of X is an epsilon-approximation of (X, S) if for any range R in S, the fractions |A ∩ R|/|A| and |R|/|X| are epsilon-close.

Research on such approximations started in the 1950s, with random sampling being the key tool for showing their existence. Since then, the notion of approximations has become a fundamental structure across several communities: learning theory, statistics, combinatorics, and algorithms. A breakthrough in the study of approximations dates back to 1971, when Vapnik and Chervonenkis studied set systems with finite VC dimension, which turned out to be a key parameter characterising their complexity. For instance, if a set system (X, S) has VC dimension d, then a uniform sample of O(d/epsilon^2) points is an epsilon-approximation of (X, S) with high probability. Importantly, the size of the approximation depends only on epsilon and d; it is independent of the input sizes |X| and |S|!

In the first part of this thesis, we give a modular, self-contained, intuitive proof of the above uniform sampling guarantee. In the second part, we give an improvement on a 30-year-old algorithmic bottleneck: constructing matchings with low crossing number. This can be applied to build approximations with improved guarantees. Finally, we answer a 30-year-old open problem of Blumer et al. by proving tight lower bounds on the VC dimension of unions of half-spaces, a geometric set system that appears in several applications, e.g. coreset constructions.
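The epsilon-approximation definition and the O(d/epsilon^2) uniform-sampling guarantee above are easy to see in miniature. Below is a hedged Python sketch, not code from the thesis, using random intervals on the line (a range space of VC dimension 2) and checking that a modest uniform sample approximates every range:

    import random

    def is_eps_approx(X, ranges, A, eps):
        """True iff | |A ∩ R|/|A| - |R|/|X| | <= eps for every range R."""
        return all(abs(sum(x in R for x in A) / len(A) - len(R) / len(X)) <= eps
                   for R in ranges)

    random.seed(1)
    X = [random.random() for _ in range(10_000)]
    # ranges induced by 200 random intervals [a, b] on the line
    ranges = [frozenset(x for x in X if min(a, b) <= x <= max(a, b))
              for a, b in ((random.random(), random.random()) for _ in range(200))]

    A = random.sample(X, 2_000)        # uniform sample; size O(d / eps^2) in theory
    print(is_eps_approx(X, ranges, A, eps=0.05))   # True with high probability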
Qin, Yingli. "Statistical inference for high-dimensional data." [Ames, Iowa : Iowa State University], 2009. http://gateway.proquest.com/openurl?url_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation&res_dat=xri:pqdiss&rft_dat=xri:pqdiss:3389139.
Full textLou, Qiang. "LEARNING FROM INCOMPLETE HIGH-DIMENSIONAL DATA." Diss., Temple University Libraries, 2013. http://cdm16002.contentdm.oclc.org/cdm/ref/collection/p245801coll10/id/214785.
Full textPh.D.
Data sets with irrelevant and redundant features and a large fraction of missing values are common in real-life applications. Learning from such data usually requires preprocessing, such as selecting informative features and imputing missing values based on the observed data. These steps can yield more accurate and more efficient prediction, as well as a better understanding of the data distribution. In my dissertation I describe my work on both of these aspects, my follow-up work on feature selection in incomplete datasets without imputing missing values, and, in the last part, my current work on the more challenging situation where high-dimensional data evolves over time.

The first two parts of my dissertation consist of methods that handle such data in a straightforward way: impute missing values first, and then apply a traditional feature selection method to select informative features. We proposed two novel methods, one for imputing missing values and the other for selecting informative features. The imputation method exploits temporal correlation of attributes, correlations among multiple attributes collected at the same time and space, and spatial correlations among attributes from multiple sources. The proposed feature selection method aims to find a minimum subset of the most informative variables for classification/regression by efficiently approximating the Markov blanket, the set of variables that can shield a given variable from the target.

In the third part, I show how to perform feature selection in incomplete high-dimensional data without imputation, since imputation methods only work well when data is missing completely at random, when the fraction of missing values is small, or when there is prior knowledge about the data distribution. We define the objective function of the uncertainty margin-based feature selection method to maximize each instance's uncertainty margin in its own relevant subspace. In the optimization, we take into account the uncertainty of each instance due to the missing values. Experimental results on synthetic data and 6 benchmark data sets with few missing values (less than 25%) provide evidence that our method selects features as accurate as those chosen by alternative methods that apply an imputation method first. However, when there is a large fraction of missing values (more than 25%) in the data, our feature selection method outperforms those alternatives.

In the fourth part, I introduce my method for the more challenging situation where high-dimensional data varies over time. The existing way to handle such data is to flatten the temporal data into a single static data matrix and then apply a traditional feature selection method. In order to keep the dynamics of the time series, our method avoids flattening the data in advance. We propose a way to measure the distance between the multivariate temporal data of two instances. Based on this distance, we define a new objective function based on the temporal margin of each data instance. A fixed-point gradient descent method is proposed to solve the formulated objective function and learn the optimal feature weights. Experimental results on real temporal microarray data provide evidence that the proposed method identifies more informative features than alternatives that flatten the temporal data in advance.
Temple University--Theses
Dannenberg, Matthew. "Pattern Recognition in High-Dimensional Data." Scholarship @ Claremont, 2016. https://scholarship.claremont.edu/hmc_theses/76.
Full textPacella, Massimo. "High-dimensional statistics for complex data." Doctoral thesis, Universita degli studi di Salerno, 2018. http://hdl.handle.net/10556/3016.
High-dimensional data analysis has become a popular research topic in recent years, due to the emergence of various new applications in several fields of science underscoring the need to analyse massive data sets. One of the main challenges in analysing high-dimensional data concerns the interpretability of estimated models as well as the computational efficiency of the procedures adopted. Such a purpose can be achieved through the identification of the relevant variables that really affect the phenomenon of interest, so that effective models can subsequently be constructed and applied to solve practical problems. The first two chapters of the thesis are devoted to studying high-dimensional statistics for variable selection. We first give a short but thorough review of the main techniques developed for the general problem of variable selection using nonparametric statistics. Finally, in chapter 3 we present our proposal for a feature screening approach for non-additive models, developed by using conditional information in the estimation procedure... [edited by Author]
30th doctoral cycle
Zanco, Alessandro. "High-dimensional data driven parameterized macromodeling." Doctoral thesis, Politecnico di Torino, 2022. http://hdl.handle.net/11583/2971991.
Hassan, Tahir Mohammed. "Data-independent vs. data-dependent dimension reduction for pattern recognition in high dimensional spaces." Thesis, University of Buckingham, 2017. http://bear.buckingham.ac.uk/199/.
Full textYahya, Waheed Babatunde. "Sequential Dimension Reduction and Prediction Methods with High-dimensional Microarray Data." Diss., lmu, 2009. http://nbn-resolving.de/urn:nbn:de:bvb:19-102544.
Full textLiu, Jinze Wang Wei. "New approaches for clustering high dimensional data." Chapel Hill, N.C. : University of North Carolina at Chapel Hill, 2006. http://dc.lib.unc.edu/u?/etd,584.
Title from electronic title page (viewed Oct. 10, 2007). "... in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the Department of Computer Science." Discipline: Computer Science; Department/School: Computer Science.
Mansoor, Rashid. "Assessing Distributional Properties of High-Dimensional Data." Doctoral thesis, Internationella Handelshögskolan, Högskolan i Jönköping, IHH, Economics, Finance and Statistics, 2013. http://urn.kb.se/resolve?urn=urn:nbn:se:hj:diva-22547.
Sun, Yizhi. "Statistical Analysis of Structured High-dimensional Data." Diss., Virginia Tech, 2018. http://hdl.handle.net/10919/97505.
Full textPHD
Harvey, William John. "Understanding High-Dimensional Data Using Reeb Graphs." The Ohio State University, 2012. http://rave.ohiolink.edu/etdc/view?acc_num=osu1342614959.
Green, Brittany. "Ultra-high Dimensional Semiparametric Longitudinal Data Analysis." University of Cincinnati / OhioLINK, 2020. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1593171378846243.
Huo, Shuning. "Bayesian Modeling of Complex High-Dimensional Data." Diss., Virginia Tech, 2020. http://hdl.handle.net/10919/101037.
Full textDoctor of Philosophy
With the rapid development of modern high-throughput technologies, scientists can now collect high-dimensional data in different forms, such as engineering signals, medical images, and genomics measurements. However, acquisition of such data does not automatically lead to efficient knowledge discovery. The main objective of this dissertation is to develop novel Bayesian methods to extract useful knowledge from complex high-dimensional data. It has two parts: the development of an ultra-fast functional mixed model, and the modeling of data heterogeneity via Dirichlet Diffusion Trees. The first part focuses on developing approximate Bayesian methods in functional mixed models to estimate parameters and detect significant regions. Two datasets demonstrate the effectiveness of the proposed method: a mass spectrometry dataset from a cancer study and a neuroimaging dataset from an Alzheimer's disease study. The second part focuses on modeling data heterogeneity via Dirichlet Diffusion Trees. The method helps uncover underlying hierarchical tree structures and estimate systematic differences between groups of samples. We demonstrate the effectiveness of the method on brain tumor imaging data.
Williams, Andre. "Stereotype Logit Models for High Dimensional Data." VCU Scholars Compass, 2010. http://scholarscompass.vcu.edu/etd/147.
Chi, Yuan. "Machine learning techniques for high dimensional data." Thesis, University of Liverpool, 2015. http://livrepository.liverpool.ac.uk/2033319/.
McWilliams, Brian Victor Parulian. "Projection based models for high dimensional data." Thesis, Imperial College London, 2011. http://hdl.handle.net/10044/1/9577.
Mahammad, Beigi Majid. "Kernel methods for high-dimensional biological data." [S.l. : s.n.], 2008.
Köchert, Karl. "From high-dimensional data to disease mechanisms." Doctoral thesis, Humboldt-Universität zu Berlin, Mathematisch-Naturwissenschaftliche Fakultät I, 2011. http://dx.doi.org/10.18452/16297.
Inappropriate activation of the NOTCH signaling pathway, e.g. by activating mutations, contributes to the pathogenesis of various human malignancies. Using a bottom-up approach based on the acquisition of high-dimensional microarray data from classical Hodgkin lymphoma (cHL) and, as control, non-Hodgkin B cell lymphomas, we identify a cHL-specific NOTCH gene-expression signature dominated by the NOTCH co-activator Mastermind-like 2 (MAML2). This set the basis for demonstrating that aberrant expression of the essential NOTCH co-activator MAML2 provides an alternative mechanism to activate NOTCH signaling in human lymphoma cells. Using immunohistochemistry, we detected high-level MAML2 expression in several B cell-derived lymphoma types, including cHL cells, whereas no staining for MAML2 was detectable in normal B cells. Inhibition of MAML protein activity by a dominant-negative form of MAML, or by shRNAs targeting MAML2, in cHL cells resulted in down-regulation of the NOTCH target genes HES7 and HEY1, which we identified as overexpressed in cHL cells, and in reduced proliferation. In order to target the NOTCH transcriptional complex directly, we developed short peptide constructs that competitively inhibit NOTCH-dependent transcriptional activity, as demonstrated by NOTCH reporter assays and EMSA analyses. We conclude that NOTCH signaling is aberrantly activated in a cell-autonomous manner in cHL. This is mediated by high-level expression of the essential NOTCH co-activator MAML2, a protein that is only weakly expressed in B cells from healthy donors. Using short peptide constructs, we moreover show that this approach is promising with regard to the development of NOTCH pathway inhibitors that will also work in NOTCH-associated malignancies that are resistant to γ-secretase inhibition.
Salaro, Rossana <1994>. "Multinomial Logistic Regression with High Dimensional Data." Master's Degree Thesis, Università Ca' Foscari Venezia, 2018. http://hdl.handle.net/10579/13814.
Blake, Patrick Michael. "Biclustering and Visualization of High Dimensional Data using VIsual Statistical Data Analyzer." Thesis, Virginia Tech, 2019. http://hdl.handle.net/10919/87392.
Master of Science
Many data sets have too many features for conventional pattern recognition techniques to work properly. This thesis investigates techniques that alleviate these difficulties. One such technique, biclustering, clusters data in both dimensions and is inherently resistant to the challenges posed by having too many features. However, the algorithms that implement biclustering have limitations in that the user must know at least the structure of the data and how many biclusters to expect. This is where the VIsual Statistical Data Analyzer, or VISDA, can help. It is a visualization tool that successively and progressively explores the structure of the data, identifying clusters along the way. This thesis proposes coupling VISDA with biclustering to overcome some of the challenges of data sets with too many features. Further, to increase the performance, usability, and maintainability as well as reduce costs, VISDA was translated from Matlab to a Python version called VISDApy. Both VISDApy and the overall process were demonstrated with real and synthetic data sets. The results of this work have the potential to improve analysts’ understanding of the relationships within complex data sets and their ability to make informed decisions from such data.
Chung, David H. S. "High-dimensional glyph-based visualization and interactive techniques." Thesis, Swansea University, 2014. https://cronfa.swan.ac.uk/Record/cronfa42276.
Battey, Heather Suzanne. "Dimension reduction and automatic smoothing in high dimensional and functional data analysis." Thesis, University of Cambridge, 2011. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.609849.
Full textWeng, Jiaying. "TRANSFORMS IN SUFFICIENT DIMENSION REDUCTION AND THEIR APPLICATIONS IN HIGH DIMENSIONAL DATA." UKnowledge, 2019. https://uknowledge.uky.edu/statistics_etds/40.
Polin, Afroza. "Simultaneous Inference for High Dimensional and Correlated Data." Bowling Green State University / OhioLINK, 2019. http://rave.ohiolink.edu/etdc/view?acc_num=bgsu1563182262263262.
Full textBressan, Marco José Miguel. "Statistical Independence for classification for High Dimensional Data." Doctoral thesis, Universitat Autònoma de Barcelona, 2003. http://hdl.handle.net/10803/3034.
Landfors, Mattias. "Normalization and analysis of high-dimensional genomics data." Doctoral thesis, Umeå universitet, Institutionen för matematik och matematisk statistik, 2012. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-53486.
Muja, Marius. "Scalable nearest neighbour methods for high dimensional data." Thesis, University of British Columbia, 2013. http://hdl.handle.net/2429/44402.
Winiger, Joakim. "Estimating the intrinsic dimensionality of high dimensional data." Thesis, KTH, Matematisk statistik, 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-163170.
This report gives a survey of different methods for estimating intrinsic dimension (ID). The principle behind the concept of ID is that it is often possible to find structure in data which makes it possible to re-express the same data using fewer coordinates (dimensions). The aim of this project is to solve a common problem: given a (typically high-dimensional) data set, determine whether the number of dimensions is redundant, and if so, find a representation of the data set that has a smaller number of dimensions. We introduce different approaches to intrinsic dimension estimation, review the theory behind them, and compare their results on both synthetic and real data sets. The first three methods estimate the intrinsic dimension of data, while the fourth finds a lower-dimensional version of a data set. This order is practical for the purpose of the project: once we have an estimate of the intrinsic dimension of a data set, we can use it to construct a simpler version of the data set with that number of dimensions. The results show that for high-dimensional data, the number of dimensions can be reduced considerably. The different methods give similar results despite their different theoretical backgrounds, and give the expected results when applied to synthetic data sets whose intrinsic dimensions are already known.
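One of the simplest estimators in the family this report surveys is the PCA-based rule: count how many principal components are needed to capture most of the variance. A hedged Python sketch, illustrative only and not the report's own estimators:

    import numpy as np

    def pca_intrinsic_dim(X, var_explained=0.95):
        """Number of principal components covering var_explained of the variance."""
        Xc = X - X.mean(axis=0)
        s = np.linalg.svd(Xc, compute_uv=False)        # singular values
        ratios = np.cumsum(s ** 2) / np.sum(s ** 2)    # cumulative variance share
        return int(np.searchsorted(ratios, var_explained) + 1)

    rng = np.random.default_rng(0)
    latent = rng.normal(size=(500, 3))                 # true intrinsic dimension = 3
    mixing = rng.normal(size=(3, 50))                  # embed into 50 ambient dimensions
    X = latent @ mixing + 0.01 * rng.normal(size=(500, 50))
    print(pca_intrinsic_dim(X))                        # ~3 for this linear embedding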
Zhang, Peng. "Structured sensing for estimation of high-dimensional data." Thesis, Imperial College London, 2016. http://hdl.handle.net/10044/1/49415.
Nilsson, Mårten. "Augmenting High-Dimensional Data with Deep Generative Models." Thesis, KTH, Robotik, perception och lärande, RPL, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-233969.
Data augmentation is a technique that can be performed in several ways to improve the training of discriminative models. Recent successes in deep generative models have opened up new ways to augment existing data sets. In this work, a framework for augmenting annotated data sets with deep generative models is proposed. In addition, a method for the quantitative evaluation of the quality of generated data sets has been developed. Using this framework, two data sets for pupil localization were generated with different generative models. Both well-established models and a new model developed for this purpose were tested. The new model was shown, both qualitatively and quantitatively, to generate the best data sets. A number of smaller experiments on standard data sets showed examples of cases where this generative model could improve the performance of an existing discriminative model. The results indicate that generative models can be used to augment or replace existing data sets when training discriminative models.
Schlosser, Pascal [author], and Martin Schumacher [academic supervisor]. "Netboost: statistical modeling strategies for high-dimensional data." Freiburg: Universität, 2019. http://d-nb.info/1237220505/34.
Li, Hao. "Feature cluster selection for high-dimensional data analysis." Diss., online access via UMI, 2007.
Wang, Kaijun. "Graph-based Modern Nonparametrics For High-dimensional Data." Diss., Temple University Libraries, 2019. http://cdm16002.contentdm.oclc.org/cdm/ref/collection/p245801coll10/id/578840.
Ph.D.
Developing nonparametric statistical methods and inference procedures for high-dimensional large data has been a challenging frontier problem of statistics. To attack this problem, a clear rising trend has been observed in recent years with a radically different viewpoint: "graph-based nonparametrics," which is the main research focus of this dissertation. The basic idea consists of two steps: (i) representation step: code the given data using graphs; (ii) analysis step: apply statistical methods to the graph-transformed problem to systematically tackle various types of data structures. Under this general framework, this dissertation develops two major research directions.

Chapter 2, based on Mukhopadhyay and Wang (2019a), introduces a new nonparametric method for the high-dimensional k-sample comparison problem that is distribution-free, robust, and continues to work even when the dimension of the data is larger than the sample size. The proposed theory is based on modern LP-nonparametrics tools and unexplored connections with spectral graph theory. The key is to construct a specially designed weighted graph from the data and to reformulate the k-sample problem as a community detection problem. The procedure is shown to possess various desirable properties along with a characteristic exploratory flavor that has practical consequences. The numerical examples show surprisingly good performance of our method under a broad range of realistic situations.

Chapter 3, based on Mukhopadhyay and Wang (2019b), revisits some foundational questions about network modeling that are still unsolved. In particular, we present a unified statistical theory of the fundamental spectral graph methods (e.g., Laplacian, modularity, diffusion map, regularized Laplacian, Google PageRank model), which are often viewed as spectral heuristics resting on empirical mystery facts. Despite half a century of research, this question has been one of the most formidable open issues, if not the core problem, in modern network science. Our approach integrates modern nonparametric statistics, mathematical approximation theory (of integral equations), and computational harmonic analysis in a novel way to develop a theory that unifies and generalizes the existing paradigm. From a practical standpoint, it is shown that this perspective can provide adequate guidance for designing next-generation computational tools for large-scale problems. As an example, we describe the high-dimensional change-point detection problem.

Chapter 4 discusses further extensions and applications of our methodologies to regularized spectral clustering and spatial graph regression problems. The dissertation concludes with a discussion of two important areas of future study.
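As a loose illustration of the Chapter 2 idea, coding pooled samples as a weighted graph and recasting sample comparison as community detection, the following Python sketch builds a Gaussian-similarity graph over two pooled samples and splits it with the Fiedler vector of the graph Laplacian. Kernel, bandwidth, and variable names are illustrative assumptions, not the dissertation's method:

    import numpy as np

    def fiedler_split(X):
        sq = (X ** 2).sum(axis=1)
        d2 = sq[:, None] + sq[None, :] - 2 * (X @ X.T)   # pairwise squared distances
        W = np.exp(-d2 / np.median(d2))                  # median-bandwidth kernel
        np.fill_diagonal(W, 0.0)
        L = np.diag(W.sum(axis=1)) - W                   # unnormalized graph Laplacian
        vals, vecs = np.linalg.eigh(L)
        return vecs[:, 1] > 0                            # sign of the Fiedler vector

    rng = np.random.default_rng(0)
    A = rng.normal(0.0, 1.0, size=(60, 20))              # sample 1
    B = rng.normal(1.0, 1.0, size=(60, 20))              # sample 2, shifted mean
    labels = np.r_[np.zeros(60), np.ones(60)].astype(bool)
    split = fiedler_split(np.vstack([A, B]))
    agreement = max((split == labels).mean(), (split != labels).mean())
    print(f"split/label agreement: {agreement:.2f}")     # near 1.0: samples differ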
Temple University--Theses
Galvani, Marta. "Predictive and Clustering Methods for High dimensional data." Doctoral thesis, Università degli studi di Pavia, 2020. http://hdl.handle.net/11571/1361035.
Zhang, Liangwei. "Big Data Analytics for eMaintenance: Modeling of high-dimensional data streams." Licentiate thesis, Luleå tekniska universitet, Drift, underhåll och akustik, 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:ltu:diva-17012.
François, Damien. "High-dimensional data analysis : optimal metrics and feature selection." Université catholique de Louvain, 2007. http://edoc.bib.ucl.ac.be:81/ETD-db/collection/available/BelnUcetd-01152007-162739/.
Radovanović, Miloš. "High-Dimensional Data Representations and Metrics for Machine Learning and Data Mining." PhD thesis, Univerzitet u Novom Sadu, Prirodno-matematički fakultet u Novom Sadu, 2011. https://www.cris.uns.ac.rs/record.jsf?recordId=77530&source=NDLTD&language=en.
In the current "information age", massive amounts of data are collected at a rate that does not allow their effective structuring, analysis, and transformation into useful knowledge. This information overload manifests itself both through the large number of objects included in data sets and through the large number of attributes, also known as high dimensionality. The dissertation deals with problems that arise from the high dimensionality of data representation, often referred to as the "curse of dimensionality", in the context of machine learning, data mining, and information retrieval. The described research follows two directions: the study of the behavior of (dis)similarity metrics with respect to increasing dimensionality, and the study of attribute selection methods, primarily in interaction with document representation techniques for text classification. The central results of the dissertation, relevant to the first research direction, include theoretical insights into the concentration phenomenon of the cosine similarity measure, and a detailed analysis of the hubness phenomenon, which refers to the tendency of some points in a data set to become hubs by being included in unexpectedly many k-nearest-neighbor lists of other points. The mechanisms driving the phenomenon are studied in detail, from both theoretical and empirical perspectives. Hubness is related to the (latent) dimensionality of data; its interaction with the cluster structure of data and the information provided by class labels is described; and its effect on well-known algorithms for classification, semi-supervised learning, clustering, and outlier detection is demonstrated, with particular attention to time-series classification and information retrieval. The results relating to the second research direction include the quantification of the interaction between different transformations of high-dimensional document representations and attribute selection, in the context of text classification.
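The hubness phenomenon described above is easy to reproduce: count, for each point, how often it appears in other points' k-nearest-neighbor lists, and watch the skewness of that count grow with dimension. A minimal Python sketch, illustrative and not the dissertation's code:

    import numpy as np

    def k_occurrences(X, k=10):
        sq = (X ** 2).sum(axis=1)
        d2 = sq[:, None] + sq[None, :] - 2 * (X @ X.T)   # pairwise squared distances
        np.fill_diagonal(d2, np.inf)
        knn = np.argsort(d2, axis=1)[:, :k]              # k nearest neighbors per point
        return np.bincount(knn.ravel(), minlength=len(X))

    def skewness(x):
        x = x - x.mean()
        return (x ** 3).mean() / (x ** 2).mean() ** 1.5

    rng = np.random.default_rng(0)
    for dim in (3, 100):
        X = rng.normal(size=(1000, dim))
        print(dim, round(skewness(k_occurrences(X)), 2))
    # the k-occurrence distribution becomes markedly right-skewed as dim grows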
Vege, Sri Harsha. "Ensemble of Feature Selection Techniques for High Dimensional Data." TopSCHOLAR®, 2012. http://digitalcommons.wku.edu/theses/1164.
Full textDing, Yuanyuan. "Handling complex, high dimensional data for classification and clustering /." Full text available from ProQuest UM Digital Dissertations, 2007. http://0-proquest.umi.com.umiss.lib.olemiss.edu/pqdweb?index=0&did=1400971141&SrchMode=2&sid=1&Fmt=2&VInst=PROD&VType=PQD&RQT=309&VName=PQD&TS=1219343482&clientId=22256.
Tillander, Annika. "Classification models for high-dimensional data with sparsity patterns." Doctoral thesis, Stockholms universitet, Statistiska institutionen, 2013. http://urn.kb.se/resolve?urn=urn:nbn:se:su:diva-95664.
With today's technology, for example spectrometers and gene chips, data are generated in large quantities. This abundance of data is not only an advantage but also causes certain problems; typically the number of variables (p) far exceeds the number of observations (n). This yields so-called high-dimensional data, which requires new statistical methods, since the traditional methods were developed for the opposite situation (p<n). Moreover, usually very few of all these variables are relevant for any given project, and the strength of the information in the relevant variables is often weak. Hence this type of data is often described as sparse and weak, and identifying the relevant variables is commonly likened to finding a needle in a haystack. This thesis addresses three different ways of classifying in this type of high-dimensional data, where classifying means using a data set with both explanatory variables and an outcome variable to teach a function or algorithm to predict the outcome variable based on the explanatory variables alone. The type of real data used in the thesis is microarrays: cell samples that show the activity of the genes in the cell. The goal of the classification is to use the variation in activity across the thousands of genes (the explanatory variables) to determine whether the cell sample comes from cancer tissue or normal tissue (the outcome variable). There are classification methods that can handle high-dimensional data, but these are often computationally intensive and therefore often work better for discrete data. By transforming continuous variables into discrete ones (discretization), the computation time can be reduced and the classification made more efficient. The thesis studies how discretization affects the predictive accuracy of classification, and a very efficient discretization method for high-dimensional data is proposed. Linear classification methods have the advantage of being stable. The drawback is that they require an invertible covariance matrix, which the covariance matrix is not for high-dimensional data. The thesis proposes a way to estimate the inverse for sparse covariance matrices with a block-diagonal matrix. This matrix also has the advantage of leading to additive classification, which makes it possible to select whole blocks of relevant variables. The thesis also presents a method for identifying and selecting the blocks. There are also probabilistic classification methods, which have the advantage of giving the probability of belonging to each of the possible outcomes for an observation, unlike most other classification methods, which only predict the outcome. The thesis proposes such a Bayesian method, given the block-diagonal matrix and normally distributed outcome classes. The relevance and advantages of the methods proposed in the thesis are demonstrated by applying them to simulated and real high-dimensional data.
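The block-diagonal idea summarized above can be sketched briefly: with a block-diagonal covariance estimate, the linear discriminant score decomposes into a sum over blocks. A hedged Python illustration, assuming the block boundaries are already known (the thesis also estimates them), with all names illustrative:

    import numpy as np

    def block_lda_scores(X, mu0, mu1, blocks, Xc):
        """LDA score with a block-diagonal covariance: contributions add per block."""
        score = np.zeros(len(X))
        for idx in blocks:                                # e.g. [[0..4], [5..9], ...]
            S = np.cov(Xc[:, idx], rowvar=False) + 1e-6 * np.eye(len(idx))
            d = np.linalg.solve(S, mu1[idx] - mu0[idx])   # S^{-1} (mu1 - mu0)
            m = (mu1[idx] + mu0[idx]) / 2
            score += (X[:, idx] - m) @ d
        return score                                      # classify as class 1 if > 0

    rng = np.random.default_rng(0)
    n, p = 50, 20
    mu1_true = np.r_[0.8 * np.ones(5), np.zeros(p - 5)]   # signal in first block only
    X0 = rng.normal(0.0, 1.0, size=(n, p))
    X1 = mu1_true + rng.normal(0.0, 1.0, size=(n, p))
    blocks = [list(range(i, i + 5)) for i in range(0, p, 5)]
    Xc = np.vstack([X0 - X0.mean(0), X1 - X1.mean(0)])    # pooled within-class data
    scores = block_lda_scores(np.vstack([X0, X1]), X0.mean(0), X1.mean(0), blocks, Xc)
    truth = np.r_[np.zeros(n), np.ones(n)].astype(bool)
    print(f"training accuracy: {((scores > 0) == truth).mean():.2f}")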
Zhao, Jiwu [author]. "Automatic subspace clustering for high-dimensional data." Düsseldorf: Universitäts- und Landesbibliothek der Heinrich-Heine-Universität Düsseldorf, 2014. http://d-nb.info/1047907658/34.