Dissertations / Theses on the topic 'Numeric and categorical data'
Consult the top 50 dissertations / theses for your research on the topic 'Numeric and categorical data.'
Jia, Hong. "Clustering of categorical and numerical data without knowing cluster number." HKBU Institutional Repository, 2013. http://repository.hkbu.edu.hk/etd_ra/1495.
Suarez, Alvarez Maria Del Mar. "Design and analysis of clustering algorithms for numerical, categorical and mixed data." Thesis, Cardiff University, 2010. http://orca.cf.ac.uk/54131/.
Hjerpe, Adam. "Computing Random Forests Variable Importance Measures (VIM) on Mixed Numerical and Categorical Data." Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-185496.
Random Forest (RF) is a popular predictive model that has shown good results in a wide range of application studies. The model achieves high prediction accuracy, can model complex high-dimensional data, and has also shown good results with intercorrelated predictor variables. This project investigates a measure, the variable importance measure (VIM) obtained from the RF model, for computing the degree of association between predictor variables and the target variable. The project investigates the sensitivity of VIM to qualitative predictor noise and examines the ability of VIM to differentiate predictive variables from variables that, with respect to the target variable, only describe noise. Differentiating predictive variables in supervised learning can be used to increase the robustness of classifiers, increase prediction accuracy and reduce data dimensionality, and VIM can be used as a tool for exploring relationships between predictor variables and the target variable.
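The idea of separating predictive variables from noise variables can be illustrated with a permutation-style importance measure: shuffle one column and record how much accuracy drops. The sketch below is not the thesis's procedure. It uses a 1-nearest-neighbour classifier as a stand-in for a random forest, and the data generator, the mixed distance and all function names are assumptions made for illustration.

```python
import random

def mixed_distance(a, b):
    """Distance between two (numeric, categorical) rows:
    absolute difference on the numeric part plus a 0/1 mismatch on the category."""
    return abs(a[0] - b[0]) + (0 if a[1] == b[1] else 1)

def predict_1nn(train_X, train_y, row):
    """Label of the closest training row (stand-in model, not a random forest)."""
    nearest = min(range(len(train_X)), key=lambda i: mixed_distance(train_X[i], row))
    return train_y[nearest]

def accuracy(train_X, train_y, test_X, test_y):
    hits = sum(predict_1nn(train_X, train_y, row) == y for row, y in zip(test_X, test_y))
    return hits / len(test_y)

def permutation_importance(train_X, train_y, test_X, test_y, col, rng):
    """Accuracy drop on the test set after shuffling column `col`."""
    baseline = accuracy(train_X, train_y, test_X, test_y)
    shuffled = [row[col] for row in test_X]
    rng.shuffle(shuffled)
    permuted = [list(row) for row in test_X]
    for row, value in zip(permuted, shuffled):
        row[col] = value
    return baseline - accuracy(train_X, train_y, permuted, test_y)

def make_data(n, rng):
    """Hypothetical data: column 0 is numeric and predictive,
    column 1 is a categorical variable that is pure noise."""
    X, y = [], []
    for i in range(n):
        label = i % 2
        X.append([rng.uniform(0.0, 1.0) + 2.0 * label, rng.choice("abc")])
        y.append(label)
    return X, y

rng = random.Random(0)
train_X, train_y = make_data(100, rng)
test_X, test_y = make_data(100, rng)
imp_numeric = permutation_importance(train_X, train_y, test_X, test_y, 0, rng)
imp_noise = permutation_importance(train_X, train_y, test_X, test_y, 1, rng)
print("numeric predictor:", imp_numeric, "categorical noise:", imp_noise)
```

In this toy setting the predictive numeric column should show a clearly larger accuracy drop than the categorical noise column, which is the kind of contrast a VIM is meant to expose.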
Kirsch, Matthew Robert. "Signal Processing Algorithms for Analysis of Categorical and Numerical Time Series: Application to Sleep Study Data." Case Western Reserve University School of Graduate Studies / OhioLINK, 2010. http://rave.ohiolink.edu/etdc/view?acc_num=case1278606480.
Obry, Tom. "Apprentissage numérique et symbolique pour le diagnostic et la réparation automobile." Thesis, Toulouse, INSA, 2020. http://www.theses.fr/2020ISAT0014.
Clustering is an unsupervised learning method that aims to partition a data set into homogeneous groups according to a similarity criterion. The data in each group then share common characteristics. DyClee is a classifier that performs classification on numerical data arriving in a continuous flow and that offers an adaptation mechanism to update this classification, thus performing dynamic clustering in step with the evolution of the system or process being monitored. Nevertheless, considering only numerical attributes does not cover all fields of application. To generalize the approach, this thesis proposes on the one hand an extension to nominal categorical data, and on the other hand an extension to mixed data. Hierarchical clustering approaches are also proposed to assist experts in interpreting the obtained clusters and in validating the generated partitions. The presented algorithm, called Mixed DyClee, can be applied in various application domains. In this thesis, it is used in the field of automotive diagnostics.
Bashon, Yasmina M. "Contributions to fuzzy object comparison and applications. Similarity measures for fuzzy and heterogeneous data and their applications." Thesis, University of Bradford, 2013. http://hdl.handle.net/10454/6305.
Libyan Embassy
Bashon, Yasmina Massoud. "Contributions to fuzzy object comparison and applications : similarity measures for fuzzy and heterogeneous data and their applications." Thesis, University of Bradford, 2013. http://hdl.handle.net/10454/6305.
Hollingsworth, Jason Michael. "Foundational Data Repository for Numeric Engine Validation." Diss., 2008. http://contentdm.lib.byu.edu/ETD/image/etd2661.pdf.
Läuter, Henning, and Ayad Ramadan. "Statistical Scaling of Categorical Data." Universität Potsdam, 2010. http://opus.kobv.de/ubp/volltexte/2011/4956/.
Zhang, Yiqun. "Advances in categorical data clustering." HKBU Institutional Repository, 2019. https://repository.hkbu.edu.hk/etd_oa/658.
Chang, Janis. "Analysis of ordered categorical data." Thesis, University of British Columbia, 1988. http://hdl.handle.net/2429/27857.
Science, Faculty of
Statistics, Department of
Graduate
Läuter, Henning, and Ayad Ramadan. "Modeling and Scaling of Categorical Data." Universität Potsdam, 2010. http://opus.kobv.de/ubp/volltexte/2011/4957/.
Pilhöfer, Alexander. "Categorical Data Analysis Reordered." München : Verlag Dr. Hut, 2014. http://d-nb.info/1063221277/34.
Zingmark, Per-Henrik. "Models for Ordered Categorical Pharmacodynamic Data." Doctoral thesis, Uppsala : Acta Universitatis Upsaliensis: Univ.-bibl. [distributör], 2005. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-6125.
Hommola, Susan Kerstin. "Categorical data analysis of protein structure." Thesis, University of Leeds, 2011. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.578618.
Fear, Simon Charles. "The analysis of categorical longitudinal data." Thesis, University of Liverpool, 1998. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.266052.
Grapsa, Erofili. "Bayesian analysis for categorical survey data." Thesis, University of Southampton, 2010. https://eprints.soton.ac.uk/197303/.
Beck, John. "Interactive Visualization of Categorical Data Sets." OpenSIUC, 2012. https://opensiuc.lib.siu.edu/theses/950.
Anderlucci, Laura. "Comparing Different Approaches for Clustering Categorical Data." Doctoral thesis, Alma Mater Studiorum - Università di Bologna, 2012. http://amsdottorato.unibo.it/4302/.
Pickering, R. M. "Analysis of categorical data on pregnancy outcome." Thesis, University of Glasgow, 1987. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.280012.
Clarke, Paul Simon. "Nonignorable nonresponse models for categorical survey data." Thesis, University of Southampton, 1998. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.262905.
Kahiri, James Mwangi K. "Impact of measurement errors on categorical data." Thesis, University of Southampton, 1995. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.318197.
Stemp, Iain Charles. "Bayesian model selection ideas for categorical data." Thesis, University of Cambridge, 1992. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.308335.
Ehlers, Rene. "Maximum likelihood estimation procedures for categorical data." Pretoria : [s.n.], 2002. http://upetd.up.ac.za/thesis/available/etd-07222005-124541.
Lee, Keunbaik. "Marginalized regression models for longitudinal categorical data." [Gainesville, Fla.] : University of Florida, 2007. http://purl.fcla.edu/fcla/etd/UFE0021244.
Al-Babtain, Abdulhakim A. "Bayesian model determination for categorical data survey." Thesis, University of Southampton, 2001. https://eprints.soton.ac.uk/50629/.
White, Paul. "Designs and analysis of ordered categorical data." Thesis, Teesside University, 1998. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.410844.
Lacerda, Fred W. "Comparative advantages of graphic versus numeric representation of quantitative data." Diss., Virginia Polytechnic Institute and State University, 1986. http://hdl.handle.net/10919/49817.
Fatima, Kaniz. "Analysis of longitudinal data with ordered categorical response." Thesis, University of Reading, 1994. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.239058.
Pang, Wan-Kai. "Modelling ordinal categorical data : a Gibbs sampler approach." Thesis, University of Southampton, 2000. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.323876.
Chen, Dandan. "Amended Estimators of Several Ratios for Categorical Data." Digital Commons @ East Tennessee State University, 2006. https://dc.etsu.edu/etd/2218.
Karangwa, Innocent. "Imputation techniques for non-ordered categorical missing data." University of the Western Cape, 2016. http://hdl.handle.net/11394/5061.
Missing data are common in survey data sets. Enrolled subjects often do not have data recorded for all variables of interest. The inappropriate handling of missing data may lead to bias in the estimates and incorrect inferences. Therefore, special attention is needed when analysing incomplete data. The multivariate normal imputation (MVNI) and the multiple imputation by chained equations (MICE) have emerged as the best techniques to impute, or fill in, missing data. The former assumes a normal distribution of the variables in the imputation model, but can also handle missing data whose distributions are not normal. The latter fills in missing values taking into account the distributional form of the variables to be imputed. The aim of this study was to determine the performance of these methods when data are missing at random (MAR) or completely at random (MCAR) on unordered or nominal categorical variables treated as predictors or response variables in the regression models. Both dichotomous and polytomous variables were considered in the analysis. The baseline data used were from the 2007 Demographic and Health Survey (DHS) of the Democratic Republic of Congo. The analysis model of interest was the logistic regression model of the woman’s contraceptive method use status on her marital status, controlling or not for other covariates (continuous, nominal and ordinal). Based on the data set with missing values, data sets with missing at random and missing completely at random observations on either the covariates or response variables measured on a nominal scale were first simulated, and then used for imputation purposes. Under the MVNI method, unordered categorical variables were first dichotomised, and then K − 1 (where K is the number of levels of the categorical variable of interest) dichotomised variables were included in the imputation model, leaving the other category as a reference.
These variables were imputed as continuous variables using a linear regression model. Imputation with MICE considered the distributional form of each variable to be imputed. That is, imputations were drawn using binary and multinomial logistic regressions for dichotomous and polytomous variables respectively. The performance of these methods was evaluated in terms of bias and standard errors in regression coefficients that were estimated to determine the association between the woman’s contraceptive method use status and her marital status, controlling or not for other types of variables. The analysis was first done assuming that the sample was not weighted; the sample weight was then taken into account to assess whether the sample design would affect the performance of the multiple imputation methods of interest, namely MVNI and MICE. As expected, the results showed that for all the models, MVNI and MICE produced less bias and smaller standard errors than the case deletion (CD) method, which discards items with missing values from the analysis. Moreover, it was found that when data were missing (MCAR or MAR) on the nominal variables that were treated as predictors in the regression model, MVNI reduced bias in the regression coefficients and standard errors compared to MICE, for both unweighted and weighted data sets. On the other hand, the results indicated that MICE outperformed MVNI when data were missing on the response variables, whether binary or polytomous. Furthermore, it was noted that the sample design (sample weights), the rates of missingness and the missing data mechanisms (MCAR or MAR) did not affect the behaviour of the multiple imputation methods that were considered in this study. Thus, based on these results, it can be concluded that when missing values are present on the outcome variables measured on a nominal scale in regression models, the distributional form of the variable with missing values should be taken into account.
When these variables are used as predictors (with missing observations), the parametric imputation approach (MVNI) would be a better option than MICE.
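The dichotomisation step described for MVNI (K − 1 indicator variables with one category left out as the reference) can be sketched as follows. This is a minimal illustration, not the author's code: the function name `dummy_code`, the example categories and the choice of `None` to mark a missing value are all assumptions.

```python
def dummy_code(values, reference=None):
    """Turn a nominal variable with K levels into K - 1 indicator columns.

    `values` is a list of category labels, with None marking a missing value.
    The reference level gets no column of its own: it is the row in which
    every indicator is 0. Missing values stay missing in every indicator,
    ready to be filled in by an imputation model.
    """
    levels = sorted({v for v in values if v is not None})
    if reference is None:
        reference = levels[0]
    kept = [level for level in levels if level != reference]
    rows = []
    for v in values:
        if v is None:
            rows.append({level: None for level in kept})
        else:
            rows.append({level: int(v == level) for level in kept})
    return kept, rows

# Hypothetical marital-status column with one missing observation.
marital = ["single", "married", "widowed", None, "married"]
kept, coded = dummy_code(marital, reference="single")
for original, row in zip(marital, coded):
    print(original, row)
```

With K = 3 levels and "single" as the reference, two indicator columns remain, and the missing observation is missing in both.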
Parker, K. N. "Numeric data frames and probabilistic judgments in complex real-world environments." Thesis, University College London (University of London), 2017. http://discovery.ucl.ac.uk/1536437/.
Marés, Soler Jordi. "Categorical Data Protection on Statistical Datasets and Social Networks." Doctoral thesis, Universitat Autònoma de Barcelona, 2013. http://hdl.handle.net/10803/129327.
The continuous growth of public sensitive data has increased the risk of breaking the privacy of the people or institutions in those datasets. This growth is nowadays even faster because of the expansion of the Internet, which makes it very important to assess the performance of all the methods used to protect those datasets. Two kinds of measures exist to check this performance: information loss and disclosure risk. Another area where privacy has an increasing role is that of social networks. They have become an essential ingredient of interpersonal communication in the modern world. They enable users to express and share common interests and to comment upon everyday events with all the people with whom they are connected. Indeed, the growth of social media has been rapid and has resulted in the adoption of social networks to meet specific communities of interest. However, this shared information space can prove to be dangerous in respect of user privacy issues. In addition to explicit "posts", there is much implicit semantic information that is not explicitly given in the posts that the user shares. For these and other reasons, the protection of information pertaining to each user needs to be supported. This thesis shows some new approaches to face these problems. The main contributions are:
• The development of an approach for protecting microdata datasets based on evolutionary algorithms, which automatically searches for better protections in terms of information loss and disclosure risk.
• The development of an evolutionary approach to optimize the transition matrices used in the Post-Randomization masking method, yielding better protections.
• The definition of an approach to deal with categorical microdata protection based on a pre-clustering approach, achieving protected data with better utility.
• The definition of a way to extract both implicit and explicit information from a real social network like Twitter, as well as the development of a protection method to deal with this information and some new measures to evaluate the protection quality.
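The Post-Randomization method mentioned in the contributions masks a categorical variable by replacing each value with one drawn from that value's row of a transition matrix. The sketch below shows only this basic masking step, not the thesis's evolutionary optimization of the matrix. The function name `pram`, the categories and the probabilities are hypothetical.

```python
import random

def pram(values, transition, rng):
    """Mask a categorical column: each value v is replaced by a category
    drawn with the probabilities in transition[v] (one row of the matrix)."""
    masked = []
    for v in values:
        row = transition[v]
        categories = list(row)
        weights = [row[c] for c in categories]
        masked.append(rng.choices(categories, weights=weights, k=1)[0])
    return masked

# Hypothetical transition matrix: each row sums to 1, and the diagonal
# entry is the probability of keeping the original category.
transition = {
    "A": {"A": 0.8, "B": 0.1, "C": 0.1},
    "B": {"A": 0.1, "B": 0.8, "C": 0.1},
    "C": {"A": 0.1, "B": 0.1, "C": 0.8},
}
# The identity matrix leaves the data untouched (no protection, no loss).
identity = {a: {b: float(a == b) for b in "ABC"} for a in "ABC"}

rng = random.Random(42)
original = ["A", "B", "C", "A", "A", "B", "C", "C"]
protected = pram(original, transition, rng)
unchanged = pram(original, identity, rng)
print(original)
print(protected)
```

The diagonal of the matrix controls the trade-off named in the abstract: larger diagonal entries keep more records unchanged (less information loss) but give weaker protection (more disclosure risk).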
Tran, Thu Trung. "Bayesian model estimation and comparison for longitudinal categorical data." Queensland University of Technology, 2008. http://eprints.qut.edu.au/19240/.
Fung, Siu-leung, and 馮紹樑. "Higher-order Markov chain models for categorical data sequences." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2003. http://hub.hku.hk/bib/B26666224.
Takagishi, Mariko (高岸 茉莉子). "Clustering and visualization for enhancing interpretation of categorical data." Thesis, Doshisha University, 2019. https://doors.doshisha.ac.jp/opac/opac_link/bibid/BB13112135/?lang=0.
Large-scale categorical data are often obtained in various fields. As the interpretation of large-scale data tends to be complicated, methods that capture the latent structure in data, such as cluster analysis and visualization methods, are often used to make the data more interpretable. However, there are situations where these methods fail to capture an interpretable latent structure (e.g., when each respondent interprets the categories differently). Therefore, in this paper, two problems that often occur in large-scale categorical data analysis are considered, and new methods to address these issues are proposed.
Doctor of Culture and Information Science
Doshisha University
Berrett, Candace. "Bayesian Probit Regression Models for Spatially-Dependent Categorical Data." The Ohio State University, 2010. http://rave.ohiolink.edu/etdc/view?acc_num=osu1285076512.
Fore, Neil Koberlein. "A Contrast Pattern based Clustering Algorithm for Categorical Data." Wright State University / OhioLINK, 2010. http://rave.ohiolink.edu/etdc/view?acc_num=wright1285345623.
Miranda, Samantha. "Investigation of Multiple Imputation Methods for Categorical Variables." Digital Commons @ East Tennessee State University, 2020. https://dc.etsu.edu/etd/3722.
Kraus, Katrin. "On the Measurement of Model Fit for Sparse Categorical Data." Doctoral thesis, Uppsala universitet, Statistiska institutionen, 2012. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-173768.
Philipp, Brian F. "A categorical analysis of Weapon System Accuracy Trial (WSAT) data." Thesis, Monterey, California. Naval Postgraduate School, 1992. http://hdl.handle.net/10945/23563.
This thesis contains an analysis of the last five years of Antisubmarine Warfare (ASW) Weapon System Accuracy Trial (WSAT) data from both the Atlantic and Pacific fleets. The analysis is conducted in an effort to provide recommendations to be applied toward future evolution of the ASW Test Program for surface ships. A statistical chi-square test is conducted on fleet and Navy-wide data to determine which ASW combat system material categories are most prone to degradation. Additionally, a critical examination of the existing WSAT database is provided with an aim toward promoting future statistical analysis. Results of this thesis indicate that degradation to weapons delivery systems like torpedo tubes and ASROC launchers is statistically more significant than the other WSAT test categories. The thesis also recommends new ways to adapt the existing WSAT database to conduct more informative inspections of existing and new construction ships.
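A chi-square test of the kind mentioned above compares observed counts in a contingency table with the counts expected if the row and column classifications were independent. The sketch below computes the Pearson statistic for a small table. The counts and category names are invented for illustration and are not WSAT data.

```python
def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a
    contingency table given as a list of rows of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof

# Hypothetical counts: rows are equipment categories,
# columns are (degraded, not degraded).
table = [
    [30, 70],  # category 1
    [10, 90],  # category 2
    [20, 80],  # category 3
]
stat, dof = chi_square(table)
CRITICAL_5_PERCENT_2_DOF = 5.991  # standard chi-square table value for 2 degrees of freedom
print(stat, dof, stat > CRITICAL_5_PERCENT_2_DOF)
```

For these invented counts the statistic is 12.5 on 2 degrees of freedom, which exceeds the 5% critical value, so the degradation rate would be judged to differ across the categories.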
Ortiz, Enrique. "A Scalable and Efficient Outlier Detection Strategy for Categorical Data." Honors in the Major Thesis, University of Central Florida, 2007. http://digital.library.ucf.edu/cdm/ref/collection/ETH/id/1185.
Bachelors
Engineering and Computer Science
Computer Engineering
Zhang, Xuhong. "Intelligible models for learning categorical data via generalized Fourier spectrum." Thesis, Massachusetts Institute of Technology, 2019. https://hdl.handle.net/1721.1/121725.
Cataloged from PDF version of thesis.
Includes bibliographical references (pages 167-170).
Machine learning techniques have found ubiquitous applications in recent years, and sophisticated models such as neural networks and ensemble methods have achieved impressive predictive performance. However, these models are hard to interpret and are usually used as black boxes. In applications where an explanation is required in addition to a prediction, linear models (e.g. linear regression or logistic regression) remain mainstream tools due to their simplicity and good interpretability. This thesis considers learning problems on categorical data and proposes methods that retain the good interpretability of linear models but significantly improve the predictive performance. In particular, we provide ways to automatically generate and efficiently select new features based on the raw data, and then train a linear model in the new feature space. The proposed methods are inspired by the Boolean function analysis literature, which studies the Fourier spectrum of Boolean functions and in turn provides spectrum-based learning algorithms. Such algorithms are important tools in computational learning theory, but are not considered practically useful due to the unrealistic assumption of a uniform input distribution. This work generalizes the idea of the Fourier spectrum of Boolean functions to allow an arbitrary input distribution. The generalized Fourier spectrum is also of theoretical interest. It carries over and meaningfully generalizes many important results on the Fourier spectrum. Moreover, it offers a framework to explore how the input distribution and target function jointly affect the difficulty of a learning problem, and provides the right language for discussing the data-dependent, algorithm-independent complexity of Boolean functions.
by Xuhong Zhang.
Ph. D.
Ph.D. Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science
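The abstract above starts from the classical Fourier spectrum of a Boolean function, defined under the uniform input distribution, before generalizing it. As a point of reference, the sketch below computes that classical spectrum by brute force for a function on {-1, +1}^n. It does not implement the thesis's generalized spectrum, and the function names and the majority example are chosen only for illustration.

```python
from itertools import combinations, product
from math import prod

def fourier_coefficients(f, n):
    """Classical Fourier coefficients of f: {-1, +1}^n -> R under the
    uniform distribution: coeff[S] = average over x of f(x) * prod(x[i] for i in S)."""
    points = list(product((-1, 1), repeat=n))
    coeffs = {}
    for size in range(n + 1):
        for subset in combinations(range(n), size):
            total = sum(f(x) * prod(x[i] for i in subset) for x in points)
            coeffs[subset] = total / len(points)
    return coeffs

def majority3(x):
    """Majority vote of three +/-1 inputs."""
    return 1 if sum(x) > 0 else -1

coeffs = fourier_coefficients(majority3, 3)
for subset, value in coeffs.items():
    print(subset, value)
```

For the three-input majority function the only non-zero coefficients are 1/2 on each single variable and -1/2 on the full set, so a linear model over the parity features of those four subsets reproduces the function exactly. This is the sense in which spectrum-based features can keep a model linear and interpretable.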
Johansson, Fernstad Sara. "Algorithmically Guided Information Visualization : Explorative Approaches for High Dimensional, Mixed and Categorical Data." Doctoral thesis, Linköpings universitet, Medie- och Informationsteknik, 2011. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-70860.
Wisnesky, Ryan. "Functional Query Languages with Categorical Types." Thesis, Harvard University, 2013. http://dissertations.umi.com/gsas.harvard:11288.
Engineering and Applied Sciences
Fung, Siu-leung. "High-dimensional Markov chain models for categorical data sequences with applications." Thesis, The University of Hong Kong, 2006. http://sunzi.lib.hku.hk/hkuto/record/B37682702.
Gates, Peter J. "Analyzing categorical traits in domestic animal data collected in the field /." Uppsala : Swedish Univ. of Agricultural Sciences (Sveriges lantbruksuniv.), 1999. http://epsilon.slu.se/avh/1999/91-576-5473-5.pdf.
Fung, Siu-leung, and 馮紹樑. "High-dimensional Markov chain models for categorical data sequences with applications." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2006. http://hub.hku.hk/bib/B37682702.
鄭佳文. "A Clustering Algorithm For Mixed Numeric And Categorical Data." Thesis, 2009. http://ndltd.ncl.edu.tw/handle/01872387181676933864.
National Changhua University of Education
Department of Computer Science and Information Engineering
97
Clustering is considered an important tool for data mining. The goal of data clustering is to divide a large data set into several groups such that objects in the same group are highly similar to each other, and to extract hidden patterns from the data. Many clustering algorithms have been developed in diverse domains. However, most traditional clustering algorithms are designed to handle either numeric data or categorical data. Data collected in the real world typically contain both numeric and categorical attributes, which makes it difficult to apply traditional clustering algorithms to such data directly. Thus, in this research, a co-occurrence-based method is presented to solve this problem. The basic assumption of co-occurrence is that if two attribute values always appear together in the same object, there is a strong similarity between them. Once all categorical attributes are converted into numeric attributes, a traditional clustering algorithm can be applied directly to group the collected data. The proposed approach can yield more concrete results than k-prototypes and SPSS Clementine. Keywords: data mining, clustering, co-occurrence, mixed data
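One way to make the co-occurrence idea concrete is to call two values of a categorical attribute similar when they co-occur with the same values of another attribute. The sketch below measures this as half the summed absolute difference between the two conditional distributions. This particular formula, the function name and the toy records are assumptions for illustration. The abstract does not specify the exact measure used in the thesis.

```python
from collections import Counter, defaultdict

def cooccurrence_distance(rows, attr, other):
    """Build a distance between values of `attr` from how they co-occur
    with values of `other`. Returns a function dist(a, b) in [0, 1]."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[row[attr]][row[other]] += 1

    def dist(a, b):
        ca, cb = counts[a], counts[b]
        na, nb = sum(ca.values()), sum(cb.values())
        keys = set(ca) | set(cb)
        # Half the L1 distance between the two conditional distributions.
        return 0.5 * sum(abs(ca[k] / na - cb[k] / nb) for k in keys)

    return dist

# Hypothetical records with two categorical attributes.
rows = [
    {"colour": "red", "shape": "round"},
    {"colour": "red", "shape": "round"},
    {"colour": "crimson", "shape": "round"},
    {"colour": "crimson", "shape": "round"},
    {"colour": "blue", "shape": "square"},
    {"colour": "blue", "shape": "square"},
]
dist = cooccurrence_distance(rows, "colour", "shape")
print(dist("red", "crimson"), dist("red", "blue"))
```

Here "red" and "crimson" always appear with the same shape and get distance 0, while "red" and "blue" never share a shape and get distance 1. Distances of this kind give categorical values a numeric footing, after which a standard numeric clustering algorithm could be applied.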