Dissertations / Theses on the topic 'Numeric and categorical data'

1

Jia, Hong. "Clustering of categorical and numerical data without knowing cluster number." HKBU Institutional Repository, 2013. http://repository.hkbu.edu.hk/etd_ra/1495.

2

Suarez, Alvarez Maria Del Mar. "Design and analysis of clustering algorithms for numerical, categorical and mixed data." Thesis, Cardiff University, 2010. http://orca.cf.ac.uk/54131/.

Abstract:
In recent times, several machine learning techniques have been applied successfully to discover useful knowledge from data. Cluster analysis, which aims at finding similar subgroups within a large heterogeneous collection of records, is one of the most useful and popular of the available data mining techniques. The purpose of this research is to design and analyse clustering algorithms for numerical, categorical and mixed data sets. Most clustering algorithms are limited to either numerical or categorical attributes. Data sets with mixed types of attributes are common in real life, so designing and analysing clustering algorithms for mixed data sets is quite timely. Determining the optimal solution to the clustering problem is NP-hard; it is therefore necessary to find solutions that are regarded as "good enough" quickly. Similarity is a fundamental concept for the definition of a cluster, and it is very common to calculate the similarity or dissimilarity between two features using a distance measure. Attributes with large ranges implicitly contribute more to such a metric than attributes with small ranges. Only a few papers are specifically devoted to normalisation methods. Usually data is scaled to unit range, which does not ensure equal average contributions of all features to the similarity measure. For that reason, a main part of this thesis is devoted to normalisation.
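The scaling problem this abstract raises is easy to see in a few lines: without normalisation, an attribute with a large range dominates a Euclidean distance. A minimal sketch (the helper names and example values are invented for illustration, not taken from the thesis):

```python
def min_max_scale(column):
    """Scale a numeric column to [0, 1] so attributes with large
    ranges do not dominate a distance computation."""
    lo, hi = min(column), max(column)
    if hi == lo:  # constant column: nothing to scale
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]

def euclidean(a, b):
    """Plain Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

# Two attributes with very different ranges (values invented).
incomes = [20000.0, 50000.0, 80000.0]
ages = [25.0, 40.0, 55.0]

raw = list(zip(incomes, ages))
scaled = list(zip(min_max_scale(incomes), min_max_scale(ages)))

# Unscaled, the income gap swamps the age gap entirely.
print(euclidean(raw[0], raw[1]))
# After range scaling, both attributes contribute equally.
print(euclidean(scaled[0], scaled[1]))
```

As the abstract notes, unit-range scaling is only a baseline; it still does not guarantee equal average contributions of all features, which is what motivates the thesis's deeper treatment of normalisation.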
3

Hjerpe, Adam. "Computing Random Forests Variable Importance Measures (VIM) on Mixed Numerical and Categorical Data." Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-185496.

Abstract:
The Random Forest model is commonly used as a predictor function and has proven useful in a variety of applications. Its popularity stems from the combination of high prediction accuracy, the ability to model high-dimensional complex data, and applicability under predictor correlations. This report investigates the random forest variable importance measure (VIM) as a means to find a ranking of important variables, quantifying the degree of association between predictor variables and the target variable. The robustness of the VIM under imputation of categorical noise, and its capability to differentiate informative predictors from non-informative variables, is investigated. The selection of variables may improve the robustness of the predictor, improve prediction accuracy, reduce computational time, and may serve as an exploratory data analysis tool. In addition, the partial dependency plot obtained from the random forest model is examined as a means to find underlying relations in a non-linear simulation study.
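The permutation-style importance idea behind the VIM can be illustrated without any forest at all: importance is the drop in accuracy when one predictor's values are shuffled, breaking its association with the response. This is a simplified sketch of the general principle, not the exact Random Forest VIM; the toy "model" and data below are invented:

```python
import random

def permutation_importance(predict, X, y, col, n_repeats=20, seed=0):
    """Mean drop in accuracy when column `col` is shuffled, which
    destroys its association with y but keeps its marginal distribution."""
    rng = random.Random(seed)

    def accuracy(rows):
        return sum(predict(r) == t for r, t in zip(rows, y)) / len(y)

    base = accuracy(X)
    drops = []
    for _ in range(n_repeats):
        shuffled_col = [row[col] for row in X]
        rng.shuffle(shuffled_col)
        X_perm = [row[:col] + [v] + row[col + 1:]
                  for row, v in zip(X, shuffled_col)]
        drops.append(base - accuracy(X_perm))
    return sum(drops) / n_repeats

# Toy data: column 0 determines the label, column 1 is pure noise.
X = [[i % 2, rv] for i, rv in enumerate([3, 1, 4, 1, 5, 9, 2, 6] * 8)]
y = [row[0] for row in X]
model = lambda row: row[0]  # a "fitted" stub that only looks at column 0

print(permutation_importance(model, X, y, col=0))  # large drop: informative
print(permutation_importance(model, X, y, col=1))  # no drop: non-informative
```

The thesis studies this kind of measure in the Random Forest setting, where the accuracy drop is typically evaluated on out-of-bag samples per tree rather than on a single fitted model as above.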
4

Kirsch, Matthew Robert. "Signal Processing Algorithms for Analysis of Categorical and Numerical Time Series: Application to Sleep Study Data." Case Western Reserve University School of Graduate Studies / OhioLINK, 2010. http://rave.ohiolink.edu/etdc/view?acc_num=case1278606480.

5

Obry, Tom. "Apprentissage numérique et symbolique pour le diagnostic et la réparation automobile." Thesis, Toulouse, INSA, 2020. http://www.theses.fr/2020ISAT0014.

Abstract:
Clustering is an unsupervised learning method that aims to partition a data set into homogeneous groups according to a similarity criterion, so that the data in each group share common characteristics. DyClee is a classifier that builds a classification from numerical data arriving as a continuous stream and provides an adaptation mechanism to update that classification, thus performing dynamic clustering that follows the evolution of the monitored system or process. However, handling only numerical attributes does not cover all fields of application. With this goal of generalisation, this thesis proposes an extension to nominal categorical data on the one hand, and an extension to mixed data on the other. Hierarchical clustering approaches are also proposed to assist experts in interpreting the resulting clusters and validating the generated partitions. The presented algorithm, called Mixed DyClee, can be applied in various application domains; in this thesis it is used in the field of automotive diagnostics.
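For intuition about clustering mixed records, here is a minimal Gower-style dissimilarity over numeric and categorical fields. It illustrates the general idea of a mixed-data distance only, not the actual mechanism of Mixed DyClee, and the automotive-flavoured field names and ranges are invented:

```python
def mixed_dissimilarity(a, b, numeric_ranges):
    """Gower-style dissimilarity between two records (dicts): numeric
    fields contribute a range-normalised absolute difference, categorical
    fields a 0/1 mismatch; the result is the average over all fields."""
    total = 0.0
    for key, va in a.items():
        vb = b[key]
        if key in numeric_ranges:
            lo, hi = numeric_ranges[key]
            total += abs(va - vb) / (hi - lo) if hi > lo else 0.0
        else:
            total += 0.0 if va == vb else 1.0
    return total / len(a)

# Invented automotive records: two numeric fields, one categorical.
a = {"rpm": 2400, "temp": 90.0, "fault_code": "P0301"}
b = {"rpm": 1200, "temp": 85.0, "fault_code": "P0301"}
ranges = {"rpm": (0, 6000), "temp": (-20.0, 120.0)}
print(mixed_dissimilarity(a, b, ranges))
```

A stream clustering algorithm like the one described would evaluate such a dissimilarity incrementally as records arrive, rather than over a static data set.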
6

Bashon, Yasmina M. "Contributions to fuzzy object comparison and applications. Similarity measures for fuzzy and heterogeneous data and their applications." Thesis, University of Bradford, 2013. http://hdl.handle.net/10454/6305.

Abstract:
This thesis makes an original contribution to knowledge in the field of data object comparison, where the objects are described by attributes of fuzzy or heterogeneous (numeric and symbolic) data types. Many real-world database systems and applications require information management components that provide support for managing such imperfect and heterogeneous data objects. For example, with new online information made available from various sources, in semi-structured, structured or unstructured representations, new information usage and search algorithms must consider that such data collections may contain objects/records with different types of data (fuzzy, numerical and categorical) for the same attributes. New approaches to similarity have been presented in this research to support such data comparison. A generalisation of both geometric and set-theoretical similarity models has enabled us to propose the new similarity measures presented in this thesis, which handle the vagueness (fuzzy data type) within data objects. A framework of new and unified similarity measures for comparing heterogeneous objects described by numerical, categorical and fuzzy attributes has also been introduced. Examples are used to illustrate, compare and discuss the applications and efficiency of the proposed approaches to heterogeneous data comparison.
Libyan Embassy
7

Bashon, Yasmina Massoud. "Contributions to fuzzy object comparison and applications : similarity measures for fuzzy and heterogeneous data and their applications." Thesis, University of Bradford, 2013. http://hdl.handle.net/10454/6305.

Abstract:
This thesis makes an original contribution to knowledge in the field of data object comparison, where the objects are described by attributes of fuzzy or heterogeneous (numeric and symbolic) data types. Many real-world database systems and applications require information management components that provide support for managing such imperfect and heterogeneous data objects. For example, with new online information made available from various sources, in semi-structured, structured or unstructured representations, new information usage and search algorithms must consider that such data collections may contain objects/records with different types of data (fuzzy, numerical and categorical) for the same attributes. New approaches to similarity have been presented in this research to support such data comparison. A generalisation of both geometric and set-theoretical similarity models has enabled us to propose the new similarity measures presented in this thesis, which handle the vagueness (fuzzy data type) within data objects. A framework of new and unified similarity measures for comparing heterogeneous objects described by numerical, categorical and fuzzy attributes has also been introduced. Examples are used to illustrate, compare and discuss the applications and efficiency of the proposed approaches to heterogeneous data comparison.
8

Hollingsworth, Jason Michael. "Foundational Data Repository for Numeric Engine Validation." Diss., CLICK HERE for online access, 2008. http://contentdm.lib.byu.edu/ETD/image/etd2661.pdf.

9

Läuter, Henning, and Ayad Ramadan. "Statistical Scaling of Categorical Data." Universität Potsdam, 2010. http://opus.kobv.de/ubp/volltexte/2011/4956/.

Abstract:
Estimation and testing of distributions in metric spaces are well known. R.A. Fisher, J. Neyman, W. Cochran and M. Bartlett achieved essential results on the statistical analysis of categorical data, and in the last 40 years many other statisticians have found important results in this field. Data sets often contain categorical data, e.g. levels of factors or names, for which no ordering and no distance between categories exist. At each level some metric or categorical values are measured. We introduce a new method of scaling based on statistical decisions. For this we define empirical probabilities for the original observations and find a class of distributions in a metric space where these empirical probabilities can be found as approximations for equivalently defined probabilities. With this method we identify probabilities connected with the categorical data and probabilities in metric spaces, obtaining a mapping from the levels of factors or names into points of a metric space. This mapping yields the scale for the categorical data. From the statistical point of view we use multivariate statistical methods, calculate maximum likelihood estimations and compare different approaches for scaling.
10

Zhang, Yiqun. "Advances in categorical data clustering." HKBU Institutional Repository, 2019. https://repository.hkbu.edu.hk/etd_oa/658.

Abstract:
Categorical data are common in various research areas, and clustering is a prevalent technique used to analyse them. However, two challenging problems are encountered in categorical data clustering analysis. The first is that most categorical data distance metrics were actually proposed for nominal data (i.e., a categorical data set that comprises only nominal attributes), ignoring the fact that ordinal attributes are also common in various categorical data sets. As a result, these nominal data distance metrics cannot account for the order information of ordinal attributes and may thus inappropriately measure the distances for ordinal data (i.e., a categorical data set that comprises only ordinal attributes) and mixed categorical data (i.e., a categorical data set that comprises both ordinal and nominal attributes). The second problem is that most hierarchical clustering approaches were actually designed for numerical data and have very high computation costs; that is, time complexity O(N^2) for a data set with N data objects. These issues have presented huge obstacles to the clustering analysis of categorical data. To address the ordinal data distance measurement problem, we studied the characteristics of ordered possible values (also called 'categories' interchangeably in this thesis) of ordinal attributes and propose a novel ordinal data distance metric, which we call the Entropy-Based Distance Metric (EBDM), to quantify the distances between ordinal categories. The EBDM adopts cumulative entropy as a measure to indicate the amount of information in the ordinal categories and simulates the thinking process of changing one's mind between two ordered choices to quantify the distances according to the amount of information in the ordinal categories. The order relationship and the statistical information of the ordinal categories are both considered by the EBDM for more appropriate distance measurement.
Experimental results illustrate the superiority of the proposed EBDM in ordinal data clustering. In addition to designing an ordinal data distance metric, we further propose a unified categorical data distance metric that is suitable for distance measurement of all three types of categorical data (i.e., ordinal data, nominal data, and mixed categorical data). The extended version uniformly defines distances and attribute weights for both ordinal and nominal attributes, by which the distances measured for the two types of attributes of a mixed categorical data set can be directly combined to obtain the overall distances between data objects with no information loss. Extensive experiments on all three types of categorical data sets demonstrate the effectiveness of the unified distance metric in clustering analysis of categorical data. To address the hierarchical clustering problem of large-scale categorical data, we propose a fast hierarchical clustering framework called the Growing Multi-layer Topology Training (GMTT). The most significant merit of this framework is its ability to reduce the time complexity of most existing hierarchical clustering frameworks (i.e., O(N^2)) to O(N^1.5) without sacrificing the quality (i.e., clustering accuracy and hierarchical details) of the constructed hierarchy. According to our design, the GMTT framework is applicable to categorical data clustering simply by adopting a categorical data distance metric. To make the GMTT framework suitable for the processing of streaming categorical data, we also provide an incremental version of GMTT that can dynamically adopt new inputs into the hierarchy via local updating. Theoretical analysis proves that the GMTT frameworks have time complexity O(N^1.5). Extensive experiments show the efficacy of the GMTT frameworks and demonstrate that they achieve more competitive categorical data clustering performance by adopting the proposed unified distance metric.
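A toy example shows why an order-aware, information-based distance behaves differently from a nominal mismatch distance. This is only a simplified sketch of the cumulative-entropy intuition, not the exact EBDM definition, and the category frequencies are invented:

```python
from math import log2

# Invented frequencies of an ordinal attribute with four ordered
# categories, e.g. rating in {poor, fair, good, excellent}.
freqs = [10, 20, 40, 30]
n = sum(freqs)
info = [-(f / n) * log2(f / n) for f in freqs]  # per-category information

def ordinal_distance(i, j):
    """Sum the information content of the categories 'passed through'
    when changing one's mind between two ordered choices (a simplified
    take on the cumulative-entropy idea, not the exact EBDM)."""
    lo, hi = sorted((i, j))
    return sum(info[lo + 1:hi + 1])

def nominal_distance(i, j):
    """Simple mismatch distance: order-blind."""
    return 0 if i == j else 1

# Order-aware: poor -> excellent is farther than good -> excellent...
print(ordinal_distance(0, 3) > ordinal_distance(2, 3))   # True
# ...whereas the nominal metric calls every mismatch equally distant.
print(nominal_distance(0, 3) == nominal_distance(2, 3))  # True
```

This is exactly the failure mode the abstract describes: a nominal metric discards the order information that an ordinal metric can exploit.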
11

Chang, Janis. "Analysis of ordered categorical data." Thesis, University of British Columbia, 1988. http://hdl.handle.net/2429/27857.

Abstract:
Methods of testing for a location shift between two populations in a longitudinal study are investigated when the data of interest are ordered, categorical and non-linear. A non-standard analysis involving modelling of data over time with transition probability matrices is discussed. Next, the relative efficiencies of statistics more frequently used for the analysis of such categorical data at a single time point are examined. The Wilcoxon rank sum, McCullagh, and two-sample t statistics are compared for the analysis of such cross-sectional data using simulation and efficacy calculations. Simulation techniques are then utilized in comparing the efficiencies of the stratified Wilcoxon, McCullagh and chi-squared-type statistics at detecting a location shift when the data are examined over two time points. The distribution of a chi-squared-type statistic based on the simple contingency table constructed by merely noting whether a subject improved, stayed the same or deteriorated is derived. Applications of these methods and results to a data set of Multiple Sclerosis patients, some of whom were treated with interferon and some of whom received a placebo, are provided throughout the thesis, and our findings are summarized in the last chapter.
Science, Faculty of
Statistics, Department of
Graduate
12

Läuter, Henning, and Ayad Ramadan. "Modeling and Scaling of Categorical Data." Universität Potsdam, 2010. http://opus.kobv.de/ubp/volltexte/2011/4957/.

Abstract:
Estimation and testing of distributions in metric spaces are well known. R.A. Fisher, J. Neyman, W. Cochran and M. Bartlett achieved essential results on the statistical analysis of categorical data, and in the last 40 years many other statisticians have found important results in this field. Data sets often contain categorical data, e.g. levels of factors or names, for which no ordering and no distance between categories exist. At each level some metric or categorical values are measured. We introduce a new method of scaling based on statistical decisions. For this we define empirical probabilities for the original observations and find a class of distributions in a metric space where these empirical probabilities can be found as approximations for equivalently defined probabilities. With this method we identify probabilities connected with the categorical data and probabilities in metric spaces, obtaining a mapping from the levels of factors or names into points of a metric space. This mapping yields the scale for the categorical data. From the statistical point of view we use multivariate statistical methods, calculate maximum likelihood estimations and compare different approaches for scaling.
13

Pilhöfer, Alexander [Verfasser]. "Categorical Data Analysis Reordered / Alexander Pilhöfer." München : Verlag Dr. Hut, 2014. http://d-nb.info/1063221277/34.

14

Zingmark, Per-Henrik. "Models for Ordered Categorical Pharmacodynamic Data." Doctoral thesis, Uppsala : Acta Universitatis Upsaliensis: Univ.-bibl. [distributör], 2005. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-6125.

15

Hommola, Susan Kerstin. "Categorical data analysis of protein structure." Thesis, University of Leeds, 2011. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.578618.

Abstract:
It has long been known that the amino-acid sequence of a protein determines its 3-dimensional structure, but accurate ab initio prediction of structure from sequence remains elusive. In this thesis, we aim to gain insight into generic principles of protein folding through statistical modelling of protein structure. The first part is concerned with local protein structure. We study the relationship of dihedral angles in short protein segments up to a length of three residues. We adopt a contingency table approach, exploring a targeted set of hypotheses through log-linear modelling to detect patterns of association between the dihedral angles in the segments considered. For segments of length two (dipeptides), our models indicate a substantial association of the side-chain conformation of the first residue with the backbone conformation of the second residue (side-to-back interaction) as well as a weaker, but still significant, association of the backbone conformation of the first residue with the side-chain conformation of the second residue (back-to-side interaction). Comparison of these interactions across different dipeptides through cluster analysis reveals a striking pattern. For the side-to-back term, all dipeptides having the same first residue cluster together, whereas for the back-to-side term we observe a much weaker pattern. This suggests that the conformation of the first residue dictates the conformation of the second. Our categorical approach proves difficult for the analysis of longer segments due to the discrepancy between the increased complexity and the shrinking amount of data available. In the second part, we study non-local interactions represented by contact maps. Our approach focuses entirely on the positions of contacting residues and is completely independent of protein amino-acid sequence.
We investigate and quantify patterns in three specific regions of aggregated contact maps of single-domain proteins belonging to the four major SCOP classes (all-α, all-β, α/β, α+β) using logistic regression models. The first two regions represent contacts of residues aligned to the N-terminus with subsequent residues, and contacts of residues aligned to the C-terminus with previous residues, in a symmetric fashion with respect to the chain termini. The third region contains contacts between terminal residues. The models for each region contain factors for the positions of contacting residues as well as factors describing parallel and anti-parallel β-strand contact patterns. There is an interesting asymmetry between N-aligned and C-aligned contacts for the α/β SCOP class. The region around the N-terminus shows a strong propensity towards parallel contacts between the first few residues and residues further along the sequence, whereas the last few residues do not show any strong patterns. This N-terminal dominance could indicate cotranslational folding. The other classes do not exhibit this asymmetry, but reveal predominantly anti-parallel β-strand patterns (all-β class), mixed patterns (α+β class) or no distinct patterns (all-α class). Contact patterns in the terminal regions are generally weak, showing no strong preferences towards parallel or anti-parallel β-strand contacts.
16

Fear, Simon Charles. "The analysis of categorical longitudinal data." Thesis, University of Liverpool, 1998. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.266052.

17

Grapsa, Erofili. "Bayesian analysis for categorical survey data." Thesis, University of Southampton, 2010. https://eprints.soton.ac.uk/197303/.

Abstract:
In this thesis, we develop Bayesian methodology for univariate and multivariate categorical survey data. The multinomial model is used and the following problems are addressed. Limited information about the design variables leads us to model the unknown design variables taking into account the sampling scheme. Random effects are incorporated in the model to deal with the effect of the sampling design, which produces the multinomial GLMM, and issues such as model comparison and model averaging are also discussed. The methodology is applied to a real data set, and estimates for population counts are obtained.
18

Beck, John. "Interactive Visualization of Categorical Data Sets." OpenSIUC, 2012. https://opensiuc.lib.siu.edu/theses/950.

Abstract:
Many people in widely varied fields are exposed to categorical data describing myriad observations. The breadth of applications in which categorical data are used means that many of the people tasked with applying these data have not been trained in data analysis. Visualization of data is often used to alleviate this problem, since visualization can convey relevant information in a non-mathematical manner. However, visualizations are frequently static, and the tools to create them are largely geared towards quantitative data. The purpose of this thesis is to demonstrate a method which expands on the parallel coordinates method of visualization and uses a 'Google Maps' style of interaction and view-dependent data presentation for visualizing and exploring categorical data, one that is accessible to non-experts and promotes the use of domain-specific knowledge. The parallel coordinates method has enjoyed increasing popularity in recent times, but has several shortcomings. This thesis seeks to address some of these problems in a manner which involves not just the final static image which is generated, but the paradigm of interaction as well.
19

Anderlucci, Laura <1984>. "Comparing Different Approaches for Clustering Categorical Data." Doctoral thesis, Alma Mater Studiorum - Università di Bologna, 2012. http://amsdottorato.unibo.it/4302/.

Abstract:
There are different ways to do cluster analysis of categorical data in the literature, and the choice among them is strongly related to the aim of the researcher, if we do not take into account time and economic constraints. The main approaches to clustering are usually distinguished into model-based and distance-based methods: the former assume that objects belonging to the same class are similar in the sense that their observed values come from the same probability distribution, whose parameters are unknown and need to be estimated; the latter evaluate distances among objects by a defined dissimilarity measure and, based on it, allocate units to the closest group. In clustering, one may be interested in the classification of similar objects into groups, or in finding observations that come from the same true homogeneous distribution. But do both of these aims lead to the same clustering? And how good are clustering methods designed to fulfil one of these aims in terms of the other? In order to answer these questions, two approaches, namely a latent class model (mixture of multinomial distributions) and a partition-around-medoids one, are evaluated and compared by the Adjusted Rand Index, Average Silhouette Width and Pearson-Gamma indexes in a fairly wide simulation study. Simulation outcomes are plotted in bi-dimensional graphs via Multidimensional Scaling; the size of points is proportional to the number of points that overlap, and different colours are used according to cluster membership.
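Of the comparison indexes mentioned, the Adjusted Rand Index is simple enough to compute directly from two label vectors. A self-contained sketch of the standard definition (agreement on object pairs, corrected for chance):

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_a, labels_b):
    """ARI between two partitions of the same objects: 1.0 for identical
    partitions (up to label renaming), approximately 0 for random ones."""
    n = len(labels_a)
    contingency = Counter(zip(labels_a, labels_b))
    sum_ij = sum(comb(c, 2) for c in contingency.values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)   # chance-agreement term
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:               # degenerate partitions
        return 1.0
    return (sum_ij - expected) / (max_index - expected)

# Identical partitions up to label names score 1.0.
print(adjusted_rand_index([0, 0, 1, 1], ["x", "x", "y", "y"]))
# Partial agreement scores strictly between 0 and 1.
print(adjusted_rand_index([0, 0, 1, 1], [0, 0, 1, 2]))
```

Being invariant to label renaming is what makes the ARI suitable for comparing a model-based and a distance-based clustering of the same simulated data, as done in this thesis.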
20

Pickering, R. M. "Analysis of categorical data on pregnancy outcome." Thesis, University of Glasgow, 1987. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.280012.

21

Clarke, Paul Simon. "Nonignorable nonresponse models for categorical survey data." Thesis, University of Southampton, 1998. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.262905.

22

Kahiri, James Mwangi K. "Impact of measurement errors on categorical data." Thesis, University of Southampton, 1995. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.318197.

23

Stemp, Iain Charles. "Bayesian model selection ideas for categorical data." Thesis, University of Cambridge, 1992. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.308335.

24

Ehlers, Rene. "Maximum likelihood estimation procedures for categorical data." Pretoria : [s.n.], 2002. http://upetd.up.ac.za/thesis/available/etd-07222005-124541.

25

Lee, Keunbaik. "Marginalized regression models for longitudinal categorical data." [Gainesville, Fla.] : University of Florida, 2007. http://purl.fcla.edu/fcla/etd/UFE0021244.

26

Al-Babtain, Abdulhakim A. "Bayesian model determination for categorical data survey." Thesis, University of Southampton, 2001. https://eprints.soton.ac.uk/50629/.

Abstract:
Inference for survey data needs to take account of the survey design; failing to do so may lead to misleading results. The standard analysis of categorical data, developed under the assumption of multinomial sampling, is inadequate, as the commonly used sampling schemes clearly violate this assumption. Since Kish (1965) introduced the idea of a design effect, many classical solutions have been proposed, such as first- and second-order corrections to the Pearson chi-squared, likelihood-ratio chi-squared, and Wald tests. Our objective in this thesis is to present an investigation of a Bayesian approach to the analysis of categorical survey data arising from designs including simple random sampling, finite population sampling, stratification, and cluster sampling. We focus on Bayesian methods for model selection and model averaging, where Bayes factors and the Bayesian Information Criterion (BIC) approximation have been offered as alternative approaches. These Bayesian methods are reviewed, and comparisons are made between their performance. The effect of ignoring the complex sampling design is investigated. Moreover, adjustments to the multinomial-based Bayes factor and BIC are produced and evaluated.
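The BIC approximation mentioned above is straightforward to apply to a multinomial table. A minimal sketch comparing a saturated model against an equal-probability null model (the cell counts are invented; this illustrates plain BIC only, not the survey-design adjustments the thesis develops):

```python
from math import log

def multinomial_loglik(counts, probs):
    """Log-likelihood of observed cell counts under given cell probabilities
    (up to the constant multinomial coefficient, which cancels in comparisons)."""
    return sum(c * log(p) for c, p in zip(counts, probs) if c > 0)

def bic(loglik, n_params, n_obs):
    """Bayesian Information Criterion: lower is better; the
    n_params * ln(n_obs) term penalises model complexity."""
    return n_params * log(n_obs) - 2 * loglik

counts = [30, 25, 28, 27]  # invented 4-cell contingency table
n = sum(counts)

# Saturated model: cell probabilities estimated from the data (3 free params).
p_hat = [c / n for c in counts]
bic_saturated = bic(multinomial_loglik(counts, p_hat), 3, n)

# Null model: all four cells equally likely (0 free params).
bic_null = bic(multinomial_loglik(counts, [0.25] * 4), 0, n)

# With near-uniform counts the simpler model wins (lower BIC).
print(bic_null < bic_saturated)
```

The difference of two BIC values approximates minus twice the log Bayes factor, which is what makes BIC a convenient stand-in for full Bayes-factor model comparison.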
27

White, Paul. "Designs and analysis of ordered categorical data." Thesis, Teesside University, 1998. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.410844.

28

Lacerda, Fred W. "Comparative advantages of graphic versus numeric representation of quantitative data." Diss., Virginia Polytechnic Institute and State University, 1986. http://hdl.handle.net/10919/49817.

29

Fatima, Kaniz. "Analysis of longitudinal data with ordered categorical response." Thesis, University of Reading, 1994. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.239058.

30

Pang, Wan-Kai. "Modelling ordinal categorical data : a Gibbs sampler approach." Thesis, University of Southampton, 2000. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.323876.

Full text
APA, Harvard, Vancouver, ISO, and other styles
31

Chen, Dandan. "Amended Estimators of Several Ratios for Categorical Data." Digital Commons @ East Tennessee State University, 2006. https://dc.etsu.edu/etd/2218.

Full text
Abstract:
Point estimation of several association parameters in categorical data is presented. Typically, a constant is added to the frequency counts before the association measure is computed. We study the accuracy of these adjusted point estimators based on frequentist and Bayesian methods respectively. In particular, amended estimators for the ratio of independent Poisson rates, relative risk, odds ratio, and the ratio of marginal binomial proportions are examined in terms of bias and mean squared error.
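As a concrete instance of the add-a-constant idea described in the abstract, the Haldane–Anscombe amendment adds 0.5 to each cell of a 2×2 table before computing the odds ratio. The sketch below is a generic illustration with invented counts, not necessarily the particular estimator studied in the thesis:

```python
def odds_ratio(a, b, c, d, k=0.0):
    """Odds ratio of the 2x2 table [[a, b], [c, d]], after adding a
    constant k to every cell (k = 0.5 gives the Haldane-Anscombe
    amended estimator)."""
    return ((a + k) * (d + k)) / ((b + k) * (c + k))

# With a zero cell the unamended estimator is undefined (division by
# zero), while the amended one stays finite:
amended = odds_ratio(12, 0, 5, 9, k=0.5)   # (12.5 * 9.5) / (0.5 * 5.5)
```

The amendment trades a small bias for finiteness and reduced mean squared error, which is exactly the kind of trade-off such studies evaluate.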
APA, Harvard, Vancouver, ISO, and other styles
32

Karangwa, Innocent. "Imputation techniques for non-ordered categorical missing data." University of the Western Cape, 2016. http://hdl.handle.net/11394/5061.

Full text
Abstract:
Philosophiae Doctor - PhD
Missing data are common in survey data sets. Enrolled subjects often do not have data recorded for all variables of interest. The inappropriate handling of missing data may lead to bias in the estimates and incorrect inferences. Therefore, special attention is needed when analysing incomplete data. Multivariate normal imputation (MVNI) and multiple imputation by chained equations (MICE) have emerged as leading techniques for imputing, or filling in, missing data. The former assumes a normal distribution of the variables in the imputation model, but can also handle missing data whose distributions are not normal. The latter fills in missing values taking into account the distributional form of the variables to be imputed. The aim of this study was to determine the performance of these methods when data are missing at random (MAR) or completely at random (MCAR) on unordered or nominal categorical variables treated as predictors or response variables in the regression models. Both dichotomous and polytomous variables were considered in the analysis. The baseline data used was the 2007 Demographic and Health Survey (DHS) from the Democratic Republic of Congo. The analysis model of interest was the logistic regression model of the woman’s contraceptive method use status on her marital status, controlling or not for other covariates (continuous, nominal and ordinal). Based on the data set with missing values, data sets with missing at random and missing completely at random observations on either the covariates or response variables measured on a nominal scale were first simulated, and then used for imputation purposes. Under the MVNI method, unordered categorical variables were first dichotomised, and then K − 1 dichotomised variables (where K is the number of levels of the categorical variable of interest) were included in the imputation model, leaving the remaining category as a reference. These variables were imputed as continuous variables using a linear regression model.
Imputation with MICE considered the distributional form of each variable to be imputed. That is, imputations were drawn using binary and multinomial logistic regressions for dichotomous and polytomous variables respectively. The performance of these methods was evaluated in terms of bias and standard errors in the regression coefficients that were estimated to determine the association between the woman’s contraceptive method use status and her marital status, controlling or not for other types of variables. The analysis was done first assuming that the sample was not weighted, and then taking the sample weight into account to assess whether the sample design would affect the performance of the multiple imputation methods of interest, namely MVNI and MICE. As expected, the results showed that for all the models, MVNI and MICE produced less biased estimates and smaller standard errors than the case deletion (CD) method, which discards items with missing values from the analysis. Moreover, it was found that when data were missing (MCAR or MAR) on the nominal variables that were treated as predictors in the regression model, MVNI reduced bias in the regression coefficients and standard errors compared to MICE, for both unweighted and weighted data sets. On the other hand, the results indicated that MICE outperforms MVNI when data were missing on the response variables, either binary or polytomous. Furthermore, it was noted that the sample design (sample weights), the rates of missingness and the missing data mechanisms (MCAR or MAR) did not affect the behaviour of the multiple imputation methods that were considered in this study. Thus, based on these results, it can be concluded that when missing values are present on the outcome variables measured on a nominal scale in regression models, the distributional form of the variable with missing values should be taken into account.
When these variables are used as predictors (with missing observations), the parametric imputation approach (MVNI) would be a better option than MICE.
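A minimal sketch of the K − 1 dummy-coding step described in this abstract, and of mapping a continuous imputation back to a category. The levels and the fractional imputed values below are invented for illustration; in an actual MVNI run the fractional values would come from a fitted linear imputation model:

```python
def dummy_code(value, levels):
    """K-1 indicator coding: the last level acts as the reference."""
    return [1.0 if value == lev else 0.0 for lev in levels[:-1]]

def nearest_level(imputed, levels):
    """Map a continuous imputed indicator vector back to the level
    whose exact coding is closest in Euclidean distance."""
    best, best_d = None, float("inf")
    for lev in levels:
        code = dummy_code(lev, levels)
        d = sum((x - c) ** 2 for x, c in zip(imputed, code))
        if d < best_d:
            best, best_d = lev, d
    return best

levels = ["single", "married", "widowed"]
codes = dummy_code("married", levels)        # [0.0, 1.0]
# A linear imputation model may return fractional indicators, which are
# then rounded back to the closest category:
category = nearest_level([0.1, 0.8], levels)  # "married"
```

This rounding-back step is one simple way to recover a categorical value; it illustrates why treating dummies as continuous can distort categorical distributions, the weakness the abstract contrasts with MICE's multinomial draws.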
APA, Harvard, Vancouver, ISO, and other styles
33

Parker, K. N. "Numeric data frames and probabilistic judgments in complex real-world environments." Thesis, University College London (University of London), 2017. http://discovery.ucl.ac.uk/1536437/.

Full text
Abstract:
This thesis investigates human probabilistic judgment in complex real-world settings to identify processes underpinning biases across groups which relate to numerical frames and formats. Experiments are conducted replicating real-world environments and data to test judgment performance based on framing and format. Regardless of background skills and experience, people in professional and consumer contexts show a strong tendency to perceive the world from a linear perspective, interpreting information in concrete, absolute terms and making judgments based on seeking and applying linear functions. Whether predicting sales, selecting between financial products, or forecasting refugee camp data, people use minimal cues and systematically apply additive methods amidst non-linear trends and percentage points to yield linear estimates in both rich and sparse informational contexts. Depending on data variability and temporality, human rationality and choice may be significantly helped or hindered by informational framing and format. The findings deliver both theoretical and practical contributions. Across groups and individual differences, the effects of informational format and the tendency to linearly extrapolate are connected by the bias to perceive values in concrete terms and make sense of data by seeking simple referent points. People compare and combine referents using additive methods when inappropriate and adhere strongly to defaults when applied in complex numeric environments. The practical contribution involves a framing manipulation which shows that format biases (i.e., additive processing) and optimism (i.e., associated with intertemporal effects) can be counteracted in judgments involving percentages and exponential growth rates by using absolute formats and positioning defaults in future event context information. 
This framing manipulation was highly effective in improving loan choice and repayment judgments compared to information in standard finance industry formats. There is strong potential to increase rationality using this data format manipulation in other financial settings and in domains such as health behaviour change, in which people's erroneous interpretation of percentages and non-linear relations negatively impacts choices and behaviours in both the short and long term.
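The additive-versus-percentage contrast at the heart of this abstract can be made concrete with a toy projection (invented numbers, purely illustrative):

```python
def linear_forecast(start, step, periods):
    """Additive extrapolation: add the same absolute change each period."""
    return start + step * periods

def growth_forecast(start, rate, periods):
    """Compound extrapolation: apply the same percentage change each period."""
    return start * (1 + rate) ** periods

# A balance of 1000 growing 10% a year: after the first year both views
# agree (+100), but additive thinking increasingly underestimates.
linear = linear_forecast(1000, 100, 10)      # 2000
compound = growth_forecast(1000, 0.10, 10)   # ~2594
```

The widening gap between the two forecasts is the linear-extrapolation bias the thesis documents: people apply the additive rule even when the underlying process is multiplicative.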
APA, Harvard, Vancouver, ISO, and other styles
34

Marés, Soler Jordi. "Categorical Data Protection on Statistical Datasets and Social Networks." Doctoral thesis, Universitat Autònoma de Barcelona, 2013. http://hdl.handle.net/10803/129327.

Full text
Abstract:
The continuous growth of public sensitive data has increased the risk of breaking the privacy of people or institutions in those datasets. This growing is, nowadays, even faster because of the expansion of the Internet. This fact makes very important the assessment of the performance of all the methods used to protect those datasets. In order to check the performance there exist two kind of measures: the information loss and the disclosure risk. Another area where privacy has an increasing role is the one of social networks. They have become an essential ingredient of interpersonal communication in the modern world. They enable users to express and share common interests, comment upon everyday events with all the people with whom they are connected. Indeed, the growth of social media has been rapid and has resulted in the adoption of social networks to meet specific communities of interest.However, this shared information space can prove to be dangerous in respect of user privacy issues. In addition to explicit ”posts” there is much implicit semantic information that is not explicitly given in the posts that the user shares. For these and other reasons, the protection of information pertaining to each user needs to be supported. This thesis shows some new approaches to face these problems. The main contributions are: • The development of an approach for protecting microdata datasets based on evolutionary algorithms which seeks automatically for better protections in terms of information loss and disclosure risk. • The development of an evolutionary approach to optimize the transition matrices used in the Post-Randomization masking method which performs better protections. • The definition of an approach to deal with categorical microdata protection based on a pre-clustering approach achieving protected data with better utility. 
• The definition of a way to extract both implicit and explicit information from a real social network like Twitter as well as the development of a protection method to deal with this information and some new measures to evaluate the protection quality.
APA, Harvard, Vancouver, ISO, and other styles
35

Tran, Thu Trung. "Bayesian model estimation and comparison for longitudinal categorical data." Queensland University of Technology, 2008. http://eprints.qut.edu.au/19240/.

Full text
Abstract:
In this thesis, we address issues of model estimation for longitudinal categorical data and of model selection for these data with missing covariates. Longitudinal survey data capture the responses of each subject repeatedly through time, allowing for the separation of variation in the measured variable of interest across time for one subject from the variation in that variable among all subjects. Questions concerning persistence, patterns of structure, interaction of events and stability of multivariate relationships can be answered through longitudinal data analysis. Longitudinal data require special statistical methods because they must take into account the correlation between observations recorded on one subject. A further complication in analysing longitudinal data is accounting for the non-response or drop-out process. Potentially, the missing values are correlated with variables under study and hence cannot be totally excluded. Firstly, we investigate a Bayesian hierarchical model for the analysis of categorical longitudinal data from the Longitudinal Survey of Immigrants to Australia. Data for each subject is observed on three separate occasions, or waves, of the survey. One of the features of the data set is that observations for some variables are missing for at least one wave. A model for the employment status of immigrants is developed by introducing, at the first stage of a hierarchical model, a multinomial model for the response and then subsequent terms are introduced to explain wave and subject effects. To estimate the model, we use the Gibbs sampler, which allows missing data for both the response and explanatory variables to be imputed at each iteration of the algorithm, given some appropriate prior distributions. After accounting for significant covariate effects in the model, results show that the relative probability of remaining unemployed diminished with time following arrival in Australia.
Secondly, we examine the Bayesian model selection techniques of the Bayes factor and Deviance Information Criterion for our regression models with missing covariates. Computing Bayes factors involves computing the often complex marginal likelihood p(y|model), and various authors have presented methods to estimate this quantity. Here, we take the approach of path sampling via power posteriors (Friel and Pettitt, 2006). The appeal of this method is that for hierarchical regression models with missing covariates, a common occurrence in longitudinal data analysis, it is straightforward to calculate and interpret, since integration over all parameters, including the imputed missing covariates and the random effects, is carried out automatically with minimal added complexities of modelling or computation. We apply this technique to compare models for the employment status of immigrants to Australia. Finally, we also develop a model choice criterion based on the Deviance Information Criterion (DIC), similar to Celeux et al. (2006), but which is suitable for use with generalized linear models (GLMs) when covariates are missing at random. We define three different DICs: the marginal, where the missing data are averaged out of the likelihood; the complete, where the joint likelihood for response and covariates is considered; and the naive, where the likelihood is found assuming the missing values are parameters. These three versions have different computational complexities. We investigate through simulation the performance of these three different DICs for GLMs consisting of normally, binomially and multinomially distributed data with missing covariates having a normal distribution. We find that the marginal DIC and the estimate of the effective number of parameters, pD, have desirable properties appropriately indicating the true model for the response under differing amounts of missingness of the covariates.
We find that the complete DIC is inappropriate generally in this context as it is extremely sensitive to the degree of missingness of the covariate model. Our new methodology is illustrated by analysing the results of a community survey.
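The DIC construction discussed in this abstract follows the standard form DIC = D̄ + pD with pD = D̄ − D(θ̄), where D is the deviance and θ̄ the posterior mean. A toy sketch for a normal model with known variance follows; the posterior draws and data are invented, and this is not one of the thesis's missing-covariate models:

```python
import math

def deviance(theta, data, sigma=1.0):
    """-2 log-likelihood of a Normal(theta, sigma^2) model."""
    ll = sum(-0.5 * math.log(2 * math.pi * sigma ** 2)
             - (x - theta) ** 2 / (2 * sigma ** 2) for x in data)
    return -2.0 * ll

def dic(posterior_draws, data):
    """DIC = mean deviance + pD, with pD = mean deviance minus the
    deviance at the posterior mean (the effective number of parameters)."""
    d_bar = sum(deviance(t, data) for t in posterior_draws) / len(posterior_draws)
    theta_bar = sum(posterior_draws) / len(posterior_draws)
    p_d = d_bar - deviance(theta_bar, data)
    return d_bar + p_d, p_d

draws = [0.8, 1.1, 0.9, 1.2, 1.0]      # stand-in posterior samples
value, p_d = dic(draws, [0.5, 1.5, 1.0])
```

In the marginal, complete, and naive variants the same recipe is applied to different likelihoods, which is why their computational costs and behaviour differ.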
APA, Harvard, Vancouver, ISO, and other styles
36

Fung, Siu-leung, and 馮紹樑. "Higher-order Markov chain models for categorical data sequences." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2003. http://hub.hku.hk/bib/B26666224.

Full text
APA, Harvard, Vancouver, ISO, and other styles
37

Takagishi, Mariko. "Clustering and visualization for enhancing interpretation of categorical data." Thesis, https://doors.doshisha.ac.jp/opac/opac_link/bibid/BB13112135/?lang=0, 2019. https://doors.doshisha.ac.jp/opac/opac_link/bibid/BB13112135/?lang=0.

Full text
Abstract:
Large-scale categorical data are obtained in various fields. As the interpretation of large-scale data tends to be complicated, methods that capture the latent structure in the data, such as cluster analysis and visualization, are often used to make the data more interpretable. However, there are situations where these methods fail to capture an interpretable latent structure (e.g., when the interpretation of categories, or the response tendency for the same attribute, differs across respondents). Therefore, this thesis considers two problems that often occur in large-scale categorical data analysis and proposes new methods to address them.
Doctor of Culture and Information Science
Doshisha University
APA, Harvard, Vancouver, ISO, and other styles
38

Berrett, Candace. "Bayesian Probit Regression Models for Spatially-Dependent Categorical Data." The Ohio State University, 2010. http://rave.ohiolink.edu/etdc/view?acc_num=osu1285076512.

Full text
APA, Harvard, Vancouver, ISO, and other styles
39

Fore, Neil Koberlein. "A Contrast Pattern based Clustering Algorithm for Categorical Data." Wright State University / OhioLINK, 2010. http://rave.ohiolink.edu/etdc/view?acc_num=wright1285345623.

Full text
APA, Harvard, Vancouver, ISO, and other styles
40

Miranda, Samantha. "Investigation of Multiple Imputation Methods for Categorical Variables." Digital Commons @ East Tennessee State University, 2020. https://dc.etsu.edu/etd/3722.

Full text
Abstract:
We compare different multiple imputation methods for categorical variables using the MICE package in R. We take a complete data set, remove observations to create different levels of missingness, and evaluate the imputation methods at each level. Logistic regression imputation and linear discriminant analysis (LDA) are used for binary variables. Multinomial logit imputation and LDA are used for nominal variables, while ordered logit imputation and LDA are used for ordinal variables. After imputation, the regression coefficients, percent deviation index (PDI) values, and relative frequency tables were found for each imputed data set at each level of missingness and compared to the corresponding complete data set. It was found that logistic regression outperformed LDA for binary variables, and LDA outperformed both multinomial logit imputation and ordered logit imputation for nominal and ordered variables. Simulations were run to confirm the validity of the results.
APA, Harvard, Vancouver, ISO, and other styles
41

Kraus, Katrin. "On the Measurement of Model Fit for Sparse Categorical Data." Doctoral thesis, Uppsala universitet, Statistiska institutionen, 2012. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-173768.

Full text
Abstract:
This thesis consists of four papers that deal with several aspects of the measurement of model fit for categorical data. In all papers, special attention is paid to situations with sparse data. The first paper concerns the computational burden of calculating Pearson's goodness-of-fit statistic for situations where many response patterns have observed frequencies that equal zero. A simple solution is presented that allows for the computation of the total value of Pearson's goodness-of-fit statistic when the expected frequencies of response patterns with observed frequencies of zero are unknown. In the second paper, a new fit statistic is presented that is a modification of Pearson's statistic but that is not adversely affected by response patterns with very small expected frequencies. It is shown that the new statistic is asymptotically equivalent to Pearson's goodness-of-fit statistic and hence, asymptotically chi-square distributed. In the third paper, comprehensive simulation studies are conducted that compare seven asymptotically equivalent fit statistics, including the new statistic. Situations that are considered concern both multinomial sampling and factor analysis. Tests for the goodness-of-fit are conducted by means of the asymptotic and the bootstrap approach, both under the null hypothesis and when there is a certain degree of misfit in the data. Results indicate that recommendations on the use of a fit statistic can depend on the investigated situation and on the purpose of the model test. Power varies substantially with the fit statistic and the cause of the misfit of the model. Findings indicate further that the new statistic proposed in this thesis shows rather stable results and, compared to the other fit statistics, no disadvantageous characteristics are found. Finally, in the fourth paper, the potential necessity of determining the goodness-of-fit by two-sided model testing is addressed.
A simulation study is conducted that investigates differences between the one-sided and the two-sided approaches to model testing. Situations are identified for which two-sided model testing has advantages over the one-sided approach.
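One way to achieve what the first paper's abstract describes rests on a standard algebraic identity: whenever the expected frequencies sum to the sample size N, Σ(O−E)²/E = Σ O²/E − N, and patterns with observed frequency zero contribute nothing to Σ O²/E, so their expected frequencies are never needed. This is a generic illustration consistent with the abstract, not a claim about the thesis's exact derivation; the counts are invented:

```python
def pearson_full(obs, exp):
    """Direct computation over every response pattern."""
    return sum((o - e) ** 2 / e for o, e in zip(obs, exp))

def pearson_sparse(obs, exp, n):
    """Equivalent form sum(O^2/E) - N, valid when the expected
    frequencies over ALL patterns sum to N: zero-count patterns
    drop out, so only E for observed patterns is required."""
    return sum(o * o / e for o, e in zip(obs, exp) if o > 0) - n

obs = [7, 0, 3, 0, 0]
exp = [4.0, 2.0, 2.5, 1.0, 0.5]   # sums to N = 10
n = sum(obs)
x2 = pearson_sparse(obs, exp, n)  # identical to the direct sum
```

For sparse tables with thousands of unobserved response patterns, this reduces the computation from the full pattern space to just the observed patterns.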
APA, Harvard, Vancouver, ISO, and other styles
42

Philipp, Brian F. "A categorical analysis of Weapon System Accuracy Trial (WSAT) data." Thesis, Monterey, California. Naval Postgraduate School, 1992. http://hdl.handle.net/10945/23563.

Full text
Abstract:
Approved for public release; distribution is unlimited
This thesis contains an analysis of the last five years of Antisubmarine Warfare (ASW) Weapon System Accuracy Trial (WSAT) data from both the Atlantic and Pacific fleet. The analysis is conducted in an effort to provide recommendations to be applied toward future evolution of the ASW Test Program for surface ships. A statistical chi-square test is conducted on fleet and Navy wide data to determine which ASW combat system material categories are most prone to degradation. Additionally, a critical examination of the existing WSAT data base is provided with an aim toward promoting future statistical analysis. Results of this thesis indicate that degradation to weapons delivery systems like torpedo tubes and ASROC launchers is statistically more significant than the other WSAT test categories. The thesis also recommends new ways to adapt the existing WSAT data base to conduct more informative inspections of existing and new construction ships.
APA, Harvard, Vancouver, ISO, and other styles
43

Ortiz, Enrique. "A Scalable and Efficient Outlier Detection Strategy for Categorical Data." Honors in the Major Thesis, University of Central Florida, 2007. http://digital.library.ucf.edu/cdm/ref/collection/ETH/id/1185.

Full text
Abstract:
This item is only available in print in the UCF Libraries. If this is your Honors Thesis, you can help us make it available online for use by researchers around the world by following the instructions on the distribution consent form at http://library.ucf.edu/Systems/DigitalInitiatives/DigitalCollections/InternetDistributionConsentAgreementForm.pdf You may also contact the project coordinator, Kerri Bottorff, at kerri.bottorff@ucf.edu for more information.
Bachelors
Engineering and Computer Science
Computer Engineering
APA, Harvard, Vancouver, ISO, and other styles
44

Zhang, Xuhong Ph D. Massachusetts Institute of Technology. "Intelligible models for learning categorical data via generalized fourier spectrum." Thesis, Massachusetts Institute of Technology, 2019. https://hdl.handle.net/1721.1/121725.

Full text
Abstract:
Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2019
Cataloged from PDF version of thesis.
Includes bibliographical references (pages 167-170).
Machine learning techniques have found ubiquitous applications in recent years, and sophisticated models such as neural networks and ensemble methods have achieved impressive predictive performance. However, these models are hard to interpret and are usually used as black boxes. In applications where an explanation is required in addition to a prediction, linear models (e.g. Linear Regression or Logistic Regression) remain mainstream tools due to their simplicity and good interpretability. This thesis considers learning problems on categorical data and proposes methods that retain the good interpretability of linear models but significantly improve the predictive performance. In particular, we provide ways to automatically generate and efficiently select new features based on the raw data, and then train a linear model in the new feature space. The proposed methods are inspired by the Boolean function analysis literature, which studies the Fourier spectrum of Boolean functions and in turn provides spectrum-based learning algorithms. Such algorithms are important tools in computational learning theory, but are not considered practically useful due to the unrealistic assumption of a uniform input distribution. This work generalizes the idea of the Fourier spectrum of Boolean functions to allow arbitrary input distributions. The generalized Fourier spectrum is also of theoretical interest. It carries over and meaningfully generalizes many important results of the Fourier spectrum. Moreover, it offers a framework to explore how the input distribution and target function jointly affect the difficulty of a learning problem, and provides the right language for discussing data-dependent, algorithm-independent complexity of Boolean functions.
by Xuhong Zhang.
Ph. D.
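The classical uniform-distribution Fourier expansion that this thesis generalizes can be computed directly for small n: f̂(S) = E_x[f(x)·Π_{i∈S} x_i] over x ∈ {−1, 1}^n. The sketch below covers only this standard uniform case, not the generalized spectrum:

```python
from itertools import product

def fourier_coefficient(f, s, n):
    """f_hat(S) = E_x[f(x) * prod_{i in S} x_i], the coefficient of the
    parity chi_S under the uniform distribution on {-1, 1}^n."""
    total = 0.0
    for x in product((-1, 1), repeat=n):
        chi = 1
        for i in s:
            chi *= x[i]
        total += f(x) * chi
    return total / 2 ** n

def and2(x):
    """AND of two bits in the +/-1 convention: +1 only on (1, 1)."""
    return 1 if x == (1, 1) else -1

coeffs = {s: fourier_coefficient(and2, s, 2)
          for s in [(), (0,), (1,), (0, 1)]}
# Known expansion: AND(x1, x2) = -1/2 + x1/2 + x2/2 + (x1 * x2)/2
```

Under a non-uniform input distribution the parities are no longer orthonormal, which is precisely the gap the generalized spectrum is built to close.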
APA, Harvard, Vancouver, ISO, and other styles
45

Johansson, Fernstad Sara. "Algorithmically Guided Information Visualization : Explorative Approaches for High Dimensional, Mixed and Categorical Data." Doctoral thesis, Linköpings universitet, Medie- och Informationsteknik, 2011. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-70860.

Full text
Abstract:
Facilitated by the technological advances of the last decades, increasing amounts of complex data are being collected within fields such as biology, chemistry and social sciences. The major challenge today is not to gather data, but to extract useful information and gain insights from it. Information visualization provides methods for visual analysis of complex data but, as the amounts of gathered data increase, the challenges of visual analysis become more complex. This thesis presents work utilizing algorithmically extracted patterns as guidance during interactive data exploration processes, employing information visualization techniques. It provides efficient analysis by taking advantage of fast pattern identification techniques as well as making use of the domain expertise of the analyst. In particular, the presented research is concerned with the issues of analysing categorical data, where the values are names without any inherent order or distance; mixed data, including a combination of categorical and numerical data; and high dimensional data, including hundreds or even thousands of variables. The contributions of the thesis include a quantification method, assigning numerical values to categorical data, which utilizes an automated method to define category similarities based on underlying data structures, and integrates relationships within numerical variables into the quantification when dealing with mixed data sets. The quantification is incorporated in an interactive analysis pipeline where it provides suggestions for numerical representations, which may interactively be adjusted by the analyst. The interactive quantification enables exploration using commonly available visualization methods for numerical data. Within the context of categorical data analysis, this thesis also contributes the first user study evaluating the performance of what are currently the two main visualization approaches for categorical data analysis. 
Furthermore, this thesis contributes two dimensionality reduction approaches, which aim at preserving structure while reducing dimensionality, and provide flexible and user-controlled dimensionality reduction. Through algorithmic quality metric analysis, where each metric represents a structure of interest, potentially interesting variables are extracted from the high dimensional data. The automatically identified structures are visually displayed, using various visualization methods, and act as guidance in the selection of interesting variable subsets for further analysis. The visual representations furthermore provide overview of structures within the high dimensional data set and may, through this, aid in focusing subsequent analysis, as well as enabling interactive exploration of the full high dimensional data set and selected variable subsets. The thesis also contributes the application of algorithmically guided approaches for high dimensional data exploration in the rapidly growing field of microbiology, through the design and development of a quality-guided interactive system in collaboration with microbiologists.
APA, Harvard, Vancouver, ISO, and other styles
46

Wisnesky, Ryan. "Functional Query Languages with Categorical Types." Thesis, Harvard University, 2013. http://dissertations.umi.com/gsas.harvard:11288.

Full text
Abstract:
We study three category-theoretic types in the context of functional query languages (typed lambda-calculi extended with additional operations for bulk data processing). The types we study are:
Engineering and Applied Sciences
APA, Harvard, Vancouver, ISO, and other styles
47

Fung, Siu-leung. "High-dimensional Markov chain models for categorical data sequences with applications." Click to view the E-thesis via HKUTO, 2006. http://sunzi.lib.hku.hk/hkuto/record/B37682702.

Full text
APA, Harvard, Vancouver, ISO, and other styles
48

Gates, Peter J. "Analyzing categorical traits in domestic animal data collected in the field /." Uppsala : Swedish Univ. of Agricultural Sciences (Sveriges lantbruksuniv.), 1999. http://epsilon.slu.se/avh/1999/91-576-5473-5.pdf.

Full text
APA, Harvard, Vancouver, ISO, and other styles
49

Fung, Siu-leung, and 馮紹樑. "High-dimensional Markov chain models for categorical data sequences with applications." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2006. http://hub.hku.hk/bib/B37682702.

Full text
APA, Harvard, Vancouver, ISO, and other styles
50

鄭佳文. "A Clustering Algorithm For Mixed Numeric And Categorical Data." Thesis, 2009. http://ndltd.ncl.edu.tw/handle/01872387181676933864.

Full text
Abstract:
Master's thesis
National Changhua University of Education
Department of Computer Science and Information Engineering
97
Clustering is considered an important tool for data mining. The goal of data clustering is to divide a large data set into several groups such that objects in the same group have a high degree of similarity to each other, and to extract hidden patterns from the data. Many clustering algorithms have been developed in diverse domains. However, most traditional clustering algorithms are designed to focus either on numeric data or on categorical data. Data collected in the real world often contain both numeric and categorical attributes, which makes it difficult to apply traditional clustering algorithms to such data directly. Thus, in this research, a co-occurrence based method is presented to solve this problem. The basic assumption of co-occurrence is that if two attribute values always show up together in one object, there will be a strong similarity between them. All categorical attributes are converted into numeric attributes, so that a traditional clustering algorithm can be applied to group the collected data without difficulty. The proposed approach can yield more concrete results than k-prototypes and SPSS Clementine. Keywords: data mining, clustering, co-occurrence, mixed data
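One simple way to realize the co-occurrence assumption stated in this abstract is to score two categories by the overlap of the values they co-occur with. The sketch below uses a Jaccard overlap on a single companion attribute, with invented data; it illustrates the idea rather than the thesis's exact conversion:

```python
def cooccurrence_similarity(rows, attr, v1, v2, other):
    """Similarity of categories v1, v2 of attribute `attr`, measured by
    the overlap of the `other`-attribute values they co-occur with."""
    ctx1 = {r[other] for r in rows if r[attr] == v1}
    ctx2 = {r[other] for r in rows if r[attr] == v2}
    if not ctx1 or not ctx2:
        return 0.0
    return len(ctx1 & ctx2) / len(ctx1 | ctx2)   # Jaccard overlap

rows = [
    {"colour": "red",  "size": "S"},
    {"colour": "red",  "size": "M"},
    {"colour": "pink", "size": "S"},
    {"colour": "pink", "size": "M"},
    {"colour": "blue", "size": "L"},
]
# red and pink co-occur with the same sizes, blue does not:
sim_rp = cooccurrence_similarity(rows, "colour", "red", "pink", "size")  # 1.0
sim_rb = cooccurrence_similarity(rows, "colour", "red", "blue", "size")  # 0.0
```

Once every pair of categories has such a numeric similarity, categorical attributes can be embedded as numeric coordinates and handed to an ordinary numeric clustering algorithm, which is the overall strategy the abstract describes.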
APA, Harvard, Vancouver, ISO, and other styles