Dissertations / Theses on the topic 'Bayes Algorithm'
Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles
Consult the top 50 dissertations / theses for your research on the topic 'Bayes Algorithm.'
Next to every source in the list of references, there is an 'Add to bibliography' button. Press it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.
You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.
Browse dissertations / theses on a wide variety of disciplines and organise your bibliography correctly.
Bissmark, Johan, and Oscar Wärnling. "The Sparse Data Problem Within Classification Algorithms : The Effect of Sparse Data on the Naïve Bayes Algorithm." Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-209227.
In today's society, machine-learning-based applications and software, together with predictions, are highly topical. Machine learning has given us the ability to predict likely outcomes based on previously collected data, thereby saving time and resources. A common problem in machine learning is sparse data, since it affects the performance of machine learning algorithms and their ability to compute accurate predictions. Data is considered sparse when some expected values in a dataset are missing, which is generally common in large-scale datasets. This report focuses mainly on the Naïve Bayes classification algorithm and how it is affected by sparse data compared to other frequently used classification algorithms. The extent of the performance drop caused by sparse data is studied and analyzed in order to measure how large an effect sparse data has on the ability to compute accurate predictions. In conclusion, the results of this report support the conclusion that the Naïve Bayes algorithm is less affected by sparse data than other common classification algorithms, a conclusion that is also supported by earlier research.
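As context for this entry, here is a minimal sketch (not from the thesis; the dataset and drop rates are arbitrary stand-ins) of the kind of experiment the abstract describes: zero out a growing fraction of feature values and compare how naive Bayes and a k-NN baseline degrade.

    import numpy as np
    from sklearn.datasets import load_digits
    from sklearn.model_selection import train_test_split
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.neighbors import KNeighborsClassifier

    X, y = load_digits(return_X_y=True)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
    rng = np.random.default_rng(0)
    for rate in (0.0, 0.3, 0.6, 0.9):            # fraction of entries dropped
        mask = rng.random(X_tr.shape) < rate     # simulate sparse (missing) values
        X_sparse = np.where(mask, 0, X_tr)
        for clf in (MultinomialNB(), KNeighborsClassifier()):
            acc = clf.fit(X_sparse, y_tr).score(X_te, y_te)
            print(f"{type(clf).__name__:22s} drop={rate:.1f} acc={acc:.3f}")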
Volfson, Alexander. "Exploring the optimal Transformation for Volatility." Digital WPI, 2010. https://digitalcommons.wpi.edu/etd-theses/472.
Nguyen, Huu Du. "System Reliability : Inference for Common Cause Failure Model in Contexts of Missing Information." Thesis, Lorient, 2019. http://www.theses.fr/2019LORIS530.
The effective operation of an entire industrial system is sometimes strongly dependent on the reliability of its components. A failure of one of these components can lead to the failure of the system, with consequences that can be catastrophic, especially in the nuclear or aeronautics industries. To reduce this risk of catastrophic failures, a redundancy policy, consisting in duplicating the sensitive components in the system, is often applied. When one of these components fails, another takes over and the normal operation of the system can be maintained. However, some situations lead to simultaneous failures of components in the system; these are called common cause failures (CCF). Analyzing, modeling and predicting this type of failure event is therefore an important issue and is the subject of the work presented in this thesis. We investigate several methods for the statistical analysis of CCF events. Different algorithms to estimate the parameters of the models and to make predictive inference based on various types of missing data are proposed. We treat confounded data using a BFR (Binomial Failure Rate) model. An EM algorithm is developed to obtain the maximum likelihood estimates (MLE) of the parameters of the model. We introduce the modified-Beta distribution to develop a Bayesian approach. The alpha-factor model is considered to analyze uncertainties in CCF. We suggest a new formalism to describe uncertainty and consider Dirichlet distributions (nested, grouped) to make a Bayesian analysis. Recording of CCF cause data leads to incomplete contingency tables; for a Bayesian analysis of this type of table, we propose an algorithm relying on the inverse Bayes formula (IBF) and the Metropolis-Hastings algorithm. We compare our results with those obtained with the alpha-decomposition method, a method recently proposed in the literature. Prediction of catastrophic events is addressed, and mapping strategies are described to suggest upper bounds of prediction intervals with the pivotal method and Bayesian techniques. Recent events have highlighted the importance of the reliability of redundant systems, and we hope that our work will contribute to a better understanding and prediction of the risks of major CCF events.
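The abstract mentions Metropolis-Hastings sampling for Bayesian inference. As a generic illustration only (a toy Beta-Binomial failure-probability model with invented counts, not the thesis's BFR or IBF machinery), a random-walk sampler looks like this:

    import numpy as np

    # Hypothetical toy data: k common-cause failures observed in n demands.
    k, n = 3, 120
    rng = np.random.default_rng(1)

    def log_post(p):                      # Beta(1,1) prior + binomial likelihood
        if not 0.0 < p < 1.0:
            return -np.inf
        return k * np.log(p) + (n - k) * np.log(1.0 - p)

    samples, p = [], 0.05                 # start from an arbitrary point
    for _ in range(20000):
        prop = p + rng.normal(scale=0.02) # random-walk proposal
        if np.log(rng.random()) < log_post(prop) - log_post(p):
            p = prop                      # accept the proposal
        samples.append(p)
    print("posterior mean of p:", np.mean(samples[5000:]))  # discard burn-in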
Agarwal, Akrita. "Exploring the Noise Resilience of Combined Sturges Algorithm." University of Cincinnati / OhioLINK, 2015. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1447070335.
Schmidt, Samuel. "A Massively Parallel Algorithm for Cell Classification Using CUDA." University of Cincinnati / OhioLINK, 2015. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1448873851.
Full textChao, Yang, and Peng Zhang. "One General Approach For Analysing Compositional Structure Of Terms In Biomedical Field." Thesis, Tekniska Högskolan, Högskolan i Jönköping, JTH. Forskningsmiljö Informationsteknik, 2013. http://urn.kb.se/resolve?urn=urn:nbn:se:hj:diva-20913.
Harrington, Edward. "Aspects of Online Learning." The Australian National University. Research School of Information Sciences and Engineering, 2004. http://thesis.anu.edu.au./public/adt-ANU20060328.160810.
Full textSandberg, Sebastian. "Identifying Hateful Text on Social Media with Machine Learning Classifiers and Normalization Methods - Using Support Vector Machines and Naive Bayes Algorithm." Thesis, Umeå universitet, Institutionen för datavetenskap, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-155353.
Ramos, Gustavo da Mota. "Seleção entre estratégias de geração automática de dados de teste por meio de métricas estáticas de softwares orientados a objetos." Universidade de São Paulo, 2018. http://www.teses.usp.br/teses/disponiveis/100/100131/tde-05122018-202315/.
Software products of varying complexity are created daily from the analysis of complex and varied demands under tight deadlines. High levels of quality are expected of them, but as products become more complex, the quality level may not be acceptable when the time available for testing does not keep up with complexity. Software testing and automatic test data generation therefore arise in order to deliver high-quality products through fast, low-cost test activities. In this context, however, software developers depend on automatic test generation strategies, and especially on the selection of the most adequate technique for obtaining the greatest possible code coverage; this is an important factor given that each test data generation technique has peculiarities and problems that make it better suited to certain types of software. From this scenario, the present work proposes selecting the appropriate technique for each software class based on its characteristics, expressed through object-oriented software metrics, using the naive Bayes classification algorithm. Initially, a literature review of the two generation algorithms, random search and genetic search, was carried out to understand their advantages and disadvantages in both implementation and execution. The CK metrics were also studied in order to understand how they can better describe the characteristics of a class. The acquired knowledge allowed collecting, for each class, the test generation data of each technique (code coverage and generation time) together with the CK metrics, allowing these data to be analyzed together and the classification algorithm finally executed. The results of this analysis demonstrated that a reduced, selected set of metrics is more efficient and better describes the characteristics of a class, besides demonstrating that the CK metrics have little or no influence on test data generation time and on the random search algorithm. However, the CK metrics showed a medium correlation and influence in the selection of the genetic algorithm, thus participating in its selection by the naive Bayes algorithm.
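A hedged sketch of the selection step the abstract describes, with invented CK metric values and labels (the thesis's actual data and model setup may differ): train naive Bayes on per-class metric vectors labeled with the technique that performed better.

    import numpy as np
    from sklearn.naive_bayes import GaussianNB

    # columns: WMC, DIT, NOC, CBO, RFC, LCOM (CK metrics); values are invented
    X = np.array([[12, 2, 0,  5,  30,  8],
                  [45, 4, 3, 19,  88, 40],
                  [ 7, 1, 0,  3,  12,  2],
                  [60, 5, 6, 25, 120, 55]])
    y = ["random", "genetic", "random", "genetic"]   # better technique per class

    model = GaussianNB().fit(X, y)
    print(model.predict([[50, 4, 2, 20, 95, 42]]))   # -> likely "genetic"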
Lee, Jun won. "Relationships Among Learning Algorithms and Tasks." BYU ScholarsArchive, 2011. https://scholarsarchive.byu.edu/etd/2478.
Cheeti, Srilaxmi. "Cross-domain sentiment classification using grams derived from syntax trees and an adapted naive Bayes approach." Thesis, Kansas State University, 2012. http://hdl.handle.net/2097/13733.
Full textDepartment of Computing and Information Sciences
Doina Caragea
There is an increasing amount of user-generated information in online documents, including user opinions on various topics and products such as movies, DVDs, kitchen appliances, etc. To make use of such opinions, it is useful to identify the polarity of the opinion, in other words, to perform sentiment classification. The goal of sentiment classification is to classify a given text/document as either positive, negative or neutral based on the words present in the document. Supervised learning approaches have been successfully used for sentiment classification in domains that are rich in labeled data. Some of these approaches make use of features such as unigrams, bigrams, sentiment words, adjective words, syntax trees (or variations of trees obtained using pruning strategies), etc. However, for some domains the amount of labeled data can be relatively small and we cannot train an accurate classifier using the supervised learning approach. Therefore, it is useful to study domain adaptation techniques that can transfer knowledge from a source domain that has labeled data to a target domain that has little or no labeled data, but a large amount of unlabeled data. We address this problem in the context of product reviews, specifically reviews of movies, DVDs and kitchen appliances. Our approach uses an Adapted Naive Bayes classifier (ANB) on top of the Expectation Maximization (EM) algorithm to predict the sentiment of a sentence. We use grams derived from complete syntax trees or from syntax subtrees as features when training the ANB classifier. More precisely, we extract grams from syntax trees corresponding to sentences in either the source or target domains. To be able to transfer knowledge from source to target, we identify generalized features (grams) using the frequently co-occurring entropy (FCE) method, and represent the source instances using these generalized features. The target instances are represented with all grams occurring in the target, or with a reduced gram set obtained by removing infrequent grams. We experiment with different types of grams in a supervised framework in order to identify the most predictive types of gram, and further use those grams in the domain adaptation framework. Experimental results on several cross-domain tasks show that domain adaptation approaches that combine source and target data (a small amount of labeled and some unlabeled data) can help learn classifiers for the target that are better than those learned from the labeled target data alone.
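A minimal self-training sketch in the spirit of EM over naive Bayes, with an invented toy corpus (this is not the thesis's ANB/FCE implementation): fit on labeled source reviews, then alternate between labeling the target data and refitting.

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import MultinomialNB

    source = ["great movie", "terrible plot", "loved this film", "awful acting"]
    src_y = [1, 0, 1, 0]                       # 1 = positive, 0 = negative
    target = ["this blender is great", "awful kitchen appliance", "loved it"]

    vec = CountVectorizer().fit(source + target)
    nb = MultinomialNB().fit(vec.transform(source), src_y)
    for _ in range(5):                         # EM-style iterations
        pseudo = nb.predict(vec.transform(target))                   # E-step
        nb.fit(vec.transform(source + target), src_y + list(pseudo)) # M-step
    print(nb.predict(vec.transform(["great appliance"])))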
Lindberg, Jennifer, and Henrik Siren. "An implementation analysis of a machine learning algorithm on eye-tracking data in order to detect early signs of dementia." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-281298.
This study aims to investigate whether it is possible to apply a machine learning algorithm to eye-tracking data in order to detect early signs of Alzheimer's disease, a type of dementia. Early signs of Alzheimer's are characterized by mild cognitive impairment, and patients with mild cognitive impairment fixate more when reading. Eye-tracking data was collected in examinations conducted by specialist physicians at a hospital, in which 24 patients read a text. The data was then preprocessed by extracting various features, such as fixations and the difficulty levels of specific passages in the text. These features were then fed into a naïve Bayes machine learning algorithm using leave-one-out cross-validation, in two separate settings: using only fixation features, and using both fixation features and passage difficulty features. The same result was obtained in both cases, an accuracy of 64%. It was therefore concluded that, even though the amount of data (the number of patients) was small, the machine learning algorithm could to some extent predict whether a patient was in an early stage of Alzheimer's disease based on eye-tracking data. In addition, the implementation is further analyzed using a stakeholder analysis, a SWOT analysis, and from an innovation perspective.
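A small sketch, assuming Gaussian naive Bayes on per-patient feature vectors (the features below are random stand-ins, not the actual eye-tracking data), of the leave-one-out cross-validation setup the study describes:

    import numpy as np
    from sklearn.model_selection import LeaveOneOut, cross_val_score
    from sklearn.naive_bayes import GaussianNB

    rng = np.random.default_rng(0)
    X = rng.normal(size=(24, 4))        # 24 patients, 4 invented features
    y = rng.integers(0, 2, size=24)     # 1 = early signs, 0 = control

    acc = cross_val_score(GaussianNB(), X, y, cv=LeaveOneOut())
    print("LOO accuracy:", acc.mean())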
Wedenberg, Kim, and Alexander Sjöberg. "Online inference of topics : Implementation of the topic model Latent Dirichlet Allocation using an online variational bayes inference algorithm to sort news articles." Thesis, Uppsala universitet, Institutionen för informationsteknologi, 2014. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-222429.
Jaradat, Shatha. "OLLDA: Dynamic and Scalable Topic Modelling for Twitter : AN ONLINE SUPERVISED LATENT DIRICHLET ALLOCATION ALGORITHM." Thesis, KTH, Skolan för informations- och kommunikationsteknik (ICT), 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-177535.
Providing high-quality topic inference in today's large and dynamic corpora, such as Twitter, is a challenging task. This is particularly challenging given that the content in this environment consists of short texts and many abbreviations. The project proposes an improvement of a popular online topic modelling algorithm for Latent Dirichlet Allocation (LDA) by incorporating supervision to make it suitable for the Twitter context. This improvement is motivated by the need for a single algorithm that achieves both objectives: analyzing large amounts of documents, including new documents arriving in a stream, while at the same time achieving high quality of the topics discovered in special-case environments such as Twitter. The proposed algorithm is a combination of an online algorithm for LDA and a supervised variant of LDA, Labeled LDA. The performance and quality of the proposed algorithm are compared with these two algorithms. The results show that the proposed algorithm outperformed the supervised variant of LDA in both performance and quality, and achieved better results in terms of quality than the online algorithm. These improvements make our algorithm an attractive option when applied to dynamic environments such as Twitter. An environment for analyzing and labelling data was designed to prepare the dataset before performing the experiments. Possible uses of the proposed algorithm are tweet recommendation and trend detection.
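For context, a hedged sketch of online (mini-batch) variational Bayes LDA using scikit-learn's implementation; the theses above build on this family of algorithms but use neither this library nor this exact code, and the documents are invented.

    from sklearn.decomposition import LatentDirichletAllocation
    from sklearn.feature_extraction.text import CountVectorizer

    docs = ["stock markets fall", "team wins final", "markets rally again",
            "coach praises team", "investors fear losses", "final goes to extra time"]
    vec = CountVectorizer()
    X = vec.fit_transform(docs)

    lda = LatentDirichletAllocation(n_components=2, learning_method="online",
                                    random_state=0)
    for start in range(0, X.shape[0], 2):       # stream documents in mini-batches
        lda.partial_fit(X[start:start + 2])
    terms = vec.get_feature_names_out()
    for k, comp in enumerate(lda.components_):
        top = comp.argsort()[-3:][::-1]         # three highest-weight words
        print("topic", k, [terms[i] for i in top])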
Mathema, Najma. "Predicting Plans and Actions in Two-Player Repeated Games." BYU ScholarsArchive, 2020. https://scholarsarchive.byu.edu/etd/8683.
Nieuwenhoff, Nathalia. "Uma comparação da aplicação de métodos computacionais de classificação de dados aplicados ao consumo de cinema no Brasil." Universidade de São Paulo, 2017. http://www.teses.usp.br/teses/disponiveis/100/100131/tde-01062017-085136/.
Machine learning techniques for data classification or categorization are increasingly being used to extract information or patterns from voluminous databases in various application areas. At the same time, applying these computational methods to identify patterns and to classify data related to the consumption of information goods is considered a complex task, since such consumption decision patterns are related to the preferences of individuals and depend on a composition of individual, cultural, economic and social characteristics, segregated and grouped; furthermore, this is not yet an explored topic in the Brazilian market. In this context, this work performed an experimental study applying the Knowledge Discovery in Databases (KDD) process, which includes data selection and data mining steps, to a binary classification problem: Brazilian individuals who do or do not consume an information good, films at movie theaters in Brazil, using the microdata of the Brazilian Household Budget Survey (POF), 2008-2009, carried out by the Brazilian Institute of Geography and Statistics (IBGE). The experimental study resulted in a comparative analysis of two supervised machine learning techniques for data classification, Naïve Bayes (NB) and Support Vector Machine (SVM). Initially, a systematic review aimed at identifying studies related to the application of machine learning techniques to the classification and identification of consumption patterns indicated that their use in this context is not a mature and developed research topic, since it was not studied in any of the papers analyzed. The results obtained from the comparative analysis between the algorithms suggest that the choice of machine learning algorithm for data classification is directly related to factors such as: (i) the importance of the classes for the problem under study; (ii) the balance between classes; (iii) the universe of attributes to be considered, in terms of their number and degree of importance to the classifiers. In addition, the attributes selected by the Information Gain feature selection algorithm suggest that the decision to consume culture, more specifically the information good film at movie theaters, is directly related to aspects of individuals regarding income and educational level, as well as preferences for cultural goods.
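Attribute ranking by information gain, estimated here with scikit-learn's mutual information on a stock dataset (an illustration only, not the POF microdata), can be sketched as:

    import numpy as np
    from sklearn.datasets import load_breast_cancer
    from sklearn.feature_selection import mutual_info_classif

    X, y = load_breast_cancer(return_X_y=True)
    gain = mutual_info_classif(X, y, random_state=0)  # information-gain-style scores
    order = np.argsort(gain)[::-1]
    print("top 5 attributes:", order[:5], gain[order[:5]])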
Mauuary, Didier. "Détection, estimation et identification pour la tomographie acoustique océanique : étude théorique et expérimentale." Grenoble INPG, 1994. http://www.theses.fr/1994INPG0033.
Borgos, Hilde Grude. "Stochastic Modeling and Statistical Inference of Geological Fault Populations and Patterns." Doctoral thesis, Norwegian University of Science and Technology, Department of Mathematical Sciences, 2000. http://urn.kb.se/resolve?urn=urn:nbn:no:ntnu:diva-503.
The focus of this work is on faults, and the main issue is statistical analysis and stochastic modeling of faults and fault patterns in petroleum reservoirs. The thesis consists of Parts I-V and Appendices A-C, and the units can be read independently. Part III is written for a geophysical audience, and its topic is fault and fracture size-frequency distributions. The remaining parts are written for a statistical audience, but can also be read by people with an interest in quantitative geology. The topic of Parts I and II is statistical model choice for fault size distributions, with a sampling algorithm for estimating Bayes factors. Part IV describes work on spatial modeling of fault geometry, and Part V is a short note on line partitioning. Parts I, II and III constitute the main part of the thesis. The appendices are conference abstracts and papers based on Parts I and IV.
Paper III: reprinted with kind permission of the American Geophysical Union. An edited version of this paper was published by AGU. Copyright [2000] American Geophysical Union
Petřík, Patrik. "Predikce vývoje akciového trhu prostřednictvím technické a psychologické analýzy." Master's thesis, Vysoké učení technické v Brně. Fakulta podnikatelská, 2010. http://www.nusl.cz/ntk/nusl-222507.
Sengupta, Aritra. "Empirical Hierarchical Modeling and Predictive Inference for Big, Spatial, Discrete, and Continuous Data." The Ohio State University, 2012. http://rave.ohiolink.edu/etdc/view?acc_num=osu1350660056.
Lacasse, Alexandre. "Bornes PAC-Bayes et algorithmes d'apprentissage." Thesis, Université Laval, 2010. http://www.theses.ulaval.ca/2010/27635/27635.pdf.
The main purpose of this thesis is the theoretical study and the design of learning algorithms returning majority-vote classifiers. In particular, we present a PAC-Bayes theorem allowing us to bound the variance of the Gibbs loss (not only its expectation). We deduce from this theorem a bound on the risk of a majority vote tighter than the famous bound based on the Gibbs risk. We also present a theorem that allows us to bound the risk associated with general loss functions. From this theorem, we design learning algorithms building weighted majority-vote classifiers minimizing a bound on the risk associated with the following loss functions: linear, quadratic and exponential. We also present algorithms based on the randomized majority vote. Some of these algorithms compare favorably with AdaBoost.
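For context, the classical PAC-Bayes bound this line of work builds on can be stated as follows (the Langford-Seeger form; this is standard background, not the thesis's new theorem). With probability at least 1 - \delta over a sample S of size m, for a prior P and any posterior Q:

    \[
      \mathrm{kl}\!\left( R_S(G_Q) \,\|\, R_D(G_Q) \right)
      \;\le\; \frac{\mathrm{KL}(Q \,\|\, P) + \ln \frac{2\sqrt{m}}{\delta}}{m},
    \]

where R_S(G_Q) and R_D(G_Q) are the empirical and true risks of the Gibbs classifier G_Q, and kl(q || p) denotes the binary relative entropy.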
Vau, Bernard. "Algorithmes d’identification par régression pseudo-linéaire avec prédicteurs paramétrisés sur des bases généralisées de fonctions de transfert orthonormales." Thesis, Université Paris-Saclay (ComUE), 2019. http://www.theses.fr/2019SACLN062.
This thesis deals with the identification of linear time-invariant systems described by discrete-time transfer functions. For a given order, contrary to identification methods that explicitly minimize the prediction error variance, algorithms based on pseudo-linear regression produce models whose bias distribution depends on the predictor parametrization. This has been demonstrated through the innovative concept of the equivalent prediction error, a signal that is in general non-measurable and whose variance is effectively minimized by the pseudo-linear regression. In a second step, revisited versions of recursive algorithms are proposed (output error, extended least squares, and their closed-loop equivalents), whose predictors are expressed on generalized bases of orthonormal transfer functions introduced by Heuberger et al. in the 1990s and 2000s. Selecting the basis poles is equivalent to defining the reproducing kernel of the Hilbert space associated with these functions, and to imposing how approximation is achieved by the algorithms. A particular expression of this reproducing kernel is used to introduce an indicator of the effect of the basis poles on the model fit in the frequency domain; this indicator plays a great role from a heuristic point of view. Finally, a validation test consistent with these algorithms is proposed, and its statistical properties are given. This set of algorithms provides the user with simple tuning parameters (the basis poles) that can be selected according to the implicit purpose assigned to the identification procedure. Obtaining reduced-order models is made easier, while identification of stiff systems, until now impossible in discrete time, becomes accessible.
Chang, Eun R. "Grobner Bases and Ideals of Points." VCU Scholars Compass, 2006. http://scholarscompass.vcu.edu/etd/1214.
Robert, Valérie. "Classification croisée pour l'analyse de bases de données de grandes dimensions de pharmacovigilance." Thesis, Université Paris-Saclay (ComUE), 2017. http://www.theses.fr/2017SACLS111/document.
This thesis gathers methodological contributions to the statistical analysis of large datasets in pharmacovigilance. Pharmacovigilance datasets produce sparse and large matrices, and these two characteristics are the main statistical challenges for modelling them. The first part of the thesis is dedicated to the coclustering of the pharmacovigilance contingency table using the normalized Poisson latent block model. The objective is, on the one hand, to provide pharmacologists with interesting, reduced areas to explore more precisely; on the other hand, this coclustering remains useful background information for dealing with the individual database. Within this framework, a parameter estimation procedure for this model is detailed and objective model selection criteria are developed to choose the best-fitting model. The datasets are so large that we propose a procedure to explore the model space in coclustering in a non-exhaustive but relevant way. Additionally, to assess the performance of the methods, a convenient coclustering index is developed to compare partitions with high numbers of clusters. The development of these statistical tools is not specific to pharmacovigilance, and they can be used for any coclustering issue. The second part of the thesis is devoted to the statistical analysis of the large individual data, which are more numerous but also provide even more valuable information. The aim is to produce clusters of individuals according to their drug profiles, and subgroups of drugs and adverse effects with possible links, which overcomes the coprescription and masking phenomena, common contingency table issues in pharmacovigilance. Moreover, the interaction between several adverse effects is taken into account. For this purpose, we propose a new model, the multiple latent block model, which enables two binary tables to be coclustered by imposing the same row ranking. Assertions inherent to the model are discussed and sufficient identifiability conditions for the model are presented. Then a parameter estimation algorithm is studied and objective model selection criteria are developed. Moreover, a numerical simulation model of the individual data is proposed to compare existing methods and study their limits. Finally, the proposed methodology for dealing with individual pharmacovigilance data is presented and applied to a sample of the French pharmacovigilance database between 2002 and 2010.
Savulionienė, Loreta. "Association rules search in large data bases." Doctoral thesis, Lithuanian Academic Libraries Network (LABT), 2014. http://vddb.library.lt/obj/LT-eLABa-0001:E.02~2014~D_20140519_102242-45613.
The influence of information technology is inseparable from modern life. Every field of activity involves the accumulation and storage of information and data. Today, traditional data processing and the generation of various reports are no longer sufficient. Applying data mining technologies makes it possible to extract new facts or knowledge from the available data, allowing activity to be forecast, for example customer behaviour or financial trends, diseases to be diagnosed, and so on. The dissertation examines data mining algorithms for discovering frequent subsequences and association rules. It proposes a new stochastic frequent subsequence search algorithm, its modifications SDPA1 and SDPA2, and a stochastic association rule discovery algorithm, together with an evaluation of the errors of these algorithms. The algorithms are approximate, but they make it possible to balance two important criteria, time and accuracy. They were tested using real and simulated databases.
Ben, Mohamed Khalil. "Traitement de requêtes conjonctives avec négation : algorithmes et expérimentations." Phd thesis, Université Montpellier II - Sciences et Techniques du Languedoc, 2010. http://tel.archives-ouvertes.fr/tel-00563217.
Full textSTEINBRUCH, DAVID. "A STUDY OF MULTILABEL TEXT CLASSIFICATION ALGORITHMS USING NAIVE-BAYES." PONTIFÍCIA UNIVERSIDADE CATÓLICA DO RIO DE JANEIRO, 2006. http://www.maxwell.vrac.puc-rio.br/Busca_etds.php?strSecao=resultado&nrSeq=9637@1.
Full textA quantidade de informação eletrônica vem crescendo de forma acelerada, motivada principalmente pela facilidade de publicação e divulgação que a Internet proporciona. Desta forma, é necessária a organização da informação de forma a facilitar a sua aquisição. Muitos trabalhos propuseram resolver este problema através da classificação automática de textos associando a eles vários rótulos (classificação multirótulo). No entanto, estes trabalhos transformam este problema em subproblemas de classificação binária, considerando que existe independência entre as categorias. Além disso, utilizam limiares (thresholds), que são muito específicos para o conjunto de treinamento utilizado, não possuindo grande capacidade de generalização na aprendizagem. Esta dissertação propõe dois algoritmos de classificação automática de textos baseados no algoritmo multinomial naive Bayes e sua utilização em um ambiente on-line de classificação automática de textos com realimentação de relevância pelo usuário. Para testar a eficiência dos algoritmos propostos, foram realizados experimentos na base de notícias Reuters 21758 e na base de documentos médicos Ohsumed.
The amount of electronic information has been growing fast, mainly due to the ease of publication and dissemination that the Internet provides. It is therefore necessary to organise information so as to facilitate its retrieval. Many works have addressed this problem through automatic text classification, associating several labels with each text (multilabel classification). However, those works transform the problem into binary classification subproblems, assuming there is no dependence among categories. Moreover, they use thresholds that are very specific to the training document base and so do not generalize well in the learning process. This thesis proposes two text classifiers based on the multinomial naive Bayes algorithm and their use in an online text classification environment with user relevance feedback. In order to test the efficiency of the proposed algorithms, experiments were performed on the Reuters-21578 news base and on the Ohsumed medical document base.
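A hedged sketch of multilabel text classification with multinomial naive Bayes via a one-vs-rest reduction in scikit-learn (the thesis's own two algorithms differ; the corpus below is invented):

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.multiclass import OneVsRestClassifier
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.preprocessing import MultiLabelBinarizer

    docs = ["oil prices rise", "wheat harvest grows", "oil and wheat exports up"]
    labels = [{"crude"}, {"grain"}, {"crude", "grain"}]   # multiple labels per doc

    vec = CountVectorizer().fit(docs)
    mlb = MultiLabelBinarizer().fit(labels)
    Y = mlb.transform(labels)                             # label indicator matrix
    clf = OneVsRestClassifier(MultinomialNB()).fit(vec.transform(docs), Y)
    pred = clf.predict(vec.transform(["wheat prices rise"]))
    print(mlb.inverse_transform(pred))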
Dunn, Will-Matthis III. "Algorithms and applications of comprehensive Groebner bases." Diss., The University of Arizona, 1995. http://hdl.handle.net/10150/187274.
Enkosky, Thomas. "Grobner Bases and an Algorithm to Find the Monomials of an Ideal." Fogler Library, University of Maine, 2004. http://www.library.umaine.edu/theses/pdf/EnkoskyT2004.pdf.
Full textGermain, Pascal. "Algorithmes d'apprentissage automatique inspirés de la théorie PAC-Bayes." Thesis, Université Laval, 2009. http://www.theses.ulaval.ca/2009/26191/26191.pdf.
At first, this master's thesis presents a general PAC-Bayes theorem, from which we can easily obtain some well-known PAC-Bayes bounds. Those bounds allow us to compute a guarantee on the risk of a classifier from its performance on the training set. We analyze the behavior of two PAC-Bayes bounds and determine the peculiar characteristics of the classifiers favoured by those bounds. Then, we conceive three new machine learning algorithms based on the minimization, by conjugate gradient descent, of various mathematical expressions of the PAC-Bayes bounds. The last algorithm uses a part of the training set to capture a priori knowledge. One can use those algorithms to construct majority-vote classifiers as well as linear classifiers implicitly represented by the kernel trick. Finally, an extensive empirical study compares the three algorithms and shows that some versions of them are competitive with both AdaBoost and the SVM.
Inscribed on the Tableau d'honneur (honour roll) of the Faculté des études supérieures.
Cabarcas, Daniel. "An Implementation of Faugère's F4 Algorithm for Computing Gröbner Bases." University of Cincinnati / OhioLINK, 2010. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1277120935.
Dieng, Cheikh Tidiane. "Etude et implantation de l'extraction de requetes frequentes dans les bases de donnees multidimensionnelles." Thesis, Cergy-Pontoise, 2011. http://www.theses.fr/2011CERG0530.
The problem of mining frequent queries in a database has motivated many research efforts during the last two decades. This is so because many interesting patterns, such as association rules, exact or approximate functional dependencies and exact or approximate conditional functional dependencies, can easily be retrieved from them, which is not possible using standard techniques. However, mining frequent queries in a relational database is not easy because, on the one hand, the size of the search space is huge (it encompasses all possible queries that can be addressed to a given database), and on the other hand, testing whether two queries are equivalent (which entails redundant support computations) is NP-complete. In this thesis, we focus on projection-selection-join queries, assuming that the database is defined over a star schema. In this setting, we define a pre-ordering (≼) between queries and we prove the following basic properties: 1. the support measure is anti-monotonic with respect to ≼, and 2. defining q ≡ q′ if and only if q ≼ q′ and q′ ≼ q, all equivalent queries have the same support. The main contributions of the thesis are, on the one hand, to formally study properties of the pre-ordering and the equivalence relation mentioned above, and on the other hand, to propose a levelwise, Apriori-like algorithm for the computation of all frequent queries in a relational database defined over a star schema. Moreover, this algorithm has been implemented, and the reported experiments show that, in our approach, runtime is acceptable, even in the case of large fact tables.
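As an analogy for the levelwise mining described above, here is a compact Apriori-style frequent itemset miner (generic illustration code, not the thesis's frequent-query algorithm):

    from itertools import combinations

    def apriori(transactions, minsup):
        items = {i for t in transactions for i in t}
        level = [frozenset([i]) for i in items]
        frequent = {}
        while level:
            # count the support of each candidate at this level
            counts = {c: sum(c <= t for t in transactions) for c in level}
            kept = {c: n for c, n in counts.items() if n >= minsup}
            frequent.update(kept)
            # next level: unions of kept sets that are exactly one item larger,
            # pruned so that every subset one size smaller is also frequent
            cands = {a | b for a, b in combinations(list(kept), 2)
                     if len(a | b) == len(a) + 1}
            level = [c for c in cands
                     if all(frozenset(s) in kept
                            for s in combinations(c, len(c) - 1))]
        return frequent

    ts = [frozenset("abc"), frozenset("abd"), frozenset("ab"), frozenset("cd")]
    print(apriori(ts, minsup=2))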
Keller, Benjamin J. "Algorithms and Orders for Finding Noncummutative Gröbner Bases." Diss., Virginia Tech, 1997. http://hdl.handle.net/10919/30506.
Ph. D.
Larrieu, Robin. "Arithmétique rapide pour des corps finis." Thesis, Université Paris-Saclay (ComUE), 2019. http://www.theses.fr/2019SACLX073/document.
The multiplication of polynomials is a fundamental operation in complexity theory. Indeed, for many arithmetic problems, the complexity of algorithms is expressed in terms of the complexity of polynomial multiplication; for example, the complexity of Euclidean division or of multi-point evaluation/interpolation (among others) is often expressed this way. This shows that a better multiplication algorithm allows the corresponding operations to be performed faster. A 2016 result gave an improved asymptotic complexity for the multiplication of polynomials over finite fields. This article is the starting point of the thesis; the present work aims to study the implications of the new complexity bound, from a theoretical and practical point of view. The first part focuses on the multiplication of univariate polynomials. It presents two new algorithms that should make the computation faster in practice (rather than asymptotically speaking). While it is difficult in general to observe the predicted speed-up, some specific cases are particularly favorable. In fact, the second proposed algorithm, which is specific to finite fields, leads to a better implementation of multiplication in F_2[X], about twice as fast as state-of-the-art software. The second part deals with the arithmetic of multivariate polynomials modulo an ideal, as considered typically for polynomial system solving. This work assumes a simplified situation, with only two variables and under certain regularity assumptions. In this particular case, there are algorithms whose complexity is asymptotically optimal (up to logarithmic factors) with respect to input/output size. The implementation for this specific case is then significantly faster than general-purpose software, the speed-up becoming larger and larger as the input size increases.
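A toy sketch of arithmetic in F_2[X] (not the thesis's algorithm): pack the coefficients into the bits of a Python integer and multiply by carry-less shift-and-XOR.

    def mul_f2x(a: int, b: int) -> int:
        """Carry-less product of two F_2[X] polynomials given as bitmasks."""
        res = 0
        while b:
            if b & 1:
                res ^= a        # add (XOR) a shifted copy of a
            a <<= 1
            b >>= 1
        return res

    # (x^2 + x + 1) * (x + 1) = x^3 + 1  ->  0b111 * 0b11 = 0b1001
    print(bin(mul_f2x(0b111, 0b11)))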
Gillet, Noel. "Optimisation de requêtes sur des données massives dans un environnement distribué." Thesis, Bordeaux, 2017. http://www.theses.fr/2017BORD0553/document.
Distributed data stores are massively used in the current Big Data context. In addition to providing data management features, those systems have to deal with an increasing amount of queries sent by distant users in order to perform data mining or data visualization operations. One of the main challenges is to evenly distribute the query workload between the nodes composing the system in order to minimize treatment time. In this thesis, we tackle the problem of query allocation in a distributed environment. We consider that data are replicated and that a query can be handled only by a node storing the concerned data. First, near-optimal algorithmic proposals are given when communications between nodes are asynchronous; we also consider that some nodes can be faulty. Second, we study more deeply the impact of data replication on query treatment. In particular, we present an algorithm which manages data replication based on the demand for these data; combined with our allocation algorithm, it guarantees a near-optimal allocation. Finally, we focus on the impact of data replication when queries are received as a stream by the system. We conduct an experimental evaluation using the distributed database Apache Cassandra. The experiments confirm the interest of our algorithmic proposals for improving query treatment compared to the native allocation scheme in Cassandra.
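A hedged sketch of the allocation problem described above, with an invented replica placement (this greedy rule is a baseline, not the thesis's near-optimal algorithm): each query may only go to a node holding a replica of its data, and we pick the least-loaded eligible node.

    # nodes eligible for each query (i.e., holding a replica of its data)
    replicas = {"q1": ["n1", "n2"], "q2": ["n2", "n3"], "q3": ["n1", "n3"],
                "q4": ["n2", "n3"], "q5": ["n1", "n2"]}
    load = {"n1": 0, "n2": 0, "n3": 0}

    for query, nodes in replicas.items():
        target = min(nodes, key=load.__getitem__)  # least-loaded eligible node
        load[target] += 1
        print(query, "->", target)
    print("final load:", load)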
Collobert, Ronan. "Algorithmes d'Apprentissage pour grandes bases de données." Paris 6, 2004. http://www.theses.fr/2004PA066063.
Bender, Matias Rafael. "Algorithms for sparse polynomial systems : Gröbner bases and resultants." Electronic Thesis or Diss., Sorbonne université, 2019. http://www.theses.fr/2019SORUS029.
Solving polynomial systems is one of the oldest and most important problems in computational mathematics and has many applications in several domains of science and engineering. It is an intrinsically hard problem with complexity at least single exponential in the number of variables. However, in most cases, the polynomial systems coming from applications have some kind of structure. In this thesis we focus on exploiting the structure related to the sparsity of the supports of the polynomials; that is, we exploit the fact that the polynomials only have a few monomials with non-zero coefficients. Our objective is to solve the systems faster than the worst-case estimates that assume all terms are present. We say that a sparse system is unmixed if all its polynomials have the same Newton polytope, and mixed otherwise. Most of the work on solving sparse systems concerns the unmixed case, with the exceptions of mixed sparse resultants and homotopy methods. In this thesis, we develop algorithms for mixed systems. We use two prominent tools in nonlinear algebra: sparse resultants and Gröbner bases. We work on each theory independently, but we also combine them to introduce new algorithms: we take advantage of the algebraic properties of the systems associated with a non-vanishing resultant to improve the complexity of computing their Gröbner bases; for example, we exploit the exactness of some strands of the associated Koszul complex to deduce an early stopping criterion for our Gröbner basis algorithms and to avoid every redundant computation (reductions to zero). In addition, we introduce quasi-optimal algorithms to decompose binary forms.
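For readers who want to experiment, here is a small Gröbner basis computation with SymPy (illustration only; the thesis develops specialized algorithms for sparse systems, not this general-purpose routine):

    from sympy import groebner, symbols

    x, y = symbols("x y")
    # lexicographic Gröbner basis of the ideal <x^2 + y^2 - 1, x*y - 2>
    G = groebner([x**2 + y**2 - 1, x*y - 2], x, y, order="lex")
    print(G)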
O'Brien, Killian Michael. "Seifert's algorithm, Châtelet bases and the Alexander ideals of classical knots." Thesis, Durham University, 2002. http://etheses.dur.ac.uk/4192/.
Shanian, Sara. "Sample Compressed PAC-Bayesian Bounds and Learning Algorithms." Thesis, Université Laval, 2012. http://www.theses.ulaval.ca/2012/29037/29037.pdf.
In classification, sample compression algorithms are algorithms that make use of the available training data to construct the set of possible predictors. If the data belong to only a small subspace of the space of all "possible" data, such algorithms have the interesting ability of considering only the predictors that distinguish examples in our areas of interest. This is in contrast with non-sample-compressed algorithms, which have to consider the set of predictors before seeing the training data. The Support Vector Machine (SVM) is a very successful learning algorithm that can be considered a sample-compression learning algorithm. Despite its success, the SVM is currently limited by the fact that its similarity function must be a symmetric positive semi-definite kernel. This limitation by design makes the SVM hardly applicable in cases where one would like to use an arbitrary similarity measure between input examples. PAC-Bayesian theory has been shown to be a good starting point for designing learning algorithms. In this thesis, we propose a PAC-Bayes sample-compression approach to kernel methods that can accommodate any bounded similarity function. We show that the support vector classifier is actually a particular case of sample-compressed classifiers known as majority votes of sample-compressed classifiers. We propose two different groups of PAC-Bayesian risk bounds for majority votes of sample-compressed classifiers. The first group of proposed bounds depends on the KL divergence between the prior and the posterior over the set of sample-compressed classifiers. The second group of proposed bounds has the unusual property of having no KL divergence when the posterior is aligned with the prior in some precise way that we define later in this thesis. Finally, for each bound, we provide a new learning algorithm that consists of finding the predictor that minimizes the bound. The computation times of these algorithms are comparable with those of algorithms like the SVM. We also empirically show that the proposed algorithms are very competitive with the SVM.
Bost, Raphaël. "Algorithmes de recherche sur bases de données chiffrées." Thesis, Rennes 1, 2018. http://www.theses.fr/2018REN1S001/document.
Searchable encryption aims at making efficient a seemingly easy task: outsourcing the storage of a database to an untrusted server while keeping search features. With the development of cloud storage services, for both private individuals and businesses, the efficiency of searchable encryption has become crucial: inefficient constructions would not be deployed on a large scale because they would not be usable. The key problem with searchable encryption is that any construction achieving "perfect security" induces a computational or communication overhead that is unacceptable for the providers or for the users, at least with current techniques and by today's standards. This thesis proposes and studies new security notions and new constructions of searchable encryption, aiming at making it more efficient and more secure. In particular, we start by considering the forward and backward privacy of searchable encryption schemes, what they imply in terms of security and efficiency, and how they can be realized. Then, we show how to protect an encrypted database user against active attacks by the cloud provider, and that such protections have an inherent efficiency cost. Finally, we take a look at existing attacks against searchable encryption and explain how we might thwart them.
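A toy illustration of the core idea (not a construction from the thesis): the server stores an index keyed by deterministic HMAC tokens, so it can answer searches without learning the keywords. The determinism that makes this efficient is exactly the kind of leakage the security/efficiency trade-off above refers to.

    import hmac, hashlib

    key = b"client-secret-key"                 # held by the client only
    def token(word: str) -> bytes:
        return hmac.new(key, word.encode(), hashlib.sha256).digest()

    # the client builds the encrypted index and uploads it to the server
    index = {token("bayes"): ["doc1", "doc3"], token("kernel"): ["doc2"]}

    # later, the client sends token("bayes"); the server just looks it up
    print(index.get(token("bayes"), []))       # -> ['doc1', 'doc3']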
Park, Chee-Hang. "Algorithmes de jointure parallèle et n-aire : application aux réseaux locaux et aux machines bases de données multiprocesseurs." Paris 6, 1987. http://www.theses.fr/1987PA066569.
Full textVial-Montpellier, Pierre. "Sur la construction des bases d'ondelettes." Aix-Marseille 3, 1993. http://www.theses.fr/1993AIX30063.
Svärd, Mikael, and Philip Rumman. "COMBATING DISINFORMATION : Detecting fake news with linguistic models and classification algorithms." Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-209755.
The purpose of this study is to investigate the possibility of reliably distinguishing fabricated news from authentic news using Naive Bayes algorithms; this involves a comparative study between two different types of algorithm. The work also contains an overview of how linguistic text analysis can be used for detection, and an attempt was made to extract information using word frequencies. There is also a discussion of how the various actors and parties in business and government are affected by, and how they handle, fraud connected to fake news. The study further tries to examine what steps can be taken towards a working solution for counteracting fake news. In the end, the algorithms gave unsatisfactory results, and the word frequencies alone could not provide enough information. However, they seem potentially useful as part of a larger machinery of algorithms and strategies intended to handle disinformation.
Linfoot, Andy James. "A Case Study of A Multithreaded Buchberger Normal Form Algorithm." Diss., The University of Arizona, 2006. http://hdl.handle.net/10150/305141.
Winkleblack, Scott Kenneth. "ReGen: Optimizing Genetic Selection Algorithms for Heterogeneous Computing." DigitalCommons@CalPoly, 2014. https://digitalcommons.calpoly.edu/theses/1236.
Full textBrum, Vinicius Campista. "Análise de agrupamento e estabilidade para aquisição e validação de conhecimento em bases de dados de alta dimensionalidade." Universidade Federal de Juiz de Fora (UFJF), 2015. https://repositorio.ufjf.br/jspui/handle/ufjf/4826.
Full textApproved for entry into archive by Adriana Oliveira (adriana.oliveira@ufjf.edu.br) on 2017-06-06T14:06:19Z (GMT) No. of bitstreams: 1 viniciuscampistabrum.pdf: 846002 bytes, checksum: 5ac93812c3739c70741f6052b77b22c8 (MD5)
Made available in DSpace on 2017-06-06T14:06:19Z (GMT). No. of bitstreams: 1 viniciuscampistabrum.pdf: 846002 bytes, checksum: 5ac93812c3739c70741f6052b77b22c8 (MD5) Previous issue date: 2015-08-28
CAPES - Coordenação de Aperfeiçoamento de Pessoal de Nível Superior
Clustering analysis is a descriptive and unsupervised data mining task which uses non-labeled samples in order to find natural groups, i.e. groups of closely related samples such that samples within the same cluster are more similar to each other than to samples in other clusters. Evaluation and validation are considered essential tasks within clustering analysis. These tasks comprise techniques of two kinds: unsupervised (internal) validation techniques and supervised (external) validation techniques. Recent works introduced an internal clustering validation approach to evaluate and improve clustering algorithm stability by identifying and removing samples that are considered harmful and should therefore be studied separately. Through experimentation, it was identified that this approach has two undesirable characteristics: it can remove an entire cluster from the dataset and can still decrease clustering stability. Taking these issues into account, in this work a broader approach was developed using a genetic algorithm for clustering and data stability analysis. This approach aims to increase stability, to reduce the number of samples removed, and to allow the user to control the stability analysis process, which gives the process greater applicability and reliability. The approach was evaluated using different kinds of clustering algorithms and datasets. A genotype dataset was also used for knowledge acquisition and validation. The results show the approach proposed in this work is able to increase stability and to reduce the number of samples removed. The results also suggest the use of this approach as a promising tool for knowledge acquisition and validation in genome-wide association studies (GWAS). This work presents an approach that contributes to knowledge acquisition and validation through clustering and data stability analysis.
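A hedged sketch of one common way to measure clustering stability (not the thesis's genetic-algorithm method): cluster two random subsamples, extend each clustering to the full dataset, and compare the labelings with the adjusted Rand index.

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.datasets import make_blobs
    from sklearn.metrics import adjusted_rand_score

    X, _ = make_blobs(n_samples=300, centers=3, random_state=0)
    rng = np.random.default_rng(0)
    i1 = rng.choice(len(X), size=200, replace=False)   # two random subsamples
    i2 = rng.choice(len(X), size=200, replace=False)
    a = KMeans(n_clusters=3, n_init=10, random_state=1).fit(X[i1]).predict(X)
    b = KMeans(n_clusters=3, n_init=10, random_state=2).fit(X[i2]).predict(X)
    print("stability (ARI):", adjusted_rand_score(a, b))  # near 1 = stable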
Mahfoudi, Abdelwahab. "Contribution a l'algorithmique pour l'analyse des bases de données statistiques hétérogènes." Dijon, 1995. http://www.theses.fr/1995DIJOS009.
Cabarcas, Daniel. "Gröbner Bases Computation and Mutant Polynomials." University of Cincinnati / OhioLINK, 2011. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1307321300.
Full textDjenderedjian, Ivan. "Bases de Gröbner minimales sur un anneau principal." Aix-Marseille 1, 2000. http://www.theses.fr/2000AIX11040.
Full textMedeiros, Camila Alves de [UNESP]. "Extração de conhecimento em bases de dados espaciais: algoritmo CHSMST+." Universidade Estadual Paulista (UNESP), 2014. http://hdl.handle.net/11449/110986.
Full textO desenvolvimento de tecnologias de coleta de informações espaciais resultou no armazenamento de um grande volume de dados, que devido à complexidade dos dados e dos respectivos relacionamentos torna-se impraticável a aplicação de técnicas tradicionais para prospecção em bases de dados espaciais. Nesse sentido, diversos algoritmos vêm sendo propostos, sendo que os algoritmos de agrupamento de dados espaciais são os que mais se destacam devido a sua alta aplicabilidade em diversas áreas. No entanto, tais algoritmos ainda necessitam superar vários desafios para que encontrem resultados satisfatórios em tempo hábil. Com o propósito de contribuir neste sentido, neste trabalho é apresentado um novo algoritmo, denominado CHSMST+, que realiza o agrupamento de dados considerando tanto a distância quanto a similaridade, o que possibilita correlacionar atributos espaciais e não espaciais. Tais tarefas são realizadas sem parâmetros de entrada e interação com usuário, o que elimina a dependência da interpretação do usuário para geração dos agrupamentos, bem como possibilita a obtenção de agrupamentos mais eficientes uma vez que os cálculos realizados pelo algoritmo são mais precisos que uma análise visual dos mesmos. Além destas técnicas, é utilizada a abordagem multithreading, que possibilitou uma redução média de 38,52% no tempo de processamento. O algoritmo CHSMST+ foi aplicado em bases de dados espaciais da área da saúde e meio ambiente, mostrando a capacidade de utilizá-lo em diferentes contextos, o que torna ainda mais relevante o trabalho realizado
The development of technologies for collecting spatial information has resulted in a large volume of stored data, and the high complexity of these data and their relationships makes conventional data mining techniques inappropriate for knowledge extraction in spatial databases. Therefore, several algorithms have been proposed, and spatial clustering algorithms stand out due to their high applicability in many fields. However, these algorithms still need to overcome many challenges to reach satisfactory results in a timely manner. In this work, we present a new algorithm, namely CHSMST+, which performs spatial clustering considering both distance and similarity, allowing spatial and non-spatial attributes to be correlated. These tasks are performed without input parameters or user interaction, eliminating the dependence on user interpretation for cluster generation and enabling clusters to be obtained more efficiently, since the calculations performed by the algorithm are more accurate than a visual analysis of them. Together with these techniques, we use a multithreading approach, which allowed an average reduction of 38.52% in processing time. The CHSMST+ algorithm was applied to spatial databases of health and environment, showing the ability to apply it in different contexts, which makes this work even more relevant.