Dissertations / Theses on the topic 'Knowledge Discovery in Databases (KDD)'
Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles
Consult the top 50 dissertations / theses for your research on the topic 'Knowledge Discovery in Databases (KDD).'
Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.
You can also download the full text of the academic publication as a PDF and read its abstract online whenever these are available in the metadata.
Browse dissertations / theses on a wide variety of disciplines and organise your bibliography correctly.
Storti, Emanuele. "KDD process design in collaborative and distributed environments." Doctoral thesis, Università Politecnica delle Marche, 2012. http://hdl.handle.net/11566/242061.
Knowledge Discovery in Databases (KDD), like scientific experimentation in e-Science, is a complex and computationally intensive process aimed at gaining knowledge from huge sets of data. Often performed in distributed settings, KDD projects usually involve deep interaction among heterogeneous tools and several users with specific expertise. Given the high complexity of the process, such users need effective support to achieve their goal of knowledge extraction. This work presents the Knowledge Discovery in Database Virtual Mart (KDDVM), a user- and knowledge-centric framework aimed at supporting the design of KDD processes in a highly distributed and collaborative scenario, in which computational resources and actors dynamically interoperate to share and elaborate knowledge. The contribution of the work is two-fold: firstly, a conceptual systematization of the relevant knowledge is provided, with the aim of formalizing, through semantic technologies, each element taking part in the design and execution of a KDD process, including computational resources, data and actors; secondly, we propose an implementation of the framework as an open, modular and extensible service-oriented platform, in which several services are available both to perform basic data manipulation operations and to support more advanced functionalities, among them the deployment and activation of computational resources, service discovery, and service composition to build KDD processes. Since the cooperative design and execution of a distributed KDD process typically require several skills, both technical and managerial, collaboration can easily become a source of complexity if not supported by some kind of coordination. For this reason, a set of platform functionalities is specifically addressed to supporting collaboration within a distributed team, by providing an environment in which users can work on the same project and share processes, results and ideas.
Huynh, Xuan-Hiep. "Interestingness Measures for Association Rules in a KDD Process: Postprocessing of Rules with ARQAT Tool." PhD thesis, Université de Nantes, 2006. http://tel.archives-ouvertes.fr/tel-00482649.
Ribeiro, Lamark dos Santos. "Uma abordagem semântica para seleção de atributos no processo de KDD." Universidade Federal da Paraíba, 2010. http://tede.biblioteca.ufpb.br:8080/handle/tede/6048.
Full textCoordenação de Aperfeiçoamento de Pessoal de Nível Superior
Currently, two topics of great importance for computing are being used together in an increasingly visible way: Knowledge Discovery in Databases (KDD) and ontologies. With improvements in the ways data is stored, the amount of information available for analysis has increased exponentially, making techniques necessary to analyze that data and obtain knowledge for different purposes. In this sense, the KDD process introduces stages that enable the discovery of useful, novel knowledge with characteristics that usually cannot be seen just by viewing the data in raw form. In a complementary field, knowledge discovery can benefit from ontologies, which have the capacity to store "knowledge" about certain domains; this knowledge can be retrieved through inferences on classes, descriptions, properties and constraints. Among the phases of the knowledge discovery process, attribute selection allows the analysis space for data mining algorithms to be improved with attributes more relevant to the problem under analysis. Sometimes, however, these selection methods do not eliminate irrelevant attributes satisfactorily, because they do not allow a preliminary analysis of the domain. To address this problem, this work proposes a system that uses ontologies to store prior knowledge about a specific domain, enabling a semantic analysis previously not possible with conventional methodologies. An ontology specific to the medical field was built, reusing various ontology repositories available on the Web, with specifications common to the main areas of medicine. To introduce semantics into attribute selection, a mapping is first performed between database attributes and ontology classes. Once this mapping is done, the user can select attributes by semantic categories, reduce the dimensionality of the data, and view redundancies between semantically related attributes.
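The semantic attribute-selection step described in the abstract above — map database attributes to ontology classes, then select by semantic category and flag semantically related attributes — can be sketched as follows. The attribute and class names below are invented for illustration and are not from the thesis:

```python
# Hypothetical mapping from database attributes to ontology classes; in the
# thesis such a mapping is built against a medical-domain ontology.
ATTRIBUTE_TO_CLASS = {
    "systolic_pressure": "CardiovascularSign",
    "diastolic_pressure": "CardiovascularSign",
    "fasting_glucose": "MetabolicSign",
    "patient_name": "Identification",
}

def select_by_category(mapping, category):
    """Keep only the attributes whose ontology class matches a semantic category."""
    return sorted(a for a, c in mapping.items() if c == category)

def semantic_redundancies(mapping):
    """Group attributes mapped to the same ontology class; groups with more
    than one attribute are candidates for redundancy inspection."""
    groups = {}
    for attr, cls in mapping.items():
        groups.setdefault(cls, []).append(attr)
    return {c: sorted(g) for c, g in groups.items() if len(g) > 1}
```

Selecting by the category "CardiovascularSign" would keep only the two blood-pressure attributes, and the same grouping exposes them as semantically related, i.e. potentially redundant inputs for the mining step.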
Storopoli, José Eduardo. "O uso do Knowledge Discovery in Database (KDD) de informações patentárias sobre ensino a distância: contribuições para instituições de ensino superior." Universidade Nove de Julho, 2016. http://bibliotecatede.uninove.br/handle/tede/1517.
Distance learning (DL) has a long history of successes and failures, and has existed at least since the end of the 18th century. Higher-education DL began in Brazil around 1994, with the expansion of the internet as the main driving factor. The search for innovations and new models related to the DL process has become critical, from both operational and strategic standpoints. Given these challenges, the information available in patent databases can contribute significantly to the design of DL strategies in higher education institutions (HEI); the thesis' objective is therefore to analyze the employment of Knowledge Discovery in Databases (KDD) on patent information and its main contributions to DL in HEI. The method employed was the KDD framework for the exploration, analysis, selection, pre-processing, cleaning, transformation, data mining, interpretation and assessment of patent information from the European Patent Office (EPO) database, composed of about 90 million documents. Data collection was based on a sample of patents retrieved with refined search expressions by the crawler software Patent2Netv.2. Data on 3,090 patents were analyzed through pivot tables, network analysis, mind maps, content analysis and clustering. The main results: (1) provided a diagnosis of DL-related patents from a global perspective; (2) developed a methodology for using KDD to analyze the content of DL patent information for HEI; (3) mapped DL patents from HEI; and, finally, (4) assessed the use of patent information for formulating strategies on adopting DL in HEI, in the light of the Resource-Based View.
Oliveira, Robson Butaca Taborelli de. "O processo de extração de conhecimento de base de dados apoiado por agentes de software." Universidade de São Paulo, 2000. http://www.teses.usp.br/teses/disponiveis/55/55134/tde-23092001-231242/.
Nowadays, commercial and scientific application systems generate huge amounts of data that cannot easily be analyzed without appropriate tools and techniques. A great number of these applications are also based on the Internet, which makes tasks such as data collection even more difficult. The field of Computer Science called Knowledge Discovery in Databases deals with the use and creation of tools and techniques that allow the automatic discovery of knowledge from data. Applying these techniques in an Internet environment can be particularly difficult, so new techniques are needed to aid the knowledge discovery process. Software agents are computer programs with properties such as autonomy, reactivity and mobility that can be used to this end. In this context, the main goal of this work is to propose a multiagent system, called Minador, aimed at supporting the execution and management of the Knowledge Discovery in Databases process.
Scarinci, Rui Gureghian. "SES : sistema de extração semântica de informações." reponame:Biblioteca Digital de Teses e Dissertações da UFRGS, 1997. http://hdl.handle.net/10183/18398.
One of the most challenging areas in Computer Science is related to Internet technology. The network offers users a large variety and amount of information, mainly data stored in unstructured or semi-structured formats. However, the vast volume and heterogeneity of the data make manipulating the retrieved data a very arduous task. This problem was the prime motivation for this work. Even with many tools for data retrieval and specific searching, the user has to handle a growing amount of information on a personal computer, because these tools do not perform a precise data selection process, and much of the retrieved data is of no interest to the user. There is also great diversity of subjects and of standards for information transmission and storage, creating highly heterogeneous environments for data searching and retrieval. Due to this heterogeneity, the user has to know many data standards and search tools to obtain the requested information. The fundamental problem for data manipulation, however, is the partially or fully unstructured data formats, such as text, mail and news. For files in these formats, the user has to read each file to filter the relevant information, losing time, because a document may not be of interest or, if it is, reading it completely may be unnecessary at the moment. Some information, such as calls for papers, product prices and economic statistics, has an associated temporal validity; other information is updated periodically. Some of these temporal characteristics are explicit; others are implicitly embedded in other data. Since it is very difficult to retrieve temporal data automatically, invalid information is often used and opportunities are lost. This work describes a system for the extraction and summarization of data.
The main objective is to satisfy the user's needs for selecting and manipulating information stored on a personal computer. To achieve this goal, the concepts of Information Extraction (IE) and knowledge-based systems are employed. The input data is handled by an extraction procedure configured through a user-defined knowledge base. The objective of this work is to develop a System of Semantic Extraction of Information that classifies the extracted data into classes meaningful to the user and deduces the temporal validity of this data. This goal was achieved by generating a structured temporal database.
Moretti, Caio Benatti. "Análise de grandezas cinemáticas e dinâmicas inerentes à hemiparesia através da descoberta de conhecimento em bases de dados." Universidade de São Paulo, 2016. http://www.teses.usp.br/teses/disponiveis/18/18149/tde-13062016-184240/.
As a result of higher life expectancy, the high probability of natural accidents and traumas entails an increasing need for rehabilitation. Physical therapy under the robotic rehabilitation paradigm with serious games offers the patient better motivation and engagement in the treatment, and is a method recommended by the American Heart Association (AHA), which assigns it the highest rating (Level A) for inpatients and outpatients. However, the rich potential of the data analysis made possible by robotic devices is poorly exploited, discarding the opportunity to add valuable information to treatments. The aim of this work is to apply knowledge discovery techniques to classify the performance of patients diagnosed with chronic hemiparesis. The patients, inserted into a robotic rehabilitation environment, exercised with the InMotion ARM, a robotic device for upper-limb rehabilitation that also collects performance data. A knowledge discovery roadmap was applied over the collected data in order to preprocess, transform and mine it through machine learning methods. The strategy of this work culminated in a pattern classifier able to distinguish hemiparetic sides with an accuracy of 94%, with eight attributes feeding the input of the obtained mechanism. The interpretation of these attributes showed that force-related data are more significant, comprising half of the composition of a sample.
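The abstract above does not name the classifier used, so as a purely illustrative stand-in, a nearest-centroid learner shows the general shape of the task: per-session feature vectors are reduced to per-class means, and a new sample is assigned to the closest centroid. The feature values and labels below are synthetic, not taken from the thesis:

```python
import math

def train_centroids(samples, labels):
    """Compute the per-class mean feature vector (nearest-centroid training)."""
    sums, counts = {}, {}
    for x, y in zip(samples, labels):
        if y not in sums:
            sums[y] = [0.0] * len(x)
            counts[y] = 0
        sums[y] = [a + b for a, b in zip(sums[y], x)]
        counts[y] += 1
    return {y: [v / counts[y] for v in s] for y, s in sums.items()}

def predict(centroids, x):
    """Assign a sample to the class whose centroid is nearest (Euclidean)."""
    return min(centroids, key=lambda y: math.dist(centroids[y], x))

# synthetic force-related features, e.g. [mean force, movement smoothness]
model = train_centroids(
    [[10.0, 2.0], [11.0, 2.5], [1.0, 9.0], [0.5, 8.0]],
    ["paretic", "paretic", "non_paretic", "non_paretic"],
)
```

A real pipeline would of course use cross-validation and a stronger learner; the point is only that eight well-chosen attributes per sample suffice as input to such a mechanism.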
Schneider, Luís Felipe. "Aplicação do processo de descoberta de conhecimento em dados do poder judiciário do estado do Rio Grande do Sul." reponame:Biblioteca Digital de Teses e Dissertações da UFRGS, 2003. http://hdl.handle.net/10183/8968.
With the purpose of exploring connections among data, a field has emerged for the search for knowledge and useful, previously unknown information in large sets of stored data. This field was dubbed Knowledge Discovery in Databases (KDD) and was formalized in 1989. KDD consists of a process made up of iterative and interactive stages or phases. This work was based on the CRISP-DM methodology. Regardless of the methodology used, the process features a phase that may be considered the nucleus of KDD: data mining (or modeling, in CRISP-DM terms), which is associated with the task, as well as with the techniques and algorithms, employed in a KDD application. This study highlights the tasks of affinity grouping and clustering, the techniques associated with them, and the Apriori and K-means algorithms. All this contextualization is embodied in the selected data mining tool, Weka (Waikato Environment for Knowledge Analysis). The research focuses on applying the KDD process to the Judiciary Power's core activity, court proceedings, seeking findings on the influence of the procedural classification on the incidence of proceedings, processing time, the kinds of sentences pronounced and hearing attendance, as well as on defendants' profiles in criminal proceedings, such as sex, marital status, educational background, profession and race. Chapters 2 and 3 present the theoretical grounds of KDD and explain the CRISP-DM methodology; Chapter 4 covers the application performed on the Judiciary Power's data; and conclusions are drawn in the closing chapter.
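Several of the theses listed here rely on frequent-pattern mining with Apriori (in this case through Weka). As a rough, deliberately unoptimized sketch of the underlying idea — not the implementation any of these works used — frequent itemsets can be mined from a list of transactions like this:

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return every frequent itemset (as a frozenset) with its absolute
    support count; min_support is a fraction of the transaction count."""
    n = len(transactions)
    tx = [frozenset(t) for t in transactions]
    counts = {}
    for t in tx:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c / n >= min_support}
    result = dict(frequent)
    k = 2
    while frequent:
        # naive candidate generation: all k-combinations of items that still
        # occur in some frequent (k-1)-itemset (real Apriori prunes harder)
        items = sorted({i for s in frequent for i in s})
        candidates = [frozenset(c) for c in combinations(items, k)]
        counts = {c: sum(1 for t in tx if c <= t) for c in candidates}
        frequent = {c: v for c, v in counts.items() if v / n >= min_support}
        result.update(frequent)
        k += 1
    return result

baskets = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c"]]
freq = apriori(baskets, min_support=0.6)  # {a, b} is kept: support 3/5
```

Association rules are then read off the frequent itemsets: a rule A → B is scored by confidence support(A ∪ B) / support(A), e.g. {a} → {b} above has confidence 3/4. Weka's Apriori applies the same definitions with far better candidate pruning.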
Yu, Xiaobo. "Knowledge discovery in Internet databases." Thesis, National Library of Canada = Bibliothèque nationale du Canada, 1998. http://www.collectionscanada.ca/obj/s4/f2/dsk3/ftp04/mq30577.pdf.
Howard, Craig M. "Tools and techniques for knowledge discovery." Thesis, University of East Anglia, 2001. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.368357.
Full textWu, Fei. "Knowledge discovery in time-series databases." Versailles-St Quentin en Yvelines, 2001. http://www.theses.fr/2001VERS0023.
Full textKrogel, Mark-André. "On propositionalization for knowledge discovery in relational databases." [S.l. : s.n.], 2005. http://deposit.ddb.de/cgi-bin/dokserv?idn=976835835.
Corzo, F. A. "Abstraction and structure in knowledge discovery in databases." Thesis, University College London (University of London), 2011. http://discovery.ucl.ac.uk/1126395/.
Full textGrissa, Dhouha. "Etude comportementale des mesures d'intérêt d'extraction de connaissances." Phd thesis, Université Blaise Pascal - Clermont-Ferrand II, 2013. http://tel.archives-ouvertes.fr/tel-01023975.
Rydzi, Daniel. "Metodika vývoje a nasazování Business Intelligence v malých a středních podnicích." Doctoral thesis, Vysoká škola ekonomická v Praze, 2005. http://www.nusl.cz/ntk/nusl-77060.
Full textGhoorah, Anisah W. "Extraction de connaissances pour la modélisation tri-dimensionnelle de l'interactome structural." Thesis, Université de Lorraine, 2012. http://www.theses.fr/2012LORR0204/document.
Understanding how the protein interactome works at a structural level could provide useful insights into the mechanisms of diseases. Comparative homology modelling and ab initio protein docking are two computational methods for modelling the three-dimensional (3D) structures of protein-protein interactions (PPIs). Previous studies have shown that both methods give significantly better predictions when they incorporate experimental PPI information. In general, however, PPI information is often not available in an easily accessible way and cannot be re-used by 3D PPI modelling algorithms, so there is currently a need for a reliable framework that facilitates the reuse of PPI data. This thesis presents a systematic knowledge-based approach for representing, describing and manipulating 3D interactions to study PPIs on a large scale and to facilitate knowledge-based modelling of protein-protein complexes. The main contributions of this thesis are: (1) it describes an integrated database of non-redundant 3D hetero domain interactions; (2) it presents a novel method of describing and clustering domain-domain interactions (DDIs) according to the spatial orientations of the binding partners, thus introducing the notion of "domain family-level binding sites" (DFBS); (3) it proposes a structural classification of DFBSs similar to the CATH classification of protein folds, and it presents a study of secondary structure propensities of DFBSs and interaction preferences; (4) it introduces a systematic case-based reasoning approach to model on a large scale the 3D structures of protein complexes from existing structural DDIs. All these contributions have been made publicly available through a web server (http://kbdock.loria.fr)
Hertkorn, Peter. "Knowledge discovery in databases auf der Grundlage dimensionshomogener Funktionen /." Stuttgart : Univ., Inst. f. Statik u. Dynamik d. Luft- u. Raumfahrtkonstruktionen, 2005. http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&doc_number=014636277&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA.
Hertkorn, Peter. "Knowledge discovery in databases auf der Grundlage dimensionshomogener Funktionen." Stuttgart ISD, 2004. http://deposit.ddb.de/cgi-bin/dokserv?id=2710474&prov=M&dok_var=1&dok_ext=htm.
Full textChowdhury, Israt Jahan. "Knowledge discovery from tree databases using balanced optimal search." Thesis, Queensland University of Technology, 2016. https://eprints.qut.edu.au/92263/1/Israt%20Jahan_Chowdhury_Thesis.pdf.
Full textGhoorah, Anisah W. "Extraction de connaissances pour la modélisation tri-dimensionnelle de l'interactome structural." Electronic Thesis or Diss., Université de Lorraine, 2012. http://www.theses.fr/2012LORR0204.
Full textUnderstanding how the protein interactome works at a structural level could provide useful insights into the mechanisms of diseases. Comparative homology modelling and ab initio protein docking are two computational methods for modelling the three-dimensional (3D) structures of protein-protein interactions (PPIs). Previous studies have shown that both methods give significantly better predictions when they incorporate experimental PPI information. However, in general, PPI information is often not available in an easily accessible way, and cannot be re-used by 3D PPI modelling algorithms. Hence, there is currently a need to develop a reliable framework to facilitate the reuse of PPI data. This thesis presents a systematic knowledge-based approach for representing, describing and manipulating 3D interactions to study PPIs on a large scale and to facilitate knowledge-based modelling of protein-protein complexes. The main contributions of this thesis are: (1) it describes an integrated database of non-redundant 3D hetero domain interactions; (2) it presents a novel method of describing and clustering DDIs according to the spatial orientations of the binding partners, thus introducing the notion of "domain family-level binding sites" (DFBS); (3) it proposes a structural classification of DFBSs similar to the CATH classification of protein folds, and it presents a study of secondary structure propensities of DFBSs and interaction preferences; (4) it introduces a systematic case-base reasoning approach to model on a large scale the 3D structures of protein complexes from existing structural DDIs. All these contributions have been made publicly available through a web server (http://kbdock.loria.fr)
Xie, Tian. "Knowledge discovery and machine learning for capacity optimization of Automatic Milking Rotary System." Thesis, KTH, Kommunikationsteori, 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-199630.
Milk production has been part of our agriculture's thousand-year history. Increasing demand for dairy products, together with rapid technological development, has brought enormous changes to milk production. Milk production initially began with hand milking; milking methods have since evolved through various technologies, such as vacuum milking and pipeline milking, up to today's rotary milking parlours. Automated, technology-driven milking systems now provide farmers with highly efficient milking, effective animal husbandry and, above all, thriving incomes. The DeLaval Automatic Milking Rotary (AMR™) is the world's leading automatic rotary milking system, presenting an ultimate combination of technology and machinery that gives milk production significant advantages. Its technical milking capacity is 90 cows per hour, but it is constrained by farm operations, the condition of the cows and the management of the system, which makes the actual capacity lower than the technical one. This thesis investigates how an optimization system can analyse and improve the performance of the DeLaval Automatic Milking Rotary by focusing on cow behaviour and robot timeouts. By applying Knowledge Discovery in Databases (KDD), creating machine learning systems that predict cow behaviour, and developing modelling methods for system simulation, optimization proposals are given and validated.
Schneider, Ulrike, and Joachim Hagleitner. "Knowledge Discovery in Databases am Beispiel des österreichischen Nonprofit Sektors." Institut für Sozialpolitik, WU Vienna University of Economics and Business, 2005. http://epub.wu.ac.at/1352/1/document.pdf.
Healy, Jerome V. "Computational knowledge discovery techniques and their application to options market databases." Thesis, London Metropolitan University, 2004. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.426594.
Full textAmirbekyan, Artak. "Protocols and Data Structures for Knowledge Discovery on Distributed Private Databases." Thesis, Griffith University, 2007. http://hdl.handle.net/10072/367447.
Full textThesis (PhD Doctorate)
Doctor of Philosophy (PhD)
School of Information and Communication Technology
Science, Environment, Engineering and Technology
Full Text
Páircéir, Rónán. "Knowledge discovery from distributed aggregate data in data warehouses and statistical databases." Thesis, University of Ulster, 2002. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.274398.
Full textPrášil, Zdeněk. "Využití data miningu v řízení podniku." Master's thesis, Vysoká škola ekonomická v Praze, 2010. http://www.nusl.cz/ntk/nusl-150279.
Full textOrlygsdottir, Brynja. "Using knowledge discovery to identify potentially useful patterns of health promotion behavior of 10-12 year old Icelandic children." Diss., University of Iowa, 2008. http://ir.uiowa.edu/etd/6.
Full textHayward, John T. "Mining Oncology Data: Knowledge Discovery in Clinical Performance of Cancer Patients." Worcester, Mass. : Worcester Polytechnic Institute, 2006. http://www.wpi.edu/Pubs/ETD/Available/etd-081606-083026/.
Full textKeywords: Clinical Performance; Databases; Cancer; oncology; Knowledge Discovery in Databases; data mining. Includes bibliographical references (leaves 267-270).
Chang, Namsik. "Knowledge discovery in databases with joint decision outcomes: A decision-tree induction approach." Diss., The University of Arizona, 1995. http://hdl.handle.net/10150/187227.
Cowley, Jonathan Bowes. "The use of knowledge discovery databases in the identification of patients with colorectal cancer." Thesis, University of Hull, 2012. http://hydra.hull.ac.uk/resources/hull:7082.
Full textIglesia, Beatriz de la. "The development and application of heuristic techniques for the data mining task of nugget discovery." Thesis, University of East Anglia, 2001. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.368386.
Full textPonsan, Christiane. "Computing with words for data mining." Thesis, University of Bristol, 2000. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.310744.
Full textAydin, Ugur [Verfasser]. "Interstitial solution enthalpies derived from first-principles : knowledge discovery using high-throughput databases / Ugur Aydin." Paderborn : Universitätsbibliothek, 2016. http://d-nb.info/1098210433/34.
Hamed, Ahmed A. "An Exploratory Analysis of Twitter Keyword-Hashtag Networks and Knowledge Discovery Applications." ScholarWorks @ UVM, 2014. http://scholarworks.uvm.edu/graddis/325.
Full textDopitová, Kateřina. "Empirické porovnání systémů dobývání znalostí z databází." Master's thesis, Vysoká škola ekonomická v Praze, 2010. http://www.nusl.cz/ntk/nusl-18159.
Beth, Madariaga Daniel Guillermo. "Identificación de las tendencias de reclamos presentes en reclamos.cl y que apunten contra instituciones de educación y organizaciones públicas." Tesis, Universidad de Chile, 2012. http://www.repositorio.uchile.cl/handle/2250/113396.
This thesis seeks to verify, through a practical, applied experiment, whether Web Opinion Mining (WOM) techniques and software tools make it possible to determine the general trends present in a set of opinions published on the Web; specifically, the complaints posted on the website Reclamos.cl against institutions in the Chilean education and government sectors. Consumers increasingly use the Web to publish their positive and negative assessments of what they acquire in the market, which makes it a gold mine for many institutions, especially for identifying the strengths and weaknesses of the products and services they offer, their public image, and several other aspects. Concretely, the experiment was carried out by building and running a software application that integrates and implements WOM concepts: Knowledge Discovery from Data (KDD) as the methodological framework for reaching the stated objective, and Latent Dirichlet Allocation (LDA) for detecting topics in the contents of the complaints. The application also uses object-oriented programming in Python, stores its data in relational databases, and incorporates off-the-shelf tools to simplify certain required tasks. Running the application downloaded the web pages containing the complaints of interest, detecting 6,460 such complaints, directed at 245 institutions and published between 13 July 2006 and 5 December 2011.
The application then processed the contents of the complaints using stop-word lists and lemmatization tools, keeping only the canonical forms of the words that carried meaning. With this done, it ran several LDA analyses over these contents, defined to be executed for each detected institution, both over the full set of its complaints and over segments grouped by year of publication, with each analysis producing 20 topics of 30 words each. From the LDA results, and through a methodology of manually reading and interpreting the words in each of the topic sets obtained, phrases and sentences were composed to link them together, yielding an interpretation of the trend to which the complaints represented in those results pointed. The conclusion is that it is possible to detect the general trends of the complaints using WOM techniques, though with a caveat: since the trends emerge from a manual interpretation process, subjectivity can be introduced by the interests, experience, and other characteristics of the person interpreting the results.
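The preprocessing step described in this abstract (stop-word removal and reduction of words to canonical forms before LDA topic detection) can be sketched as follows. The stop-word list and the lemma dictionary here are illustrative placeholders, not the thesis's actual resources:

```python
# Minimal sketch of the complaint-text preprocessing described above:
# drop stop words and map each remaining word to a canonical (lemmatized)
# form before topic detection with LDA. The stop-word list and the toy
# lemma dictionary are invented for illustration.

STOP_WORDS = {"el", "la", "de", "y", "que", "en", "un", "una", "los", "las"}

# Toy lemma dictionary standing in for a real Spanish lemmatizer.
LEMMAS = {"reclamos": "reclamo", "publicados": "publicar"}

def preprocess(text: str) -> list[str]:
    """Lowercase, strip punctuation, drop stop words, and lemmatize."""
    tokens = [w.strip(".,;:!?") for w in text.lower().split()]
    return [LEMMAS.get(w, w) for w in tokens if w and w not in STOP_WORDS]

print(preprocess("Los reclamos publicados en la Web"))
# → ['reclamo', 'publicar', 'web']
```

The resulting token lists would then feed an LDA implementation to extract the per-institution topics the abstract describes.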
Bogorny, Vania. "Enhancing spatial association rule mining in geographic databases." reponame:Biblioteca Digital de Teses e Dissertações da UFRGS, 2006. http://hdl.handle.net/10183/7841.
The association rule mining technique emerged with the objective of finding novel, useful, and previously unknown associations in transactional databases, and a large number of association rule mining algorithms have been proposed in the last decade. Their main drawback, a well-known problem, is the generation of large amounts of frequent patterns and association rules. In geographic databases the problem of mining spatial association rules grows significantly: besides the large number of generated patterns and rules, many patterns are well-known geographic domain associations, normally represented explicitly in geographic database schemas. The majority of existing algorithms do not guarantee the elimination of all well-known geographic dependences. As a result, the same associations represented in geographic database schemas are extracted by spatial association rule mining algorithms and presented to the user. Mining spatial association rules from geographic databases requires at least three main steps: computing spatial relationships, generating frequent patterns, and extracting association rules. The first step is the most effort-demanding and time-consuming task in the rule mining process, but has received little attention in the literature. The second and third steps have been considered the main problem in transactional association rule mining and have been addressed as two different problems: frequent pattern mining and association rule mining. Well-known geographic dependences which generate well-known patterns may appear in all three main steps of the spatial association rule mining process. Aiming to eliminate well-known dependences and generate more interesting patterns, this thesis presents a framework with three main methods for mining frequent geographic patterns using knowledge constraints. Semantic knowledge is used to avoid the generation of patterns that are known in advance to be non-interesting.
The first method reduces the input problem: all well-known dependences that can be eliminated without losing information are removed during data preprocessing. The second method eliminates combinations of pairs of geographic objects with dependences during frequent set generation. A third method presents a new approach to generating non-redundant frequent sets, the maximal generalized frequent sets without dependences. This method reduces the number of frequent patterns very significantly and, by consequence, the number of association rules.
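The second method's idea, pruning pairs with a well-known dependence during frequent set generation, can be illustrated with a minimal sketch. The transactions, item names, and dependence list below are invented for illustration; this is not the thesis's implementation:

```python
from itertools import combinations

# Illustrative sketch: generate frequent pairs of geographic feature types,
# but prune any pair with a well-known dependence (e.g. every gas station
# is known to lie next to a road, so that pair is uninteresting).
# Transactions and the dependence list are invented for illustration.

transactions = [
    {"gas_station", "road", "water_body"},
    {"gas_station", "road"},
    {"road", "water_body", "hospital"},
]
known_dependences = {frozenset({"gas_station", "road"})}
min_support = 2

def frequent_pairs(transactions, min_support, forbidden):
    items = set().union(*transactions)
    result = []
    for pair in combinations(sorted(items), 2):
        if frozenset(pair) in forbidden:
            continue  # prune the well-known dependence
        support = sum(1 for t in transactions if set(pair) <= t)
        if support >= min_support:
            result.append((pair, support))
    return result

print(frequent_pairs(transactions, min_support, known_dependences))
# → [(('road', 'water_body'), 2)]
```

Without the pruning step, ("gas_station", "road") would also be reported as frequent, even though it only restates a schema-level dependence.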
Talebzadeh, Saeed. "Data Mining in Scientific Databases for Knowledge Discovery, the Case of Interpreting Support Vector Machines via Genetic Programming as Simple Understandable Terms." Doctoral thesis, Università degli Studi di Roma "Tor Vergata", 2015. http://hdl.handle.net/2108/202257.
Otine, Charles. "HIV Patient Monitoring Framework Through Knowledge Engineering." Doctoral thesis, Blekinge Tekniska Högskola [bth.se], School of Planning and Media Design, 2012. http://urn.kb.se/resolve?urn=urn:nbn:se:bth-00540.
Abeysekara, Thusitha Bernad. "A proposal for the protection of digital databases in Sri Lanka." Thesis, University of Exeter, 2013. http://hdl.handle.net/10871/14172.
Fihn, John, and Johan Finndahl. "A Framework for How to Make Use of an Automatic Passenger Counting System." Thesis, Uppsala universitet, Datorteknik, 2011. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-158139.
Beskyba, Jan. "Automatizace předzpracování dat za využití doménových znalostí." Master's thesis, Vysoká škola ekonomická v Praze, 2014. http://www.nusl.cz/ntk/nusl-193429.
Nigita, Giovanni. "Knowledge bases and stochastic algorithms for mining biological data: applications on A-to-I RNA editing and RNAi." Doctoral thesis, Università di Catania, 2014. http://hdl.handle.net/10761/1555.
Sanavia, Tiziana. "Biomarker lists stability in genomic studies: analysis and improvement by prior biological knowledge integration into the learning process." Doctoral thesis, Università degli studi di Padova, 2012. http://hdl.handle.net/11577/3422197.
The analysis of high-throughput data based on sequencing, microarray, and mass-spectrometry technologies has proved extremely useful for identifying the genes and proteins, called biomarkers, that help answer both diagnostic/prognostic and functional questions. In this context, the stability of the results is crucial both for understanding the biological mechanisms that characterize diseases and for achieving sufficient reliability for clinical and pharmaceutical applications. Recently, several studies have shown that the identified biomarker lists are poorly reproducible, making the validation of such biomarkers as stable disease indicators a still open problem. The reasons for these differences lie both in the size of the datasets (few subjects compared with the number of variables) and in the heterogeneity of complex diseases, which are characterized by alterations of multiple regulatory pathways and of the interactions between genes and the environment. Typically, in an experimental design, the data to be analyzed come from different subjects and different phenotypes (e.g. normal and pathological). The methodologies most commonly used to identify disease-related genes rely on differential analysis of gene expression between phenotypes using univariate statistical tests. This approach provides information on the effect of specific genes considered as mutually independent variables, whereas it is now known that the interaction between weakly up/down-regulated genes, although not differentially expressed, can be extremely important for characterizing a disease state. Machine learning algorithms are, in principle, able to identify non-linear combinations of the variables and therefore have the possibility of selecting a more detailed set of experimentally relevant genes.
In this context, supervised classification methods are often used to select biomarkers, and several approaches, including discriminant analysis, random forests, and support vector machines, have been applied, especially in oncological studies. Although these classification approaches achieve a high level of prediction accuracy, the reproducibility of the biomarker lists remains an open question, since there are multiple sets of biological variables (i.e. genes or proteins) that can be considered equally relevant in terms of prediction. In theory, it is thus possible to obtain insufficient stability even while reaching the maximum level of accuracy. This thesis is a study of several computational aspects of biomarker identification in genomics, from the classification and feature-selection strategies adopted to the type and reliability of the biological information used, proposing new approaches able to address the problem of the reproducibility of biomarker lists. The study showed that, although acceptable and comparable prediction accuracy can be obtained through different methods, further developments are needed to achieve robust stability in biomarker lists, owing to the high number of variables and the high level of correlation among them. In particular, this thesis proposes two different approaches to improve the stability of biomarker lists by using prior information on biological interactions and on the functional correlation among the analyzed features. Both approaches were able to improve biomarker selection. The first, which uses prior information to split the application of the method into several sub-problems, improves the interpretability of the results and offers an alternative way to verify the reproducibility of the lists.
The second, which integrates the prior information into a kernel function of the learning algorithm, improves the stability of the lists. Finally, the interpretability of the results is strongly influenced by the quality of the available biological information, and the analysis of annotation heterogeneity carried out on the Gene Ontology database reveals the importance of providing new methods able to verify the reliability of the biological properties assigned to a specific variable, distinguishing a lack or lower specificity of information from possible inconsistencies among annotations. These aspects will be investigated in ever greater depth in the future, as new sequencing technologies will monitor a larger number of variables and the number of functional annotations from genomic databases will grow considerably in the coming years.
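The second approach, integrating prior knowledge into a kernel function, can be sketched in a minimal form as a linear kernel whose features are reweighted by prior relevance scores (e.g. derived from known gene interactions). The expression profiles and the weights below are invented for illustration; this is not the thesis's actual kernel:

```python
# Minimal sketch of integrating prior biological knowledge into a kernel:
# a linear kernel in which each feature (gene) is reweighted by a prior
# relevance score. The toy profiles and weights are invented; they do not
# come from the thesis.

def weighted_linear_kernel(x, y, prior_weights):
    """K(x, y) = sum_i w_i * x_i * y_i, with w_i a prior relevance score."""
    return sum(w * a * b for w, a, b in zip(prior_weights, x, y))

# Two toy expression profiles over three genes; the second gene is
# hypothetically assumed to be the most biologically relevant.
x = [1.0, 2.0, 0.5]
y = [0.5, 1.0, 2.0]
weights = [1.0, 3.0, 0.5]

print(weighted_linear_kernel(x, y, weights))
# → 7.0
```

Plugged into a kernel-based learner such as an SVM, such a reweighting biases the decision function toward features with strong prior support, which is the mechanism by which list stability can improve.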
Černý, Ján. "Implementace procedur pro předzpracování dat v systému Rapid Miner." Master's thesis, Vysoká škola ekonomická v Praze, 2014. http://www.nusl.cz/ntk/nusl-193216.
Válek, Martin. "Analýza reálných dat produktové redakce Alza.cz pomocí metod DZD." Master's thesis, Vysoká škola ekonomická v Praze, 2014. http://www.nusl.cz/ntk/nusl-198448.
Razavi, Amir Reza. "Applications of Knowledge Discovery in Quality Registries - Predicting Recurrence of Breast Cancer and Analyzing Non-compliance with a Clinical Guideline." Doctoral thesis, Linköping : Univ, 2007. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-10142.
Mohd Saudi, Madihah. "A new model for worm detection and response : development and evaluation of a new model based on knowledge discovery and data mining techniques to detect and respond to worm infection by integrating incident response, security metrics and apoptosis." Thesis, University of Bradford, 2011. http://hdl.handle.net/10454/5410.
Kolafa, Ondřej. "Reálná úloha dobývání znalostí." Master's thesis, Vysoká škola ekonomická v Praze, 2012. http://www.nusl.cz/ntk/nusl-200136.
Aldas, Cem Nuri. "An Analysis Of Peculiarity Oriented Interestingness Measures On Medical Data." Master's thesis, METU, 2008. http://etd.lib.metu.edu.tr/upload/12609856/index.pdf.