
Dissertations / Theses on the topic 'Knowledge discovery model (KDM)'



Consult the top 34 dissertations / theses for your research on the topic 'Knowledge discovery model (KDM).'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse dissertations / theses on a wide variety of disciplines and organise your bibliography correctly.

1

Santibáñez, Daniel Gustavo San Martín. "Mineração de interesses no processo de modernização dirigida a arquitetura." Universidade Federal de São Carlos, 2013. https://repositorio.ufscar.br/handle/ufscar/549.

Full text
Abstract:
Software systems are considered legacy when they were developed many years ago with outdated technologies and their maintenance consumes a large amount of resources. One cause of these problems is the inadequate modularization of their crosscutting concerns. In this situation, an alternative is to modernize the system with a new language that provides better support for concern modularization. ADM (Architecture-Driven Modernization) is an OMG model-driven proposal for modernizing legacy systems. It consists of a set of metamodels whose main metamodel is KDM (Knowledge Discovery Metamodel), which can represent all the characteristics of a system. The modernization process begins with reverse engineering to represent the legacy system as a KDM model. Thereafter, refactorings can be applied to the model and the modernized code can be generated. However, the current proposals do not support crosscutting concern modularization. This is because the first step is to identify the elements that contribute to the implementation of a particular concern, and this step is not supplied by ADM. In this sense, this dissertation presents an approach for mining crosscutting concerns in KDM models, establishing the first step towards a concern-driven modernization. The approach combines two techniques, a concern library and a modified K-means clustering algorithm, and comprises four steps in which the input is a KDM model and the result is the same KDM model with annotated concerns plus some log files. In addition, we developed an Eclipse plugin called CCKDM to implement the approach. An evaluation was performed involving three software systems. The results show that, for systems using APIs to implement their concerns, the developed technique is an effective method for identifying them, achieving good values of precision and recall.
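The mining approach described above lends itself to a compact illustration. The sketch below is not the CCKDM plugin; it is a minimal Python stand-in that combines a concern library of API keywords with a simple string-similarity test (in place of the thesis's modified K-means clustering) to annotate code elements such as imports extracted from a KDM code model. All element names, keywords and thresholds are illustrative assumptions.

```python
# Minimal sketch of concern annotation over KDM-like code elements.
from difflib import SequenceMatcher

CONCERN_LIBRARY = {            # concern -> indicative API/package fragments
    "persistence": ["java.sql", "hibernate", "jdbc"],
    "logging": ["log4j", "java.util.logging", "slf4j"],
}

def similarity(a: str, b: str) -> float:
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def annotate(code_elements, threshold=0.6):
    """Return {element_name: concern} for elements matching the library."""
    annotations = {}
    for element in code_elements:                 # e.g. imports in a KDM CodeModel
        for concern, keywords in CONCERN_LIBRARY.items():
            if any(k in element.lower() or similarity(k, element) >= threshold
                   for k in keywords):
                annotations[element] = concern
                break
    return annotations

if __name__ == "__main__":
    imports = ["org.hibernate.Session", "org.apache.log4j.Logger", "com.acme.Order"]
    print(annotate(imports))   # only the first two imports receive a concern
```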
APA, Harvard, Vancouver, ISO, and other styles
2

Mohd, Saudi Madihah. "A new model for worm detection and response : development and evaluation of a new model based on knowledge discovery and data mining techniques to detect and respond to worm infection by integrating incident response, security metrics and apoptosis." Thesis, University of Bradford, 2011. http://hdl.handle.net/10454/5410.

Full text
Abstract:
Worms have been improved and a range of sophisticated techniques have been integrated, which make the detection and response processes much harder and longer than in the past. Therefore, in this thesis, a STAKCERT (Starter Kit for Computer Emergency Response Team) model is built to detect worm attacks in order to respond to worms more efficiently. The novelty and strengths of the STAKCERT model lie in the method implemented, which consists of the STAKCERT KDD processes and the development of the STAKCERT worm classification, the STAKCERT relational model and the STAKCERT worm apoptosis algorithm. The new concept introduced in this model, named apoptosis, is borrowed from the human immune system and mapped onto a security perspective. Furthermore, the encouraging results achieved by this research are validated by applying security metrics to assign the weight and severity values that trigger apoptosis. In order to optimise the performance results, standard operating procedures (SOP) for worm incident response involving static and dynamic analyses, knowledge discovery in databases (KDD) techniques for modelling the STAKCERT model, and data mining algorithms were used. The STAKCERT model has produced encouraging results and outperformed comparable existing work on worm detection. It produces an overall accuracy rate of 98.75%, with a 0.2% false positive rate and a 1.45% false negative rate. Worm response achieved an accuracy rate of 98.08%, which other researchers can later use as a baseline for comparison with their own work.
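As a rough illustration of how weight and severity values could trigger apoptosis, the sketch below computes a weighted severity score from a few security metrics and isolates the host once a threshold is crossed. The metric names, weights and threshold are assumptions for illustration, not the STAKCERT values.

```python
# Illustrative severity-weighted "apoptosis" trigger for worm response.
WEIGHTS = {"payload": 0.4, "propagation": 0.35, "activation": 0.25}

def severity_score(observations: dict) -> float:
    """Weighted sum of per-metric severities, each scored in [0, 1]."""
    return sum(WEIGHTS[m] * observations.get(m, 0.0) for m in WEIGHTS)

def respond(observations: dict, threshold: float = 0.7) -> str:
    score = severity_score(observations)
    # Apoptosis: the infected host "sacrifices itself" (is isolated) when severity is high.
    return "apoptosis: isolate host" if score >= threshold else "monitor"

print(respond({"payload": 0.9, "propagation": 0.8, "activation": 0.5}))
```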
APA, Harvard, Vancouver, ISO, and other styles
3

Sharma, Sumana. "An Integrated Knowledge Discovery and Data Mining Process Model." VCU Scholars Compass, 2008. http://scholarscompass.vcu.edu/etd/1615.

Full text
Abstract:
Enterprise decision making is continuously transforming in the wake of ever increasing amounts of data. Organizations are collecting massive amounts of data in their quest for knowledge nuggets in the form of novel, interesting, understandable patterns that underlie these data. The search for knowledge is a multi-step process comprising various phases, including development of domain (business) understanding, data understanding, data preparation, modeling, evaluation and, ultimately, the deployment of the discovered knowledge. These phases are represented in the form of Knowledge Discovery and Data Mining (KDDM) process models, which are meant to provide explicit support for the execution of the complex and iterative knowledge discovery process. A review of existing KDDM process models reveals that they have certain limitations (fragmented design, only a checklist-type description of tasks, lack of support for the execution of tasks, especially those of the business understanding phase, etc.) which are likely to affect the efficiency and effectiveness with which KDDM projects are currently carried out. This dissertation addresses the identified limitations of existing KDDM process models through an improved model (named the Integrated Knowledge Discovery and Data Mining Process Model, IKDDM) which presents an integrated view of the KDDM process and provides explicit support for the execution of each of the tasks outlined in the model. We also evaluate the effectiveness and efficiency offered by the IKDDM model against CRISP-DM, a leading KDDM process model, in aiding data mining users to execute various tasks of the KDDM process. Results of statistical tests indicate that the IKDDM model outperforms CRISP-DM in terms of efficiency and effectiveness; the IKDDM model also outperforms CRISP-DM in terms of the quality of the process model itself.
APA, Harvard, Vancouver, ISO, and other styles
4

Wu, Sheng-Tang. "Knowledge discovery using pattern taxonomy model in text mining." Thesis, Queensland University of Technology, 2007. https://eprints.qut.edu.au/16675/1/Sheng-Tang_Wu_Thesis.pdf.

Full text
Abstract:
In the last decade, many data mining techniques have been proposed for fulfilling various knowledge discovery tasks in order to achieve the goal of retrieving useful information for users. Various types of patterns can then be generated using these techniques, such as sequential patterns, frequent itemsets, and closed and maximum patterns. However, how to effectively exploit the discovered patterns is still an open research issue, especially in the domain of text mining. Most text mining methods adopt a keyword-based approach to construct text representations consisting of single words or single terms, whereas other methods have tried to use phrases instead of keywords, based on the hypothesis that a phrase carries more information than a single term. Nevertheless, these phrase-based methods did not yield significant improvements because patterns with high frequency (normally the shorter patterns) usually have a high value of exhaustivity but a low value of specificity, and thus the specific patterns suffer from the low frequency problem. This thesis presents research on developing an effective Pattern Taxonomy Model (PTM) to overcome this problem by deploying discovered patterns into a hypothesis space. PTM is a pattern-based method that adopts sequential pattern mining and uses closed patterns as features in the representation. A PTM-based information filtering system is implemented and evaluated by a series of experiments on the latest version of the Reuters dataset, RCV1. Pattern evolution schemes are also proposed in this thesis, which attempt to utilise information from negative training examples to update the discovered knowledge. The results show that PTM outperforms not only all up-to-date data mining-based methods, but also the traditional Rocchio and the state-of-the-art BM25 and Support Vector Machines (SVM) approaches.
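The core PTM idea, using closed patterns of terms as features, can be illustrated with a deliberately simplified sketch: contiguous word n-grams stand in for general sequential patterns, a pattern is kept only if no longer frequent pattern has the same support (a simplified closedness test), and documents are scored by the patterns they contain. This is an assumption-laden toy version, not the thesis's mining algorithm.

```python
# Simplified closed-pattern text features (n-grams stand in for sequential patterns).
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def contains(seq, pat):
    return any(seq[i:i + len(pat)] == pat for i in range(len(seq) - len(pat) + 1))

def mine_closed_patterns(docs, min_support=2, max_len=3):
    support = Counter()
    for doc in docs:
        seen = set()
        for n in range(1, max_len + 1):
            seen.update(ngrams(doc, n))
        support.update(seen)                      # document-level support counts
    frequent = {p: s for p, s in support.items() if s >= min_support}
    # keep p only if no longer frequent pattern containing p has the same support
    return {p for p, s in frequent.items()
            if not any(len(q) > len(p) and sq == s and contains(q, p)
                       for q, sq in frequent.items())}

train = [["data", "mining", "patterns"],
         ["data", "mining", "text"],
         ["text", "mining", "patterns"]]
patterns = mine_closed_patterns(train)

def score(doc):
    """Score a document by the number of closed patterns it contains."""
    return sum(1 for p in patterns if contains(tuple(doc), p))

print(sorted(patterns))
print(score(["data", "mining", "text", "patterns"]))
```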
APA, Harvard, Vancouver, ISO, and other styles
5

Wu, Sheng-Tang. "Knowledge discovery using pattern taxonomy model in text mining." Queensland University of Technology, 2007. http://eprints.qut.edu.au/16675/.

Full text
Abstract:
In the last decade, many data mining techniques have been proposed for fulfilling various knowledge discovery tasks in order to achieve the goal of retrieving useful information for users. Various types of patterns can then be generated using these techniques, such as sequential patterns, frequent itemsets, and closed and maximum patterns. However, how to effectively exploit the discovered patterns is still an open research issue, especially in the domain of text mining. Most text mining methods adopt a keyword-based approach to construct text representations consisting of single words or single terms, whereas other methods have tried to use phrases instead of keywords, based on the hypothesis that a phrase carries more information than a single term. Nevertheless, these phrase-based methods did not yield significant improvements because patterns with high frequency (normally the shorter patterns) usually have a high value of exhaustivity but a low value of specificity, and thus the specific patterns suffer from the low frequency problem. This thesis presents research on developing an effective Pattern Taxonomy Model (PTM) to overcome this problem by deploying discovered patterns into a hypothesis space. PTM is a pattern-based method that adopts sequential pattern mining and uses closed patterns as features in the representation. A PTM-based information filtering system is implemented and evaluated by a series of experiments on the latest version of the Reuters dataset, RCV1. Pattern evolution schemes are also proposed in this thesis, which attempt to utilise information from negative training examples to update the discovered knowledge. The results show that PTM outperforms not only all up-to-date data mining-based methods, but also the traditional Rocchio and the state-of-the-art BM25 and Support Vector Machines (SVM) approaches.
APA, Harvard, Vancouver, ISO, and other styles
6

Zhuang, Chenyi. "Location Knowledge Discovery from User Activities." Kyoto University, 2017. http://hdl.handle.net/2433/227660.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Bani, Mustafa Ahmed Mahmood. "A knowledge discovery and data mining process model for metabolomics." Thesis, Aberystwyth University, 2012. http://hdl.handle.net/2160/6889468e-851f-47fd-bd44-fe65fe516c7a.

Full text
Abstract:
This thesis presents a novel knowledge discovery and data mining process model for metabolomics, which was successfully developed, implemented and applied to a number of metabolomics applications. The process model provides a formalised framework and a methodology for conducting justifiable and traceable data mining in metabolomics. It promotes the achievement of metabolomics analytical objectives and contributes towards the reproducibility of its results. The process model was designed to satisfy the requirements of data mining in metabolomics and to be consistent with the scientific nature of metabolomics investigations. It considers the practical aspects of the data mining process, covering management, human interaction, quality assurance and standards, in addition to other desired features such as visualisation, data exploration, knowledge presentation and automation. The development of the process model involved investigating data mining concepts, approaches and techniques; in addition to the popular data mining process models, which were critically analysed in order to utilise their better features and to overcome their shortcomings. Inspiration from process engineering, software engineering, machine learning and scientific methodology was also used in developing the process model along with the existing ontologies of scientific experiments and data mining. The process model was designed to support both data-driven and hypothesis-driven data mining. It provides a mechanism for defining the analytical objectives of metabolomics data mining, considering their achievability, feasibility, measurability and success criteria. The process model also provides a novel strategy for performing justifiable selection of data mining techniques, taking into consideration the achievement of the process's analytical objectives and taking into account the nature and quality of the metabolomics data, in addition to the requirements and feasibility of the selected data mining techniques. The model ensures validity and reproducibility of the outcomes by defining traceability and assessment mechanisms, which cover all the procedures applied and the deliveries generated throughout the process. The process also defines evaluation mechanisms, which cover not only the technical aspects of the data mining model, but also the contextual aspects of the acquired knowledge. The process model was implemented using a software environment, and was applied to four real-world metabolomics applications. The applications demonstrated the proposed process model's applicability to various data mining approaches, goals, tasks, and techniques. They also confirmed the process's applicability to various metabolomics investigations and approaches using data generated by a variety of data acquisition instruments. The outcomes of the process execution in these applications were used in evaluating the process model's design and its satisfaction of the requirements of metabolomics data mining.
APA, Harvard, Vancouver, ISO, and other styles
8

Santos, Bruno Marinho. "Extensões do metamodelo KDM para apoiar modernizações orientadas a aspectos de sistemas legados." Universidade Federal de São Carlos, 2014. https://repositorio.ufscar.br/handle/ufscar/593.

Full text
Abstract:
Maintaining legacy systems is a complex and expensive activity for many companies. A recent proposal to address this problem is Architecture-Driven Modernization (ADM), proposed by the Object Management Group (OMG). ADM consists of a set of principles and standard metamodels that support the modernization of systems using models. The Knowledge Discovery Metamodel (KDM) is the main metamodel of ADM; it can represent many artifacts of a legacy system, such as source code, architecture, user interface, configuration files and business processes. In general, legacy systems have crosscutting concerns, leading to source code tangling and scattering, which raises maintenance costs. Aspect orientation is an alternative for improving crosscutting concern modularization. Thus, this dissertation presents the term Aspect-Oriented Modernization, which uses aspect-oriented concepts in the ADM context. This modernization process consists in modularizing legacy systems with aspects represented at the model level. To achieve this goal, both a lightweight and a heavyweight extension of the KDM metamodel were performed in this work, in order to analyze which one would give modernization engineers better performance. The evaluation of these extensions was performed through a case study that considered the modernization with aspects of a small-sized system. To evaluate the case study with both extensions, a set of comparison criteria was created to support software engineers in choosing the best extension mechanism according to their needs. In the context of this dissertation, an experimental study was also developed that aimed to reproduce scenarios in which modernization engineers had to perform maintenance tasks and develop new refactorings in an aspect-oriented KDM model. The experiment data covered the development time of the activities and the number of errors found. Finally, it was observed that the extension mechanism to be chosen depends on the context in which it will be applied; however, for the approach studied here, the heavyweight extension best met the requirements.
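The difference between the two extension styles compared in this dissertation can be sketched outside the actual KDM/Ecore tooling: a lightweight extension tags existing elements with stereotypes, while a heavyweight extension introduces new metaclasses by subclassing. The Python classes below are illustrative stand-ins, not the AO-KDM metamodel itself.

```python
# Rough illustration of lightweight vs heavyweight metamodel extension.

class KDMClassUnit:                         # stands in for a KDM code element
    def __init__(self, name):
        self.name = name
        self.stereotypes = []               # lightweight extension point

# Lightweight: annotate an ordinary element with an <<aspect>> stereotype.
logger = KDMClassUnit("LoggingAspect")
logger.stereotypes.append("aspect")

# Heavyweight: introduce a dedicated metaclass for aspects.
class AspectUnit(KDMClassUnit):
    def __init__(self, name, pointcuts=None):
        super().__init__(name)
        self.pointcuts = pointcuts or []    # first-class aspect-specific data

tracing = AspectUnit("TracingAspect", pointcuts=["execution(* Service.*(..))"])
print(logger.stereotypes, tracing.pointcuts)
```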
APA, Harvard, Vancouver, ISO, and other styles
9

Liang, Wen. "Integrated feature, neighbourhood, and model optimization for personalised modelling and knowledge discovery." Click here to access this resource online, 2009. http://hdl.handle.net/10292/749.

Full text
Abstract:
“Machine learning is the process of discovering and interpreting meaningful information, such as new correlations, patterns and trends by sifting through large amounts of data stored in repositories, using pattern recognition technologies as well as statistical and mathematical techniques” (Larose, 2005). From my understanding, machine learning is a process of using different analysis techniques to observe previously unknown, potentially meaningful information, and to discover strong patterns and relationships in a large dataset. Professor Kasabov (2007b) classified computational models into three categories (global, local, and personalised), which have been widely used in the areas of data analysis and decision support in general, and in the areas of medicine and bioinformatics in particular. Most recently, the concept of personalised modelling has been widely applied to various disciplines such as personalised medicine and personalised drug design for known diseases (e.g. cancer, diabetes, brain disease), as well as to other modelling problems in ecology, business, finance, crime prevention, and so on. The philosophy behind the personalised modelling approach is that every person is different from others, and thus he or she will benefit from having a personalised model and treatment. However, personalised modelling is not without issues, such as defining the correct number of neighbours or an appropriate number of features. As a result, the principal goal of this research is to study and address these issues and to create a novel framework and system for personalised modelling. The framework allows users to select and optimise the most important features and nearest neighbours for a new input sample in relation to a certain problem, based on a weighted variable distance measure, in order to obtain more precise prognostic accuracy and personalised knowledge compared with global and local modelling approaches.
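A minimal sketch of the personalised-modelling idea follows: for each new sample, the nearest neighbours are chosen under a weighted distance (the weights standing in for feature importance) and a local prediction is built from them. The feature weights, k and data are illustrative assumptions, not values from the thesis.

```python
# Personalised prediction via weighted-distance nearest neighbours.
import math

def weighted_distance(x, y, w):
    return math.sqrt(sum(wi * (xi - yi) ** 2 for xi, yi, wi in zip(x, y, w)))

def personalised_predict(x_new, X, y, weights, k=3):
    ranked = sorted(range(len(X)), key=lambda i: weighted_distance(x_new, X[i], weights))
    neighbours = ranked[:k]                      # the personalised problem subspace
    return sum(y[i] for i in neighbours) / k     # local average as the prediction

X = [[1.0, 5.0], [1.2, 4.8], [8.0, 1.0], [7.5, 0.8]]
y = [0, 0, 1, 1]
print(personalised_predict([1.1, 5.1], X, y, weights=[1.0, 0.5]))
```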
APA, Harvard, Vancouver, ISO, and other styles
10

Jones, David. "Improving engineering information access and knowledge discovery through model-based information navigation." Thesis, University of Bristol, 2019. http://hdl.handle.net/1983/2d1c1535-e582-41fd-a6f6-cc1178c21d2a.

Full text
Abstract:
An organisation's data, information, and knowledge is widely considered to be one of its greatest assets. As such, the capture, storage and dissemination of this asset is the focus of both academic and organisational efforts. This is true at the Airbus Group, the industrial partner of this thesis. Their Knowledge Management team invests in state-of-the-art tools and techniques, and actively participates in research in a bid to maximise their organisation's reuse of knowledge and ultimately their competitiveness. A successful knowledge management strategy creates a knowledgeable and wise workforce that ultimately benefits both the individual and the organisation. The dissemination of information and knowledge such that it is easily and readily accessible is one key aspect within such a strategy. Search engines are a typical means for information and knowledge dissemination yet, unlike the Internet, search within organisations (intranet or enterprise search) is frequently found lacking. This thesis contributes to this area of knowledge management. Research in the field of enterprise search has shown that search can be improved by applying context to expand search queries. The novel approach taken in this thesis applies this context visually, moving the search for information away from a text-based user interface towards a user interface that reflects the function and form of the product. The approach, model-based information navigation, is based on the premise that leveraging the visual and functional nature of engineers through a model-based user interface can improve information access and knowledge discovery. From the perspectives of information visualisation, engineering information management, product life-cycle management, and building information modelling, this thesis contributes through: the development of techniques that enable documents to be indexed against the product structure; the development of techniques for navigation within engineering three-dimensional virtual environments; the design of a range of visual information objects for the display of information within engineering three-dimensional virtual environments; and the determination of the affordance of a model-based approach to information navigation. This thesis presents the development of a framework for model-based information navigation: a novel approach to finding information that places a three-dimensional representation of the product at the heart of searching document collections.
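One of the listed contributions, indexing documents against the product structure, can be sketched with a small data structure: selecting a node of the product (as one would by clicking it in a 3D model) returns the documents attached to that node and to everything beneath it. The structure and document names below are invented for illustration.

```python
# Documents indexed against a product structure for model-based navigation.
product_structure = {
    "aircraft": ["wing", "fuselage"],
    "wing": ["flap", "slat"],
}
doc_index = {                     # product node -> associated documents
    "wing": ["wing_stress_report.pdf"],
    "flap": ["flap_test_plan.docx", "flap_ncr_0042.pdf"],
}

def documents_under(node):
    """Documents attached to a node and to everything below it in the structure."""
    docs = list(doc_index.get(node, []))
    for child in product_structure.get(node, []):
        docs.extend(documents_under(child))
    return docs

print(documents_under("wing"))    # selecting the wing surfaces three documents
```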
APA, Harvard, Vancouver, ISO, and other styles
11

Li, Yan. "NEW ARTIFACTS FOR THE KNOWLEDGE DISCOVERY VIA DATA ANALYTICS (KDDA) PROCESS." VCU Scholars Compass, 2014. http://scholarscompass.vcu.edu/etd/3609.

Full text
Abstract:
Recently, interest in the business application of analytics and data science has increased significantly. The popularity of data analytics and data science comes from the clear articulation of business problem solving as an end goal. To address limitations in the existing literature, this dissertation provides four novel design artifacts for Knowledge Discovery via Data Analytics (KDDA). The first artifact is a Snail Shell KDDA process model that extends existing knowledge discovery process models but addresses many of their limitations. At the top level, the KDDA process model highlights the iterative nature of KDDA projects and adds two new phases, namely Problem Formulation and Maintenance. At the second level, generic tasks of the KDDA process model are presented in a comparative manner, highlighting the differences between the new KDDA process model and traditional knowledge discovery process models. Two case studies are used to demonstrate how to use the KDDA process model to guide real-world KDDA projects. The second artifact, a methodology for theory building based on quantitative data, is a novel application of the KDDA process model. The methodology is evaluated using a theory-building case from the public health domain. It is not only an instantiation of the Snail Shell KDDA process model, but also makes theoretical contributions to theory building. It demonstrates how analytical techniques can be used as quantitative gauges to assess important construct relationships during the formative phase of theory building. The third artifact is a data mining ontology, the DM3 ontology, which bridges the semantic gap between business users and KDDA experts and facilitates analytical model maintenance and reuse. The DM3 ontology is evaluated using both a criteria-based approach and a task-based approach. The fourth artifact is a decision support framework for MCDA software selection. The framework enables users to choose relevant MCDA software based on a specific decision-making situation (DMS). A DMS modeling framework is developed to structure the DMS based on the decision problem and the users' decision preferences. The framework is implemented in a decision support system and evaluated using application examples from the real-estate domain.
APA, Harvard, Vancouver, ISO, and other styles
12

Tuovinen, L. (Lauri). "From machine learning to learning with machines:remodeling the knowledge discovery process." Doctoral thesis, Oulun yliopisto, 2014. http://urn.fi/urn:isbn:9789526205243.

Full text
Abstract:
Knowledge discovery (KD) technology is used to extract knowledge from large quantities of digital data in an automated fashion. The established process model represents the KD process in a linear and technology-centered manner, as a sequence of transformations that refine raw data into more and more abstract and distilled representations. Any actual KD process, however, has aspects that are not adequately covered by this model. In particular, some of the most important actors in the process are not technological but human, and the operations associated with these actors are interactive rather than sequential in nature. This thesis proposes an augmentation of the established model that addresses this neglected dimension of the KD process. The proposed process model is composed of three sub-models: a data model, a workflow model, and an architectural model. Each sub-model views the KD process from a different angle: the data model examines the process from the perspective of different states of data and the transformations that convert data from one state to another, the workflow model describes the actors of the process and the interactions between them, and the architectural model guides the design of software for the execution of the process. For each of the sub-models, the thesis first defines a set of requirements, then presents the solution designed to satisfy the requirements, and finally re-examines the requirements to show how they are accounted for by the solution. The principal contribution of the thesis is a broader perspective on the KD process than the current mainstream view. The augmented KD process model proposed by the thesis makes use of the established model, but expands it by gathering data management and knowledge representation, the KD workflow and software architecture under a single unified model. Furthermore, the proposed model considers issues that are usually either overlooked or treated as separate from the KD process, such as the philosophical aspects of KD. The thesis also discusses a number of technical solutions to individual sub-problems of the KD process, including two software frameworks and four case-study applications that serve as concrete implementations and illustrations of several key features of the proposed process model.
APA, Harvard, Vancouver, ISO, and other styles
13

Al, Harbi H. Y. M. "Semantically aware hierarchical Bayesian network model for knowledge discovery in data : an ontology-based framework." Thesis, University of Salford, 2017. http://usir.salford.ac.uk/43293/.

Full text
Abstract:
Several mining algorithms have been invented over the course of recent decades. However, many of them are confined to generating frequent patterns and do not illustrate how to act upon them. Hence, many researchers have argued that existing mining algorithms have limitations with respect to performance and workability. Quantity and quality are the main limitations of the existing mining algorithms: while quantity means that the generated patterns are abundant, quality indicates that they cannot be integrated into the business domain seamlessly. Consequently, recent research has suggested that the limitations of existing mining algorithms result from treating the mining process as an isolated and autonomous data-driven trial-and-error process and ignoring domain knowledge. Accordingly, the integration of domain knowledge into the mining process has become the goal of recent data mining algorithms. Domain knowledge can be represented using various techniques. However, recent research has stated that ontology is the natural way to represent knowledge for data mining use. The structural nature of ontology makes it a very strong candidate for integrating domain knowledge with data mining algorithms. It has been claimed that ontology can play the following roles in the data mining process:
• Bridging the semantic gap.
• Providing prior knowledge and constraints.
• Formally representing the data mining results.
Despite the fact that a variety of research has used ontology to enrich different tasks in the data mining process, recent research has revealed that a framework that systematically consolidates ontology and mining algorithms in an intelligent mining environment has not yet been realised. Hence, this thesis proposes an automatic, systematic and flexible framework that integrates the Hierarchical Bayesian Network (HBN) and domain ontology. The ultimate aim of this thesis is to propose a data mining framework that implicitly caters for the underpinning domain knowledge and eventually leads to a more intelligent and accurate mining process. To a certain extent, the proposed mining model simulates the human cognitive system. The similarity between ontology, the Bayesian Network (BN) and bioinformatics applications establishes a strong connection between these research disciplines. This similarity can be summarised in the following points:
• Both ontology and BN have a graph-based structure.
• Biomedical applications are known for their uncertainty; likewise, BN is a powerful tool for reasoning under uncertainty.
• The medical data involved in biomedical applications is comprehensive, and ontology is the right model for representing comprehensive data.
Hence, the proposed ontology-based Semantically Aware Hierarchical Bayesian Network (SAHBN) is applied to eight biomedical data sets in the field of predicting the effect of DNA repair genes on the human ageing process and the identification of hub proteins. The performance of SAHBN was compared with existing Bayesian-based classification algorithms and, overall, SAHBN demonstrated a very competitive performance. The contributions of this thesis can be summarised in the following points:
• An automatic, systematic and flexible framework to integrate ontology and the HBN is proposed. Based on the literature review, and to the best of our knowledge, no such framework has been proposed previously.
• The complexity of learning the HBN structure from observed data is significant; hence, the proposed SAHBN model utilises domain knowledge in the form of ontology to overcome this challenge.
• The proposed SAHBN model preserves the advantages of both ontology and Bayesian theory. It integrates the concept of Bayesian uncertainty with the deterministic nature of ontology without extending the ontology structure or adding probability-specific properties that violate the standard ontology structure.
• The proposed SAHBN model utilises domain knowledge in the form of ontology to define the semantic relationships between the attributes involved in the mining process, guide the HBN structure construction procedure, check the consistency of the training data set and facilitate the calculation of the associated conditional probability tables (CPTs).
• The proposed SAHBN model lays a solid foundation for integrating other semantic relations such as equivalence, disjointness, intersection and union.
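A highly simplified sketch of the SAHBN intuition is shown below: the ontology's class hierarchy fixes the Bayesian network structure (parent class to subclass edges) and conditional probability tables are then estimated from data by counting. The ontology fragment, variables and records are toy assumptions, not the thesis's biomedical data sets.

```python
# Ontology-guided Bayesian network structure plus counted CPTs (toy example).
from collections import defaultdict

ontology = {"DNARepairGene": ["BaseExcisionRepair", "MismatchRepair"]}
edges = [(parent, child) for parent, children in ontology.items() for child in children]

def learn_cpt(data, child, parent):
    """P(child | parent) estimated by relative frequency counts."""
    counts = defaultdict(lambda: defaultdict(int))
    for row in data:
        counts[row[parent]][row[child]] += 1
    return {p: {c: n / sum(cs.values()) for c, n in cs.items()}
            for p, cs in counts.items()}

data = [{"DNARepairGene": 1, "BaseExcisionRepair": 1},
        {"DNARepairGene": 1, "BaseExcisionRepair": 0},
        {"DNARepairGene": 0, "BaseExcisionRepair": 0}]
print(edges)
print(learn_cpt(data, "BaseExcisionRepair", "DNARepairGene"))
```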
APA, Harvard, Vancouver, ISO, and other styles
14

Hwang, Yuan-Chun. "Local and personalised models for prediction, classification and knowledge discovery on real world data modelling problems." Click here to access this resource online, 2009. http://hdl.handle.net/10292/776.

Full text
Abstract:
This thesis presents several novel methods to address some real-world data modelling issues through the use of local and individualised modelling approaches. A set of real-world data modelling issues, such as modelling evolving processes, defining unique problem subspaces, and identifying and dealing with noise, outliers, missing values, imbalanced data and irrelevant features, is reviewed and their impact on models is analysed. The thesis has made nine major contributions to information science, including four generic modelling methods, three real-world application systems that apply these methods, a comprehensive review of real-world data modelling problems, and a data analysis and modelling software package. Four novel methods have been developed and published in the course of this study: (1) DyNFIS, a Dynamic Neuro-Fuzzy Inference System; (2) MUFIS, a fuzzy inference system that uses multiple types of fuzzy rules; (3) an Integrated Temporal and Spatial Multi-Model System; and (4) a Personalised Regression Model. DyNFIS addresses the issue of unique problem subspaces by identifying them through a clustering process, creating a fuzzy inference system based on the clusters and applying supervised learning to update the fuzzy rules, both the antecedent and the consequent parts. This puts strong emphasis on the unique problem subspaces and allows easy-to-understand rules to be extracted from the model, which adds knowledge to the problem. MUFIS takes DyNFIS a step further by integrating a mixture of different types of fuzzy rules in a single fuzzy inference system. In many real-world problems, some problem subspaces were found to be more suitable for one type of fuzzy rule than others and, therefore, by integrating multiple types of fuzzy rules together, a better prediction can be made. The type of fuzzy rule assigned to each unique problem subspace also provides additional understanding of its characteristics. The Integrated Temporal and Spatial Multi-Model System takes a different approach, integrating two contrasting views of the problem for better results: the temporal model uses recent data and the spatial model uses historical data to make the prediction. By combining the two through a dynamic contribution adjustment function, the system is able to provide stable yet accurate predictions on real-world data modelling problems that have intermittently changing patterns. The personalised regression model is designed for classification problems. Real-world data modelling problems often involve noisy or irrelevant variables, and the number of input vectors in each class may be highly imbalanced; these issues make the definition of unique problem subspaces less accurate. The proposed method uses a model selection system based on an incremental feature selection method to select the best set of features. A global model is then created based on this set of features and optimised using training input vectors in the test input vector's vicinity. This approach focuses on the definition of the problem space and puts emphasis on the problem subspace in which the test input vector resides. The novel generic prediction methods listed above have been applied to the following three real-world data modelling problems: (1) renal function evaluation, which achieved higher accuracy than all other existing methods while allowing easy-to-understand rules to be extracted from the model for future studies; (2) a milk volume prediction system for Fonterra, which achieved a 20% improvement over the method currently used by Fonterra; and (3) a prognosis system for pregnancy outcome prediction (SCOPE), which achieved a more stable and slightly better accuracy than traditional statistical methods. These solutions constitute a contribution to the area of applied information science. In addition to the above contributions, a data analysis software package, NeuCom, was primarily developed by the author prior to and during the PhD study to facilitate some of the standard experiments and analyses on various case studies. This is a full-featured data analysis and modelling software package that is freely available for non-commercial purposes (see Appendix A for more details). In summary, many real-world problems consist of many smaller problems. It was found beneficial to acknowledge the existence of these sub-problems and address them through the use of local or personalised models. The rules extracted from the local models also make new knowledge available to researchers and allow more in-depth study of the sub-problems in future research.
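The Integrated Temporal and Spatial Multi-Model System's dynamic contribution adjustment can be sketched as blending a temporal and a spatial prediction with a weight that shifts towards whichever model has recently been more accurate. The adjustment rule, rate and data below are assumptions, not the thesis's exact function.

```python
# Blending temporal and spatial model predictions with a dynamic contribution weight.
def blend(temporal_pred, spatial_pred, alpha):
    return alpha * temporal_pred + (1 - alpha) * spatial_pred

def update_alpha(alpha, temporal_err, spatial_err, rate=0.1):
    # shift weight towards whichever model has been more accurate lately
    if temporal_err < spatial_err:
        alpha = min(1.0, alpha + rate)
    elif spatial_err < temporal_err:
        alpha = max(0.0, alpha - rate)
    return alpha

alpha = 0.5
for t_pred, s_pred, actual in [(10.2, 9.5, 10.0), (11.0, 9.8, 10.9), (12.1, 10.0, 12.0)]:
    print(round(blend(t_pred, s_pred, alpha), 2))
    alpha = update_alpha(alpha, abs(t_pred - actual), abs(s_pred - actual))
```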
APA, Harvard, Vancouver, ISO, and other styles
15

Seyedarabi, Faezeh. "Developing a model of teachers' web-based information searching : a study of search options and features to support personalised educational resource discovery." Thesis, University College London (University of London), 2013. http://discovery.ucl.ac.uk/10018062/.

Full text
Abstract:
This study has investigated the search options and features teachers use, and prefer to have, when personalising their online search for teaching resources. The study focused on making web searching easier for UK teacher practitioners at primary, secondary and post-compulsory levels. A triangulated mixed-method approach was carried out in a two-phase iterative case study involving 75 teacher practitioners working in the UK educational setting. In this case study, a sequential evidence-gathering method called the 'System Development Life Cycle' (SDLC) was adapted, linking findings obtained from structured questionnaires, observations and semi-structured interviews in order to design, develop and test two versions of an experimental search tool called "PoSTech!". This research has contributed to knowledge by offering a model of teachers' web information needs and search behaviour. The model lists, in order of popularity, the twelve search options and features most used by teachers when personalising their search for online teaching resources via the revised search tool. A search option is something the teacher selects, and a feature is a characteristic of an option that the teacher experiences: for example, the search options 'Subject', 'Age Group', 'Resource Type', 'Free and/or Paid resources' and 'Search results language', and search features that store the search options selected by individual teachers and their returned results. The model of teachers' web information needs and search behaviour could be used by the Government, teacher trainers and search engine designers to gain insight into teachers' information needs and search behaviours when searching for online teaching resources, and to tackle the technical barriers teachers face when using the internet. In conclusion, the research work presented in this thesis has provided the initial and important steps towards understanding the web-searching information needs and search behaviours of individual teachers working in the UK educational setting.
APA, Harvard, Vancouver, ISO, and other styles
16

Honda, Raphael Rodrigues. "Modelagem e cômputo de métricas de interesse no contexto de modernização de sistemas legados." Universidade Federal de São Carlos, 2014. https://repositorio.ufscar.br/handle/ufscar/587.

Full text
Abstract:
Maintaining legacy systems is a complex and expensive activity for many companies. An alternative to this problem is Architecture-Driven Modernization (ADM), proposed by the OMG (Object Management Group). ADM is a set of principles that support the modernization of systems using models. The Knowledge Discovery Metamodel (KDM) is the main ADM metamodel and is able to represent various characteristics of a system, such as source code, configuration files and GUI. Through a reverse engineering process supported by tools, it is possible to extract knowledge from legacy source code and store it in KDM metamodel instances. Another metamodel important to this project is the Structured Metrics Metamodel (SMM), which allows the specification of metrics and also the representation of the results of measurements performed on KDM models. When deciding to modernize a legacy system, an alternative that aims to improve concern modularization is aspect-oriented programming. Considering this alternative, the main goal of this project is to present an approach for defining and computing concern metrics in instances of the KDM metamodel. This kind of measurement requires prior concern mining, which annotates system components with the concerns they implement. To achieve the project objective, a complete approach for measuring concerns using ADM models was developed, composed of an extension of the KDM metamodel for representing aspect-oriented software (AO-KDM) and a concern metrics library in SMM format (CCML) designed to be parameterized by the modernization engineer. Therefore, the metrics defined in this project can be reused in other projects. Furthermore, we developed a tool (CMEE) capable of handling parameterization annotations (concern annotations made by mining tools), which allows models annotated by different mining tools to be measured with SMM metrics.
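A concern metric of the kind the CCML library would express in SMM can be sketched directly over an annotated model, here a simple concern-diffusion count of the components that contain at least one element annotated with a given concern. The annotation format and model are illustrative assumptions rather than actual KDM/SMM instances.

```python
# Concern Diffusion over Components computed from concern annotations.
model = {                         # component -> {element: concern or None}
    "OrderService": {"save": "persistence", "log": "logging", "total": None},
    "ReportService": {"export": None, "audit": "logging"},
}

def concern_diffusion(model, concern):
    """Number of components containing at least one element annotated with the concern."""
    return sum(1 for elements in model.values()
               if any(c == concern for c in elements.values()))

for concern in ("persistence", "logging"):
    print(concern, concern_diffusion(model, concern))
```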
APA, Harvard, Vancouver, ISO, and other styles
17

Hopkins, Mark E. "A Study of Physicians' Serendipitous Knowledge Discovery: An Evaluation of Spark and the IF-SKD Model in a Clinical Setting." Thesis, University of North Texas, 2018. https://digital.library.unt.edu/ark:/67531/metadc1157586/.

Full text
Abstract:
This research study is conducted to test Workman, Fiszman, Rindflesch and Nahl's information flow-serendipitous knowledge discovery (IF-SKD) model of information behavior in a clinical care context. To date, there have been few attempts to model the serendipitous knowledge discovery of physicians. Due to the growth and complexity of the biomedical literature, as well as the increasingly specialized nature of medicine, there is a need for advanced systems that can quickly present information and assist physicians in discovering new knowledge. The National Library of Medicine's (NLM) Lister Hill Center for Biocommunication's Semantic MEDLINE project is focused on identifying and visualizing semantic relationships in the biomedical literature to support knowledge discovery. This project led to the development of a new information discovery system, Spark. The aim of Spark is to promote serendipitous knowledge discovery by assisting users in maximizing the use of their conceptual short-term memory to iteratively search for, engage, clarify and evaluate information presented from the biomedical literature. Using Spark, this study analyzes the IF-SKD model by capturing and analyzing physician feedback. McCay-Peet, Toms and Kelloway's Perception of Serendipity and Serendipitous Digital Environment (SDE) questionnaires are used. Results are evaluated to determine whether Spark contributes to physicians' serendipitous knowledge discovery and whether the IF-SKD model is able to capture physicians' information behavior in a clinical setting.
APA, Harvard, Vancouver, ISO, and other styles
18

Cleverley, Paul Hugh. "Re-examining and re-conceptualising enterprise search and discovery capability : towards a model for the factors and generative mechanisms for search task outcomes." Thesis, Robert Gordon University, 2017. http://hdl.handle.net/10059/2403.

Full text
Abstract:
Many organizations are trying to re-create the 'Google experience' to find and exploit their own corporate information. However, there is evidence that finding information in the workplace using search engine technology has remained difficult, with socio-technical elements largely neglected in the literature. Explication of the factors and generative mechanisms (ultimate causes) behind effective search task outcomes (user satisfaction, search task performance and serendipitous encountering) may provide a first step towards making improvements. A transdisciplinary (holistic) lens was applied to Enterprise Search and Discovery capability, combining critical realism and activity theory with complexity theories, in one of the world's largest corporations. Data collection included an in-situ exploratory search experiment with 26 participants, focus groups with 53 participants and interviews with 87 business professionals. Thousands of user feedback comments and search transactions were analysed. Transferability of findings was assessed through interviews with eight industry informants and ten organizations from a range of industries. A wide range of informational needs were identified for search filters, including a need to be intrigued. Search term word co-occurrence algorithms facilitated serendipity to a greater extent than existing methods deployed in the organization surveyed. No association was found between user satisfaction (or self-assessed search expertise) and search task performance; overall performance was poor, although most participants were satisfied with their performance. Eighteen factors were identified that influence search task outcomes, ranging from user and task factors, informational and technological artefacts, through to a wide range of organizational norms. Modality Theory (Cybersearch culture, Simplicity and Loss Aversion bias) was developed to explain the study observations. It proposes that, at all organizational levels, there are tendencies towards reductionist (unimodal) mind-sets about search capability, leading to 'fixes that fail'. The factors and mechanisms were identified in other industry organizations, suggesting some generalizability of the theory. This is the first socio-technical analysis of Enterprise Search and Discovery capability. The findings challenge existing orthodoxy, such as the criticality of search literacy (agency), which has been neglected in the practitioner literature in favour of structure. The resulting multifactorial causal model and strategic framework for improvement present opportunities to update existing academic models in the IR, LIS and IS literature, such as the DeLone and McLean model of information system success. There are encouraging signs that Modality Theory may enable a reconfiguration of organizational mind-sets that could transform search task outcomes and ultimately business performance.
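The search-term word co-occurrence idea that the study found helpful for serendipitous encountering can be sketched as follows: count how often terms co-occur within documents and suggest, for a query term, the terms that most often appear alongside it. The corpus and scoring are illustrative assumptions.

```python
# Term co-occurrence counts used to suggest related search terms.
from collections import Counter
from itertools import combinations

docs = [["subsea", "valve", "corrosion"], ["valve", "corrosion", "inspection"],
        ["subsea", "inspection", "report"]]

cooc = Counter()
for doc in docs:
    for a, b in combinations(sorted(set(doc)), 2):
        cooc[(a, b)] += 1

def suggest(term, top_n=3):
    """Terms that most frequently co-occur with the query term."""
    scores = Counter()
    for (a, b), n in cooc.items():
        if term == a:
            scores[b] += n
        elif term == b:
            scores[a] += n
    return [t for t, _ in scores.most_common(top_n)]

print(suggest("valve"))   # candidate expansions a searcher may not have tried
```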
APA, Harvard, Vancouver, ISO, and other styles
19

Kučera, Petr. "Meta-učení v oblasti dolování dat." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2013. http://www.nusl.cz/ntk/nusl-236213.

Full text
Abstract:
This thesis describes the use of meta-learning in the area of data mining. It describes the problems and tasks of data mining where meta-learning can be applied, with a focus on classification. It provides an overview of meta-learning techniques and their possible application in data mining, especially for model selection. It describes the design and implementation of a meta-learning system to support classification tasks in data mining. The system uses statistics and information theory to characterize data sets stored in the meta-knowledge base. A meta-classifier is created from this base and predicts the most suitable model for a new data set. The conclusion discusses the results of experiments with more than 20 data sets representing classification tasks from different areas and suggests possible extensions of the project.
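A minimal sketch of meta-learning for model selection: each data set is described by a few meta-features (size, dimensionality, class entropy), the meta-knowledge base records which algorithm performed best on it, and a recommendation for a new data set is made by nearest neighbour in meta-feature space. The meta-features and the meta-knowledge base below are invented for illustration, not taken from the thesis.

```python
# Meta-feature extraction and nearest-neighbour model recommendation.
import math

def meta_features(n_rows, n_features, class_counts):
    total = sum(class_counts)
    entropy = -sum(c / total * math.log2(c / total) for c in class_counts if c)
    return (math.log10(n_rows), n_features, entropy)

meta_base = [  # (meta-features, best algorithm observed in past experiments)
    (meta_features(150, 4, [50, 50, 50]), "decision_tree"),
    (meta_features(60000, 784, [6000] * 10), "neural_network"),
    (meta_features(1000, 20, [900, 100]), "random_forest"),
]

def recommend(new_mf):
    dist = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b))
    return min(meta_base, key=lambda item: dist(item[0], new_mf))[1]

print(recommend(meta_features(2000, 30, [1700, 300])))
```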
APA, Harvard, Vancouver, ISO, and other styles
20

KAVOOSIFAR, MOHAMMAD REZA. "Data Mining and Indexing Big Multimedia Data." Doctoral thesis, Politecnico di Torino, 2019. http://hdl.handle.net/11583/2742526.

Full text
APA, Harvard, Vancouver, ISO, and other styles
21

Munch, Mélanie. "Améliorer le raisonnement dans l'incertain en combinant les modèles relationnels probabilistes et la connaissance experte." Thesis, université Paris-Saclay, 2020. http://www.theses.fr/2020UPASB011.

Full text
Abstract:
Cette thèse se concentre sur l'intégration des connaissances d'experts pour améliorer le raisonnement dans l'incertitude. Notre objectif est de guider l'apprentissage des relations probabilistes avec les connaissances d'experts pour des domaines décrits par les ontologies.Pour ce faire, nous proposons de coupler des bases de connaissances (BC) et une extension orientée objet des réseaux bayésiens, les modèles relationnels probabilistes (PRM). Notre objectif est de compléter l'apprentissage statistique par des connaissances expertes afin d'apprendre un modèle aussi proche que possible de la réalité et de l'analyser quantitativement (avec des relations probabilistes) et qualitativement (avec la découverte causale). Nous avons développé trois algorithmes à travers trois approches distinctes, dont les principales différences résident dans leur automatisation et l'intégration (ou non) de la supervision d'experts humains.L'originalité de notre travail est la combinaison de deux philosophies opposées : alors que l'approche bayésienne privilégie l'analyse statistique des données fournies pour raisonner avec, l'approche ontologique est basée sur la modélisation de la connaissance experte pour représenter un domaine. La combinaison de la force des deux permet d'améliorer à la fois le raisonnement dans l'incertitude et la connaissance experte<br>This thesis focuses on integrating expert knowledge to enhance reasoning under uncertainty. Our goal is to guide the probabilistic relations’ learning with expert knowledge for domains described by ontologies.To do so we propose to couple knowledge bases (KBs) and an oriented-object extension of Bayesian networks, the probabilistic relational models (PRMs). Our aim is to complement the statistical learning with expert knowledge in order to learn a model as close as possible to the reality and analyze it quantitatively (with probabilistic relations) and qualitatively (with causal discovery). We developped three algorithms throught three distinct approaches, whose main differences lie in their automatisation and the integration (or not) of human expert supervision.The originality of our work is the combination of two broadly opposed philosophies: while the Bayesian approach favors the statistical analysis of the given data in order to reason with it, the ontological approach is based on the modelization of expert knowledge to represent a domain. Combining the strenght of the two allows to improve both the reasoning under uncertainty and the expert knowledge
APA, Harvard, Vancouver, ISO, and other styles
22

Cavadenti, Olivier. "Contribution de la découverte de motifs à l’analyse de collections de traces unitaires." Thesis, Lyon, 2016. http://www.theses.fr/2016LYSEI084/document.

Full text
Abstract:
Dans le contexte manufacturier, un ensemble de produits sont acheminés entre différents sites avant d’être vendus à des clients finaux. Chaque site possède différentes fonctions : création, stockage, mise en vente, etc. Les données de traçabilités décrivent de manière riche (temps, position, type d’action,…) les événements de création, acheminement, décoration, etc. des produits. Cependant, de nombreuses anomalies peuvent survenir, comme le détournement de produits ou la contrefaçon d’articles par exemple. La découverte des contextes dans lesquels surviennent ces anomalies est un objectif central pour les filières industrielles concernées. Dans cette thèse, nous proposons un cadre méthodologique de valorisation des traces unitaires par l’utilisation de méthodes d’extraction de connaissances. Nous montrons comment la fouille de données appliquée à des traces transformées en des structures de données adéquates permet d’extraire des motifs intéressants caractéristiques de comportements fréquents. Nous démontrons que la connaissance a priori, celle des flux de produits prévus par les experts et structurée sous la forme d’un modèle de filière, est utile et efficace pour pouvoir classifier les traces unitaires comme déviantes ou non, et permettre d’extraire les contextes (fenêtre de temps, type de produits, sites suspects,…) dans lesquels surviennent ces comportements anormaux. Nous proposons de plus une méthode originale pour détecter les acteurs de la chaîne logistique (distributeurs par exemple) qui auraient usurpé une identité (faux nom). Pour cela, nous utilisons la matrice de confusion de l’étape de classification des traces de comportement pour analyser les erreurs du classifieur. L’analyse formelle de concepts (AFC) permet ensuite de déterminer si des ensembles de traces appartiennent en réalité au même acteur<br>In a manufacturing context, a product is moved through different placements or sites before it reaches the final customer. Each of these sites have different functions, e.g. creation, storage, retailing, etc. In this scenario, traceability data describes in a rich way the events a product undergoes in the whole supply chain (from factory to consumer) by recording temporal and spatial information as well as other important elements of description. Thus, traceability is an important mechanism that allows discovering anomalies in a supply chain, like diversion of computer equipment or counterfeits of luxury items. In this thesis, we propose a methodological framework for mining unitary traces using knowledge discovery methods. We show how the process of data mining applied to unitary traces encoded in specific data structures allows extracting interesting patterns that characterize frequent behaviors. We demonstrate that domain knowledge, that is the flow of products provided by experts and compiled in the industry model, is useful and efficient for classifying unitary traces as deviant or not. Moreover, we show how data mining techniques can be used to provide a characterization for abnormal behaviours (When and how did they occur?). We also propose an original method for detecting identity usurpations in the supply chain based on behavioral data, e.g. distributors using fake identities or concealing them. We highlight how the knowledge discovery in databases, applied to unitary traces encoded in specific data structures (with the help of expert knowledge), allows extracting interesting patterns that characterize frequent behaviors. 
Finally, we detail the achievements made within this thesis with the development of a platform of traces analysis in the form of a prototype
APA, Harvard, Vancouver, ISO, and other styles
23

Hlavička, Ladislav. "Dolování asociačních pravidel z datových skladů." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2009. http://www.nusl.cz/ntk/nusl-235501.

Full text
Abstract:
This thesis deals with association rule mining over data warehouses. In the first part, the reader is familiarized with terms such as knowledge discovery in databases and data mining. The following part of the work deals with data warehouses. Further, association analysis, association rules, their types and the possibilities for mining them are described. The architecture of Microsoft SQL Server and its tools for working with data warehouses are presented. The rest of the thesis covers the description and analysis of the Star-miner algorithm and the design, implementation and testing of the application.
APA, Harvard, Vancouver, ISO, and other styles
24

Pumprla, Ondřej. "Získávání znalostí z datových skladů." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2009. http://www.nusl.cz/ntk/nusl-236715.

Full text
Abstract:
This Master's thesis deals with the principles of the data mining process, especially the mining of association rules. It sets out the theoretical apparatus: a general description of data warehouses and the principles of their creation. On the basis of this theoretical knowledge, an application for association rule mining is implemented. The application requires data in transactional form or multidimensional data organized in a star schema. The implemented algorithms for finding frequent patterns are Apriori and FP-tree. The system allows flexible setting of the parameters of the mining process. Validation tests and efficiency measurements were also carried out. In terms of support for association rule mining, the resulting application is more broadly applicable and robust than the compared existing systems SAS Miner and Oracle Data Miner.
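As a hedged illustration of the frequent-pattern step named above, the following self-contained Python sketch implements a level-wise Apriori search over transactional data (the thesis's own implementation, its star-schema input and its FP-tree variant are not reproduced; the example baskets and the minimum-support value are assumptions):

# Compact Apriori-style sketch: frequent itemsets are grown level by level
# and pruned by the minimum-support threshold.
def apriori(transactions, min_support=0.5):
    n = len(transactions)
    transactions = [frozenset(t) for t in transactions]

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    # Level 1: frequent single items.
    items = {i for t in transactions for i in t}
    current = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}
    frequent = {}
    k = 1
    while current:
        frequent.update({s: support(s) for s in current})
        # Candidate generation: join k-itemsets into (k+1)-itemsets, then prune by support.
        k += 1
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        current = {c for c in candidates if support(c) >= min_support}
    return frequent

baskets = [{"milk", "bread"}, {"milk", "diapers", "beer"},
           {"bread", "diapers", "beer"}, {"milk", "bread", "diapers", "beer"}]
for itemset, sup in sorted(apriori(baskets, 0.5).items(), key=lambda x: -x[1]):
    print(set(itemset), round(sup, 2))

Association rules would then be generated from the returned frequent itemsets by filtering candidate rules on a minimum confidence threshold.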
APA, Harvard, Vancouver, ISO, and other styles
25

Jaroš, Ondřej. "Získávání znalostí z obrazových databází." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2010. http://www.nusl.cz/ntk/nusl-237154.

Full text
Abstract:
This thesis is focused on knowledge discovery from databases, especially on methods of classification and prediction, which are described in detail. Furthermore, the work deals with multimedia databases and the way these databases store data; in particular, methods for processing low-level image and video data are described. The practical part of the thesis focuses on the implementation of a GMM (Gaussian mixture model) method for extracting low-level features from video data and images. Other parts describe the input data and the tools with which the implemented method was compared. The last section focuses on experiments comparing the efficiency of extracting high-level attributes from low-level data with the methods implemented in the selected classification tool LibSVM.
APA, Harvard, Vancouver, ISO, and other styles
26

Jurčák, Petr. "Získávání znalostí z multimediálních databází." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2009. http://www.nusl.cz/ntk/nusl-236662.

Full text
Abstract:
This master's thesis is dedicated to the theme of knowledge discovery in multimedia databases, especially the basic methods of classification and prediction used for data mining. Another part describes the extraction of low-level features from video data and images and summarizes content-based search in multimedia content and the indexing of this type of data. The final part is dedicated to the implementation of a Gaussian mixture model for classification and compares the final results with another method, SVM.
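As a hedged illustration of Gaussian-mixture-based classification compared with an SVM, the following sketch fits one mixture per class and assigns each sample to the class with the highest log-likelihood (scikit-learn and the iris data are illustrative assumptions, not the toolchain or data of the thesis):

# One GMM per class; classification by maximum per-class log-likelihood.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.mixture import GaussianMixture
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Fit one GMM on each class's training samples.
classes = np.unique(y_tr)
gmms = {c: GaussianMixture(n_components=2, random_state=0).fit(X_tr[y_tr == c])
        for c in classes}

# Assign each test sample to the class whose mixture gives the highest log-likelihood.
log_likelihood = np.column_stack([gmms[c].score_samples(X_te) for c in classes])
gmm_pred = classes[np.argmax(log_likelihood, axis=1)]

svm_pred = SVC(kernel="rbf").fit(X_tr, y_tr).predict(X_te)

print("GMM accuracy:", (gmm_pred == y_te).mean())
print("SVM accuracy:", (svm_pred == y_te).mean())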
APA, Harvard, Vancouver, ISO, and other styles
27

Ballout, Ali. "Apprentissage actif pour la découverte d'axiomes." Electronic Thesis or Diss., Université Côte d'Azur, 2024. http://www.theses.fr/2024COAZ4026.

Full text
Abstract:
This thesis addresses the challenge of evaluating candidate logical formulas, with a specific focus on axioms, by synergistically combining machine learning with symbolic reasoning. This innovative approach facilitates the automatic discovery of axioms, primarily in the evaluation phase of generated candidate axioms. The research aims to solve the issue of efficiently and accurately validating these candidates in the broader context of knowledge acquisition on the semantic Web. Recognizing the importance of existing generation heuristics for candidate axioms, this research focuses on advancing the evaluation phase of these candidates. Our approach involves utilizing these heuristic-based candidates and then evaluating their compatibility and consistency with existing knowledge bases. The evaluation process, which is typically computationally intensive, is revolutionized by developing a predictive model that effectively assesses the suitability of these axioms as a surrogate for traditional reasoning. This innovative model significantly reduces computational demands, employing reasoning as an occasional "oracle" to classify complex axioms where necessary. Active learning plays a pivotal role in this framework. It allows the machine learning algorithm to select specific data for learning, thereby improving its efficiency and accuracy with minimal labeled data. The thesis demonstrates this approach in the context of the semantic Web, where the reasoner acts as the "oracle" and the potential new axioms represent unlabeled data. This research contributes significantly to the fields of automated reasoning, natural language processing, and beyond, opening up new possibilities in areas such as bioinformatics and automated theorem proving. By effectively marrying machine learning with symbolic reasoning, this work paves the way for more sophisticated and autonomous knowledge discovery processes, heralding a paradigm shift in how we approach and leverage the vast expanse of data on the semantic Web.
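As a hedged illustration of the surrogate-plus-oracle loop described above, the following sketch applies uncertainty-sampling active learning with a generic classifier; the candidate-axiom features, the stand-in oracle function and the synthetic data are assumptions, and the thesis's actual axiom encoding and OWL reasoner are not reproduced:

# Surrogate model trained incrementally; an expensive "oracle" (here a stand-in
# function, in the thesis a symbolic reasoner) labels only the most uncertain candidates.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, oracle_labels = make_classification(n_samples=1000, n_features=20, random_state=1)

def oracle(idx):                      # pretend this call is costly
    return oracle_labels[idx]

# Seed set labeled by the oracle, containing both classes.
labeled = list(np.where(oracle_labels == 0)[0][:5]) + list(np.where(oracle_labels == 1)[0][:5])
y_known = {i: oracle(i) for i in labeled}

surrogate = LogisticRegression(max_iter=1000)
for _ in range(20):                   # active-learning rounds
    surrogate.fit(X[labeled], [y_known[i] for i in labeled])
    probs = surrogate.predict_proba(X)[:, 1]
    # Query the most uncertain unlabeled candidate (probability closest to 0.5).
    unlabeled = [i for i in range(len(X)) if i not in y_known]
    query = min(unlabeled, key=lambda i: abs(probs[i] - 0.5))
    y_known[query] = oracle(query)    # occasional call to the "oracle"
    labeled.append(query)

accepted = surrogate.predict(X)       # cheap scoring of all remaining candidates
print("candidates predicted acceptable:", int(accepted.sum()))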
APA, Harvard, Vancouver, ISO, and other styles
28

Tsai, Ying-Chieh, and 蔡英傑. "A New knowledge Discovery Model for Extracting Diagnosis Rules of Manufacturing Process." Thesis, 2005. http://ndltd.ncl.edu.tw/handle/76459094305707116692.

Full text
Abstract:
Master's thesis, National Yunlin University of Science and Technology, Department of Information Management. Information classification plays an important role in business decision-making tasks. As information technology advances, large amounts of information stored in databases can no longer be handled, or even seen, by human beings. Moreover, when uncertainty and noise are considered, it becomes very difficult to classify the data clearly. To solve such problems, a new knowledge discovery model based on soft computing is proposed. The proposed contributions include: (1) a new data transformation algorithm for transforming a categorical class variable into a numerical one; (2) a new algorithm, Modified Correlation-based Feature Selection (MCFS), for quickly identifying and screening irrelevant, redundant, and noisy features; (3) a new algorithm, Modified Minimum Entropy Principle Algorithm (MMEPA), for constructing membership functions to fuzzify the reduced dataset; and (4) an algorithm for extracting classification rules with the Variable Precision Rough Set model (VP-model). In verification and comparison on the iris and industrial conveyor belt classification problems, the proposed model discards some useless features, and the extracted classification rules achieve a higher classification accuracy rate than some existing methods.
APA, Harvard, Vancouver, ISO, and other styles
29

Chen, Tso-Lin, and 陳作琳. "A Fuzzy Knowledge Discovery Model Using Fuzzy Decision Tree and Fuzzy Adaptive Learning Control Network." Thesis, 2003. http://ndltd.ncl.edu.tw/handle/97631994096302009774.

Full text
Abstract:
Master's thesis, Chung Yuan Christian University, Graduate Institute of Industrial Engineering. Exploring business information and operational experience from relational databases is a challenge, because many cause-effect relationships and business rules are fuzzy; it is therefore difficult for a decision-maker to discover the important factors. First, we defined the fuzzy sets and their membership functions using Dodgson's function and quartile statistics. Next, we developed a data-mining model based on both a fuzzy decision tree and a fuzzy adaptive learning control network; these two concepts help generate concrete rules. This research adopted a decision-tree-based learning algorithm and a back-propagation neural network to develop the fuzzy decision tree and the fuzzy adaptive learning control network. In order to refine the rules, we took advantage of the chi-square test of homogeneity to reduce the connection weights. This research used the prediction of tardiness in semiconductor testing and the prediction of grades in an advanced general knowledge course as sample cases. The results showed that the two models generated a compact fuzzy rule base that yielded high accuracy.
APA, Harvard, Vancouver, ISO, and other styles
30

Lin, Ynu-Syuan, and 林昀萱. "A Fuzzy-Based Knowledge Discovery Model for Capacity Allocation and Dispatching in Final Test of Semiconductor Industry." Thesis, 2005. http://ndltd.ncl.edu.tw/handle/26212052553836008148.

Full text
Abstract:
Master's thesis, Chung Yuan Christian University, Graduate Institute of Industrial Engineering. Statistical techniques have been successfully applied to process large amounts of data. However, one drawback of statistical approaches is that one may not be able to learn and discover knowledge from a large, noisy database when applying them. The purpose of this thesis is thus, in the context of the semiconductor final-testing industry, to develop a dispatching knowledge base using artificial intelligence techniques and to extract useful business rules from the data. One of the most challenging production decisions in the semiconductor testing industry is to select the most appropriate dispatching rule to employ on the shop floor to achieve high manufacturing performance in a changing environment. Semiconductor testing is characterized by multi-resource constraints and has many performance measures from the perspective of controlling and managing the system. In this study, we develop a hybrid knowledge discovery model, using a decision tree in conjunction with a back-propagation neural network, to determine an appropriate dispatching rule from noisy production data and to predict its performance. Experiments have shown that the proposed decision tree finds the most suitable dispatching rule for a given performance measure and system status, and that the back-propagation neural network then precisely predicts the performance of the selected rule. Second, this study presents a knowledge discovery model that uses a genetic algorithm to find the best priority sequence of customer orders for resource allocation and a fuzzy logic model to allocate the resources and determine order-completion times following that priority sequence. Experiments using realistic resource data and randomly generated orders showed that the proposed models achieve promising results.
APA, Harvard, Vancouver, ISO, and other styles
31

Liu, Wei-Lun, and 劉偉倫. "An Integrated Model of Data Analysis and Knowledge Discovery for the Expense Control And Management of National Health Insurance." Thesis, 2001. http://ndltd.ncl.edu.tw/handle/91808500293526526055.

Full text
Abstract:
Master's thesis, Yuan Ze University, Department of Computer Science and Engineering. The main purpose of National Health Insurance (NHI) is to promote medical quality, to share the financial risk of medical treatment, and to alleviate the financial burden on each individual. However, since it was established, the financial shortfall of the NHI has been the focal point of many debates. The reasons include consumers' behaviors, the waste of health resources, and the insurance system itself. Among these causes, the most troublesome one is the abuse of medical resources. In this thesis, aiming at the control of medical expenditure, we not only propose an integrated model of data analysis and knowledge discovery for the expense control of the NHI, but also develop a management model for monitoring anomalous claims for medical services. A prototype implemented in the NHI shows that claims for medical services can be effectively controlled. In addition, shortening the analysis process, improving abnormal situations, and decreasing wasted medical expenditure are all benefits of this system. The process of analyzing abnormal hospitals can also be recorded, and the accumulated experience can serve as decision support when analyzing medical expenditure. Finally, the simulation results show that the data classification rate reaches 98.64% and the sensitivity reaches 90.1%. These results indicate that the automatic selection model for abnormal medical expense filings is effective and will be of great help to the present NHI sampling audit operation.
APA, Harvard, Vancouver, ISO, and other styles
32

Alakari, Alaa A. "A situation refinement model for complex event processing." Thesis, 2020. http://hdl.handle.net/1828/12535.

Full text
Abstract:
Complex Event Processing (CEP) systems aim at processing large flows of events to discover situations of interest (SOI). Primarily, CEP uses predefined pattern templates to detect occurrences of complex events in an event stream. Extracting complex events is achieved by employing techniques such as filtering and aggregation to detect complex patterns composed of many simple events. In general, CEP systems rely on domain experts to define complex pattern rules to recognize SOI. However, the task of fine-tuning complex pattern rules in the event-streaming environment faces two main challenges: the issue of increased pattern complexity, and the event-streaming constraint that such rules must be acquired and processed in near real time. Therefore, to fine-tune the CEP pattern to identify SOI, the following requirements must be met: first, a minimum number of rules must be used to refine the CEP pattern, to avoid increased pattern complexity; second, domain knowledge must be incorporated in the refinement process to improve awareness of emerging situations. Furthermore, the event data must be processed upon arrival to cope with the continuous arrival of events in the stream and to respond in near real time. In this dissertation, we present a Situation Refinement Model (SRM) that addresses these requirements, in particular by developing a Single-Scan Frequent Item Mining algorithm that acquires a minimal number of CEP rules and allows the level of refinement to be adjusted to fit the applied scenario. In addition, a cost-gain evaluation measure to determine the best tradeoff for identifying a particular SOI is presented.
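The dissertation's Single-Scan Frequent Item Mining algorithm is not reproduced here; as a hedged illustration of single-pass frequent-item summarization over an event stream, the following sketch uses the classic Misra-Gries counting scheme on a hypothetical stream of event types:

# Single-pass approximate heavy-hitter counting (Misra-Gries): items occurring
# more than len(stream)/k times are guaranteed to survive in the counters.
from collections import defaultdict

def misra_gries(stream, k):
    counters = defaultdict(int)
    for item in stream:
        if item in counters or len(counters) < k - 1:
            counters[item] += 1
        else:
            # Decrement all counters; drop those that reach zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return dict(counters)

# Hypothetical event-attribute stream (e.g. event types seen by the CEP engine).
events = ["login", "scan", "login", "alert", "login", "scan", "login", "payment"]
print(misra_gries(events, k=3))   # candidate frequent items after one scan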
APA, Harvard, Vancouver, ISO, and other styles
33

Azevedo, Ana Isabel Rojão Lourenço. "Data mining languages for business intelligence." Doctoral thesis, 2012. http://hdl.handle.net/1822/22892.

Full text
Abstract:
Doctoral thesis in Information Systems and Technologies (area of Engineering and Management Information Systems). Since Luhn first used the term Business Intelligence (BI) in 1958, major transformations have taken place in the field of information systems and technologies, especially in the area of decision support systems. Nowadays, BI systems are widely used in organizations and their strategic importance is clearly recognized. These systems present themselves as an essential part of a complete knowledge of the business and as an irreplaceable tool in supporting decision making. The dissemination of data mining (DM) tools is increasing in the BI field, as is the acknowledgement of the relevance of their usage in enterprise BI systems. BI tools are friendly, iterative and interactive, allowing business users easy access. This way, users can directly manipulate data and thus have the possibility to extract all the value contained in that business data. One of the problems noted in the use of DM in the field of BI is that DM models are generally too complex to be directly manipulated by business users, as opposed to other BI tools. Within this context, the nonexistence of BI tools allowing business users the direct manipulation of DM models was identified as the research problem, since, as a consequence of not directly manipulating DM models, business users may be unable to extract all the potential value contained in them. This aspect has particular relevance in an entrepreneurial universe where competition is stronger every day and where knowledge of the business, the variables involved and the possible scenarios plays a fundamental role in allowing organizations to compete in an extremely demanding market. Considering that the majority of BI systems are built on top of operational systems, which mainly use the relational model for databases, the research was inspired by the concepts related to this model and its associated languages, in particular Query-By-Example (QBE) languages. These languages are widely used by business users in business environments, have a strong interactivity component, are user-friendly, and allow for iterativeness. Efforts are being made to create standards and rules in the field of DM, with great relevance being given to the subject of inductive databases; within this context, great relevance is given to the so-called DM languages. These concepts were also an inspiration for this research. Despite their importance, these languages are not oriented to business users in business environments. Linking concepts related to QBE languages and to DM languages, a new DM language for BI, named Query-Models-By-Example (QMBE), was conceived and implemented. This new language is, by nature, user-friendly, iterative and interactive; it presents the same characteristics as the usual BI tools, allowing business users the direct manipulation of DM models and, through this, access to the potential value of these models with all the advantages that may arise. Using a BI system prototype, the language was implemented, tested, and conceptually evaluated. It has been verified that the language possesses the desired properties, namely being user-friendly, iterative, and interactive. The language was later evaluated by business users who were already experienced in using DM within the context of BI. According to these users, using the language presents advantages compared to the traditional use of DM within BI.
APA, Harvard, Vancouver, ISO, and other styles
34

Dlamini, Wisdom Mdumiseni Dabulizwe. "Spatial analysis of invasive alien plant distribution patterns and processes using Bayesian network-based data mining techniques." Thesis, 2016. http://hdl.handle.net/10500/20692.

Full text
Abstract:
Invasive alien plants have widespread ecological and socioeconomic impacts throughout many parts of the world, including Swaziland, where the government declared them a national disaster. Control of these species requires knowledge of the invasion ecology of each species, including how they interact with the invaded environment. Species distribution models are vital for providing solutions to such problems, including the prediction of their niche and distribution. Various modelling approaches are used for species distribution modelling, albeit with limitations resulting from statistical assumptions, implementation and interpretation of outputs. This study explores the usefulness of Bayesian networks (BNs) due to their ability to model stochastic, nonlinear inter-causal relationships and uncertainty. Data-driven BNs were used to explore patterns and processes influencing the spatial distribution of 16 priority invasive alien plants in Swaziland. Various BN structure learning algorithms were applied within the Weka software to build models from a set of 170 variables incorporating climatic, anthropogenic, topo-edaphic and landscape factors. While all the BN models produced accurate predictions of alien plant invasion, the globally scored networks, particularly the hill-climbing algorithms, performed relatively well. However, when considering the probabilistic outputs, the constraint-based Inferred Causation algorithm, which attempts to generate a causal BN structure, performed relatively better. The learned BNs reveal that the main pathways of alien plants into new areas are ruderal areas such as road verges and riverbanks, whilst humans and human activity are key driving factors and the main dispersal mechanism. However, the distribution of most of the species is constrained by climate, particularly tolerance to very low temperatures and precipitation seasonality. Biotic interactions and/or associations among the species are also prevalent. The findings suggest that most of the species will proliferate by extending their range, putting the whole country at risk of further invasion. The ability of BNs to express uncertain, rather complex conditional and probabilistic dependencies and to combine multisource data makes them an attractive technique for species distribution modelling, especially as joint invasive species distribution models (JiSDM). Suggestions for further research are provided, including the need for rigorous invasive species monitoring, data stewardship and testing of more BN learning algorithms. Environmental Sciences; D. Phil. (Environmental Science).
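As a hedged illustration of score-based BN structure learning, the following sketch uses the pgmpy library's hill-climbing search with a BIC score over a small synthetic presence/absence dataset (the study itself used Weka's learners; pgmpy, the variable names and the synthetic data are assumptions for the example, and the exact pgmpy API may vary between versions):

# Learn a BN structure from discrete data with hill climbing and a BIC score.
import numpy as np
import pandas as pd
from pgmpy.estimators import HillClimbSearch, BicScore

rng = np.random.default_rng(0)
n = 500
roads = rng.integers(0, 2, n)                                     # proximity to road verges
human = (roads & rng.integers(0, 2, n)) | rng.integers(0, 2, n)   # human activity
invaded = ((roads + human) >= 2).astype(int)                      # species presence

data = pd.DataFrame({"road_verge": roads, "human_activity": human,
                     "species_present": invaded})

search = HillClimbSearch(data)
dag = search.estimate(scoring_method=BicScore(data))
print(sorted(dag.edges()))    # learned directed dependencies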
APA, Harvard, Vancouver, ISO, and other styles