Dissertations / Theses: 'Sequential rules and patterns'

1

Wu, Sheng-Tang. "Knowledge discovery using pattern taxonomy model in text mining." Queensland University of Technology, 2007. http://eprints.qut.edu.au/16675/.

Abstract:

In the last decade, many data mining techniques have been proposed for fulfilling various knowledge discovery tasks in order to achieve the goal of retrieving useful information for users. Various types of patterns can then be generated using these techniques, such as sequential patterns, frequent itemsets, and closed and maximal patterns. However, how to effectively exploit the discovered patterns is still an open research issue, especially in the domain of text mining. Most text mining methods adopt the keyword-based approach to construct text representations which consist of single words or single terms, whereas other methods have tried to use phrases instead of keywords, based on the hypothesis that a phrase carries more information than a single term. Nevertheless, these phrase-based methods did not yield significant improvements because the patterns with high frequency (normally the shorter patterns) usually have a high value on exhaustivity but a low value on specificity, and thus the specific patterns encounter the low frequency problem. This thesis presents research on developing an effective Pattern Taxonomy Model (PTM) to overcome the aforementioned problem by deploying discovered patterns into a hypothesis space. PTM is a pattern-based method which adopts the technique of sequential pattern mining and uses closed patterns as features in the representation. A PTM-based information filtering system is implemented and evaluated by a series of experiments on the latest version of the Reuters dataset, RCV1. Pattern evolution schemes are also proposed in this thesis in an attempt to utilise information from negative training examples to update the discovered knowledge. The results show that the PTM outperforms not only all up-to-date data mining-based methods, but also the traditional Rocchio and the state-of-the-art BM25 and Support Vector Machines (SVM) approaches.
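
The abstract above relies on closed sequential patterns as features. As a minimal illustration of that notion only (not the PTM implementation), the Python sketch below keeps a pattern when no longer super-sequence has the same support. The function names and the toy support table are assumptions made for this example.

```python
def is_subsequence(short, long):
    """Return True if `short` occurs in `long` in order (gaps allowed)."""
    it = iter(long)
    # `item in it` consumes the iterator, so order is respected.
    return all(item in it for item in short)


def closed_patterns(support):
    """Keep patterns that have no longer super-sequence with equal support.

    `support` maps a pattern (tuple of terms) to its support count.
    """
    closed = {}
    for pattern, count in support.items():
        absorbed = any(
            len(other) > len(pattern)
            and other_count == count
            and is_subsequence(pattern, other)
            for other, other_count in support.items()
        )
        if not absorbed:
            closed[pattern] = count
    return closed


# Hypothetical support table for term sequences in a document collection.
toy_support = {("a",): 3, ("a", "b"): 3, ("b",): 4, ("a", "b", "c"): 2}
print(closed_patterns(toy_support))
```

In the toy table, the pattern ("a",) is dropped because ("a", "b") occurs just as often, which is the redundancy that closed patterns are meant to remove.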

2

Andrade, Rodrigo Bomfim de. "Sequential cost-reimbursement rules." Repositório Institucional do FGV, 2014. http://hdl.handle.net/10438/11736.

Abstract:

This paper studies cost-sharing rules under dynamic adverse selection. We present a typical principal-agent model with two periods, set up in Laffont and Tirole's (1986) canonical regulation environment. At first, when the contract is signed, the firm has prior uncertainty about its efficiency parameter. In the second period, the firm learns its efficiency and chooses the level of cost-reducing effort. The optimal mechanism sequentially screens the firm's types and achieves a higher level of welfare than its static counterpart. The contract is indirectly implemented by a sequence of transfers, consisting of a fixed advance payment based on the reported cost estimate, and an ex-post compensation linear in cost performance.

3

João, Rafael Stoffalette. "Mineração de padrões sequenciais e geração de regras de associação envolvendo temporalidade." Universidade Federal de São Carlos, 2015. https://repositorio.ufscar.br/handle/ufscar/8923.

Abstract:

Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)
Data mining aims at extracting useful information from a database (DB). The mining process also makes it possible to analyze the data (e.g. correlations, predictions, chronological relationships). The work described in this document proposes an approach to temporal knowledge extraction from a DB and describes the implementation of this approach as a computational system called S_MEMISP+AR. The system focuses on finding frequent temporal patterns in a DB and generating temporal association rules based on the elements contained in the frequent patterns identified. At the end of the process, the system analyzes the temporal relationships between the time intervals associated with the elements contained in each pattern, using the binary relationships described by Allen's Interval Algebra. Both S_MEMISP+AR and the algorithm that the system implements build on the Apriori, MEMISP and ARMADA approaches. Three experiments considering two different approaches were conducted with S_MEMISP+AR, using a DB of sales records of products available in a supermarket. The experiments were conducted to show that each proposed approach, besides inferring new knowledge about the data domain and corroborating results that reinforce the implicit knowledge about the data, also promotes, overall, the refinement and extension of the knowledge about the data.
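
The abstract analyzes pairs of time intervals using the binary relationships of Allen's Interval Algebra. The Python sketch below classifies two intervals into one of the thirteen Allen relations. It is a generic illustration, not code from S_MEMISP+AR, and it assumes intervals are `(start, end)` tuples with `start < end`; the function name is hypothetical.

```python
def allen_relation(a, b):
    """Return the Allen relation that holds from interval a to interval b.

    Both intervals are (start, end) tuples with start < end (assumed).
    """
    (a_start, a_end), (b_start, b_end) = a, b
    if a_end < b_start:
        return "before"
    if b_end < a_start:
        return "after"
    if a_end == b_start:
        return "meets"
    if b_end == a_start:
        return "met-by"
    if a_start == b_start and a_end == b_end:
        return "equals"
    if a_start == b_start:
        return "starts" if a_end < b_end else "started-by"
    if a_end == b_end:
        return "finishes" if a_start > b_start else "finished-by"
    if b_start < a_start and a_end < b_end:
        return "during"
    if a_start < b_start and b_end < a_end:
        return "contains"
    # Only proper overlaps remain: starts and ends both differ.
    return "overlaps" if a_start < b_start else "overlapped-by"


# Hypothetical purchase intervals, e.g. promotion periods in days.
print(allen_relation((1, 5), (3, 8)))
print(allen_relation((3, 5), (1, 8)))
```

Each pair of well-formed intervals falls into exactly one relation, so the checks can be ordered from the disjoint cases to the overlapping ones without ambiguity.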

4

Abar, Orhan. "Rule Mining and Sequential Pattern Based Predictive Modeling with EMR Data." UKnowledge, 2019. https://uknowledge.uky.edu/cs_etds/85.

Abstract:

Electronic medical record (EMR) data is collected on a daily basis at hospitals and other healthcare facilities to track patients’ health situations including conditions, treatments (medications, procedures), diagnostics (labs) and associated healthcare operations. Besides being useful for individual patient care and hospital operations (e.g., billing, triaging), EMRs can also be exploited for secondary data analyses to glean discriminative patterns that hold across patient cohorts for different phenotypes. These patterns in turn can yield high level insights into disease progression with interventional potential. In this dissertation, using a large scale realistic EMR dataset of over one million patients visiting University of Kentucky healthcare facilities, we explore data mining and machine learning methods for association rule (AR) mining and predictive modeling with mood and anxiety disorders as use-cases. Our first work involves analysis of existing quantitative measures of rule interestingness to assess how they align with a practicing psychiatrist’s sense of novelty/surprise corresponding to ARs identified from EMRs. Our second effort involves mining causal ARs with depression and anxiety disorders as target conditions through matching methods accounting for computationally identified confounding attributes. Our final effort involves efficient implementation (via GPUs) and application of contrast pattern mining to predictive modeling for mental conditions using various representational methods and recurrent neural networks. Overall, we demonstrate the effectiveness of rule mining methods in secondary analyses of EMR data for identifying causal associations and building predictive models for diseases.

5

Lu, Jing. "From sequential patterns to concurrent branch patterns : a new post sequential patterns mining approach." Thesis, University of Bedfordshire, 2006. http://hdl.handle.net/10547/556399.

Abstract:

Sequential patterns mining is an important pattern discovery technique used to identify frequently observed sequential occurrences of items across ordered transactions over time. It has been intensively studied and there exists a great diversity of algorithms. However, a major problem with conventional sequential patterns mining is that the derived patterns are often large and not very easy to understand or use. In addition, more complex relations among events are often hidden behind sequences. A novel model for sequential patterns called Sequential Patterns Graph (SPG) is proposed. The construction algorithm of SPG is presented with experimental results to substantiate the concept. The thesis then sets out to define some new structural patterns such as concurrent branch patterns, exclusive patterns and iterative patterns which are generally hidden behind sequential patterns. Finally, an integrative framework, named Post Sequential Patterns Mining (PSPM), which is based on sequential patterns mining, is also proposed for the discovery and visualisation of structural patterns. This thesis is intended to prove that discrete sequential patterns derived from traditional sequential patterns mining can be modelled graphically using SPG. It is concluded from experiments and theoretical studies that SPG is not only a minimal representation of sequential patterns mining, but it also represents the interrelation among patterns and further establishes the foundation for mining structural knowledge (i.e. concurrent branch patterns, exclusive patterns and iterative patterns). From experiments conducted on both synthetic and real datasets, it is shown that Concurrent Branch Patterns (CBP) mining is an effective and efficient mining algorithm suitable for concurrent branch patterns.

6

Mooney, Carl Howard. "The Discovery of Interacting Episodes and Temporal Rule Determination in Sequential Pattern Mining." Flinders University. Informatics and Engineering, 2007. http://catalogue.flinders.edu.au./local/adt/public/adt-SFU20070702.120306.

Abstract:

The reason for data mining is to generate rules that can be used as the basis for making decisions. One such area is sequence mining which, in terms of transactional datasets, can be stated as the discovery of inter-transaction associations or associations between different transactions. The data used for sequence mining is not limited to data stored in overtly temporal or longitudinally maintained datasets and in such domains data can be viewed as a series of events, or episodes, occurring at specific times. The problem thus becomes a search for collections of events that occur frequently together. While the mining of frequent episodes is an important capability, the manner in which such episodes interact can provide further useful knowledge in the search for a description of the behaviour of a phenomenon, but as yet has received little investigation. Moreover, while many sequences are associated with absolute time values, most sequence mining routines treat time in a relative sense, returning only patterns that can be described in terms of Allen-style relationships (or simpler), i.e. nothing about the relative pace of occurrence. They thus produce rules with more limited expressive power. Until now, temporal interval patterns have been based on the endpoints of the intervals; however, in many cases the natural point of reference is the midpoint of an interval, and it is therefore appropriate to develop a mechanism for reasoning between intervals when midpoint information is known. This thesis presents a method for discovering interacting episodes from temporal sequences and the analysis of them using temporal patterns. The mining can be conducted both with and without the mechanism for handling the pace of events and the analysis is conducted using both the traditional interval algebras and a midpoint algebra presented in this thesis.
The visualisation of rules in data mining is a large and dynamic field in its own right and although there has been a great deal of research in the visualisation of associations, there has been little in the area of sequence or episodic mining. Add to this the emerging field of mining stream data and there is a need to pursue methods and structures for such visualisations, and as such this thesis also contributes toward research in this important area of visualisation.

7

Muzammal, Muhammad. "Mining sequential patterns from probabilistic data." Thesis, University of Leicester, 2012. http://hdl.handle.net/2381/27638.

Abstract:

Sequential Pattern Mining (SPM) is an important data mining problem. Although it is assumed in classical SPM that the data to be mined is deterministic, it is now recognized that data obtained from a wide variety of data sources is inherently noisy or uncertain, such as data from sensors or data being collected from the web from different (potentially conflicting) data sources. Probabilistic databases are a popular framework for modelling uncertainty. Recently, several data mining and ranking problems have been studied in probabilistic databases. To the best of our knowledge, this is the first systematic study of mining sequential patterns from probabilistic databases. In this work, we consider the kind of uncertainties that could arise in SPM. We propose four novel uncertainty models for SPM, namely tuple-level uncertainty, event-level uncertainty, source-level uncertainty and source-level uncertainty in deduplication, all of which fit into the probabilistic databases framework, and motivate them using potential real-life scenarios. We then define the interestingness predicate for two measures of interestingness, namely expected support and probabilistic frequentness. Next, we consider the computational complexity of evaluating the interestingness predicate, for various combinations of uncertainty models and interestingness measures, and show that different combinations have very different outcomes from a complexity theoretic viewpoint: whilst some cases are computationally tractable, we show other cases to be computationally intractable. We give a dynamic programming algorithm to compute the source support probability and hence the expected support of a sequence in a source-level uncertain database. We then propose optimizations to speed up the support computation task. Next, we propose probabilistic SPM algorithms based on the candidate generation and pattern growth frameworks for the source-level uncertainty model and the expected support measure.
We implement these algorithms and give an empirical evaluation of the probabilistic SPM algorithms and show the scalability of these algorithms under different parameter settings using both real and synthetic datasets. Finally, we demonstrate the effectiveness of the probabilistic SPM framework at extracting meaningful patterns in the presence of noise.
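
The abstract mentions a dynamic programming algorithm for the source support probability and, from it, the expected support. The Python sketch below illustrates the general idea under a deliberately simplified model that is an assumption of this example, not the thesis's exact formulation: every event carries a single item and belongs to the source independently with a known probability. The function names and toy data are hypothetical.

```python
def source_support_probability(events, pattern):
    """Probability that one source supports `pattern` as a subsequence.

    `events` is a time-ordered list of (item, probability) pairs, where
    the probability says how likely the event belongs to this source.
    Events are assumed independent (a simplification).
    """
    m = len(pattern)
    # prob[j] = probability that exactly the first j pattern items
    # have been matched so far by leftmost (greedy) matching.
    prob = [0.0] * (m + 1)
    prob[0] = 1.0
    for item, p in events:
        # Go from high j to low j so one event advances a state only once.
        for j in range(m - 1, -1, -1):
            if item == pattern[j]:
                moved = prob[j] * p
                prob[j] -= moved
                prob[j + 1] += moved
    return prob[m]


def expected_support(sources, pattern):
    """Sum of per-source support probabilities (linearity of expectation)."""
    return sum(source_support_probability(ev, pattern) for ev in sources)


# Two hypothetical sources with uncertain events.
sources = [
    [("a", 0.5), ("b", 0.5)],
    [("a", 0.5), ("a", 0.5), ("b", 1.0)],
]
print(expected_support(sources, ["a", "b"]))
```

The table has one entry per pattern prefix, so each source costs time proportional to the number of events times the pattern length, rather than enumerating all possible worlds.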

8

Samamé, Jimenez Hilda Ana. "Recommender systems using temporal restricted sequential patterns." Master's thesis, Pontificia Universidad Católica del Perú, 2021. http://hdl.handle.net/20.500.12404/18784.

Abstract:

Recommendation systems are algorithms for suggesting relevant items to users. Generally, the recommendations are expressed in what will be recommended and a value representing the recommendation's relevance. However, forecasting if the user will buy the recommended item in the next day, week, or month is crucial for companies. The present study describes a process to suggest items from sequential patterns under temporal restrictions.

9

Brown, Shawn Paul. "Rules and patterns of microbial community assembly." Diss., Kansas State University, 2014. http://hdl.handle.net/2097/18324.

Abstract:

Doctor of Philosophy
Division of Biology
Ari M. Jumpponen
Microorganisms are critically important for establishing and maintaining ecosystem properties and processes that fuel and sustain higher-trophic levels. Despite the universal importance of microbes, we know relatively little about the rules and processes that dictate how microbial communities establish and assemble. Largely, we rely on assumptions that microbial community establishment follows similar trajectories as plants, but on a smaller scale. However, these assumptions have rarely been validated, and when validation has been attempted, the plant-based theoretical models apply poorly to microbial communities. Here, I utilized genomics-inspired tools to interrogate microbial communities at levels near community saturation to elucidate the rules and patterns of microbial community assembly. I relied on a community filtering model as a framework: potential members of the microbial community are filtered through environmental and/or biotic filters that control which taxa can establish, persist, and coexist. Additionally, I addressed whether two different microbial groups (fungi and bacteria) share similar assembly patterns. Similar dispersal capabilities and mechanisms are thought to result in similar community assembly rules for fungi and bacteria. I queried fungal and bacterial communities along a deglaciated primary successional chronosequence to determine microbial successional dynamics and to determine if fungal and bacterial assemblies are similar or follow trajectories similar to plants. These experiments demonstrate that not only do microbial community assembly dynamics not follow plant-based models of succession, but also that fungal and bacterial community assembly dynamics are distinct. We can no longer assume that because fungi and bacteria share small propagule sizes they follow similar trends.
Further, additional studies targeting biotic filters (here, snow algae) suggest strong controls during community assembly, possibly because of fungal predation of the algae or because of fungal utilization of algal exudates. Finally, I examined various technical aspects of sequence-based ecological investigations. These studies aimed to improve microbial community data reliability and analyses.

10

Yang, Can. "Discovering Contiguous Sequential Patterns in Network-Constrained Movement." Licentiate thesis, KTH, Geoinformatik, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-217998.

Abstract:

A large proportion of movement in urban areas, such as that of pedestrians, bicycles and vehicles, is constrained to a road network. This movement information is commonly collected by Global Positioning System (GPS) sensors, which has generated large collections of trajectories. A contiguous sequential pattern (CSP) in these trajectories represents a certain number of objects traversing a sequence of spatially contiguous edges in the network, which is an intuitive way to study regularities in network-constrained movement. CSPs are closely related to route choices and traffic flows and can be useful in travel demand modeling and transportation planning. However, the efficient and scalable extraction of CSPs and the effective visualization of heavily overlapping CSPs remain challenges. To address these challenges, the thesis develops two algorithms and a visual analytics system. Firstly, a fast map matching (FMM) algorithm is designed for efficiently matching a noisy trajectory to the sequence of edges traversed by the object. Secondly, an algorithm called bidirectional pruning based closed contiguous sequential pattern mining (BP-CCSM) is developed to extract sequential patterns with closedness and contiguity constraints from the map matched trajectories. Finally, a visual analytics system called sequential pattern explorer for trajectories (SPET) is designed for interactive mining and visualization of CSPs in a large collection of trajectories. Extensive experiments are performed on a real-world taxi trip GPS dataset to evaluate the algorithms and visual analytics system. The results demonstrate that FMM achieves superior performance by replacing repeated routing queries with hash table lookups. BP-CCSM considerably outperforms three state-of-the-art algorithms in terms of running time and memory consumption. SPET enables the user to efficiently and conveniently explore spatial and temporal variations of CSPs in network-constrained movement.
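
The abstract defines a contiguous sequential pattern as a sequence of contiguous edges traversed by a certain number of objects. The Python sketch below counts, by brute force, how many map-matched trajectories contain each contiguous edge sequence. It is a naive baseline for illustration, not BP-CCSM, and the function name, parameters and toy edge identifiers are assumptions of this example.

```python
from collections import Counter


def frequent_contiguous_patterns(trajectories, min_support, max_len=None):
    """Return contiguous edge sequences traversed by >= min_support trajectories.

    Each trajectory is a list of edge identifiers in travel order.
    A trajectory counts at most once per pattern.
    """
    support = Counter()
    for traj in trajectories:
        seen = set()
        n = len(traj)
        for i in range(n):
            last = n if max_len is None else min(n, i + max_len)
            for j in range(i + 1, last + 1):
                seen.add(tuple(traj[i:j]))  # contiguous slice only
        support.update(seen)
    return {p: c for p, c in support.items() if c >= min_support}


# Hypothetical map-matched trajectories over edges e1..e4.
trips = [["e1", "e2", "e3"], ["e2", "e3", "e4"], ["e1", "e3"]]
print(frequent_contiguous_patterns(trips, min_support=2))
```

Enumerating every slice is quadratic in trajectory length, which hints at why pruning strategies matter on large trajectory collections.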

11

Merah, Amar Farouk. "Vehicular Movement Patterns: A Sequential Patterns Data Mining Approach Towards Vehicular Route Prediction." Thèse, Université d'Ottawa / University of Ottawa, 2012. http://hdl.handle.net/10393/22851.

Abstract:

Behavioral pattern prediction in the context of Vehicular Ad hoc Networks (VANETs) has been receiving increasing attention because it enables on-demand, intelligent traffic analysis and response to real-time traffic issues. Sequential patterns are a type of behavioral pattern that describes the occurrence of events in time order. In the context of VANETs, these events are defined as an ordered list of road segments traversed by vehicles during their trips from a starting point to their final intended destination, forming a vehicular path. Due to their predictable nature, undertaken vehicular paths can be exploited to extract the paths that are considered frequent. From the frequent paths extracted through data mining, the probability that a vehicular path will take a certain direction is obtained. However, in order to achieve this, samples of vehicular paths need to be collected over periods of time so that they can be data-mined. In this thesis, a new set of formal definitions depicting vehicular paths as sequential patterns is described. Also, five novel communication schemes have been designed and implemented in a simulated environment to collect vehicular paths; these schemes are classified under two categories: Road Side Unit-Triggered (RSU-Triggered) and Vehicle-Triggered. After collection, frequent paths are extracted through data mining, and the probability of these frequent paths is measured. In order to evaluate the efficiency and effectiveness of the proposed schemes, extensive experimental analysis has been carried out. According to the results, two of the Vehicle-Triggered schemes, VTB-FP and VTRD-FP, improved the vehicular path collection operation in terms of communication cost and latency over the others. In terms of reliability, the Vehicle-Triggered schemes achieved a higher success rate than the RSU-Triggered scheme.
Finally, frequent vehicular movement patterns have been effectively extracted from the collected vehicular paths according to a user-defined threshold, and the confidence of the generated movement rules has been measured. The analysis made clear that the user-defined threshold needs to be set carefully so that important vehicular movement patterns are not discarded.
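
The abstract measures the confidence of movement rules derived from frequent vehicular paths. The Python sketch below uses the standard association-rule definition of confidence, applied to contiguous road-segment sequences; whether the thesis computes it exactly this way is an assumption, and the function names and sample paths are hypothetical.

```python
def contains_contiguous(path, segments):
    """Return True if `segments` occurs in `path` as a contiguous run."""
    k = len(segments)
    return any(path[i:i + k] == segments for i in range(len(path) - k + 1))


def rule_confidence(paths, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent.

    Computed as (paths containing antecedent followed by consequent)
    divided by (paths containing antecedent). Returns 0.0 when the
    antecedent never occurs.
    """
    base = sum(contains_contiguous(p, antecedent) for p in paths)
    if base == 0:
        return 0.0
    both = sum(contains_contiguous(p, antecedent + consequent) for p in paths)
    return both / base


# Hypothetical collected vehicular paths (lists of road segment IDs).
paths = [["A", "B", "C"], ["A", "B", "D"], ["A", "B", "C"], ["B", "C"]]
print(rule_confidence(paths, ["A", "B"], ["C"]))
```

Here two of the three paths that travel A then B continue to C, so the rule would predict a turn onto C with confidence 2/3.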

12

Lee, Albert K. (Albert Kimin) 1972. "Combinatorial analysis of sequential firing patterns across multiple neurons decoding memory of sequential spatial experience in rat hippocampus." Thesis, Massachusetts Institute of Technology, 2003. http://hdl.handle.net/1721.1/29932.

Abstract:

Thesis (Ph.D.)--Massachusetts Institute of Technology, Dept. of Brain and Cognitive Sciences, 2003.
Includes bibliographical references (p. 100-104).
There is broad agreement that the hippocampus is crucially involved in the formation of richly-detailed, long term memories of events in humans. A key aspect of such memories is the temporal order and spatial context of the events experienced. Evidence from a wide variety of behavioral and electrophysiological experiments indicates that the rodent hippocampal spatial memory system is a model system for studying this type of memory in humans. Here, we develop a new combinatorial method for analyzing sequential firing patterns involving an arbitrary number of neurons based on relative time order. We then apply this method to decode memories of sequential spatial experience in the rat hippocampus during slow wave sleep. Specifically, rats are trained to repeatedly run through a sequence of spatial receptive fields ("place fields") of hippocampal CA1 "place cells" in a fixed temporal order. The spiking activity of many such individual cells is recorded before (PRE), during (RUN), and after (POST) this experience. By treating each place field traversed as an individual event, the rat's experience in RUN can be represented by the resulting sequence of place fields traversed, and therefore by the activity of the corresponding place cells. Then to characterize the extent to which the sequential nature of the RUN experience has been encoded into memory, we search for firing patterns related to the RUN sequence in POST. To do so, we develop a method that statistically quantifies the similarity between any desired "reference sequence" (here chosen to be the RUN sequence) and arbitrary temporal firing patterns. We find that the RUN sequence is repeatedly re-expressed during POST slow wave sleep in brief bursts involving four or more cells firing in order, but not so during PRE.
This provides direct neural evidence of the rapid learning of extended spatial sequences experienced in RUN. The results may shed light on the encoding of memories of events in time ("episodic memories") in humans. Furthermore, the multiple spike train analysis method developed here is general and could be applied to many other neural systems in many different experimental conditions.

13

李純琇. "Mining Image Classification Rules based on Decision Tree of Sequential Patterns." Thesis, 2003. http://ndltd.ncl.edu.tw/handle/95144286021237702677.

Abstract:

Master's
National Taiwan Normal University
Graduate Institute of Information and Computer Education
91
In this thesis, a method of image classification is proposed. The approach is based on constructing decision trees for sequential patterns. First, the color space of images is converted from RGB to HSI. After performing quantization on the color space, color blocks in an image are extracted and blocks with the same color are assigned the same feature term identifiers. The blocks are sorted by the y-positions of the color blocks to form a sequence of feature terms that represents the features of an image. Frequent sequential patterns, mined from the sequences of image feature terms extracted from training images, are used as the attributes for classification. Finally, according to the selected attributes, a decision tree is constructed with the C4.5 algorithm to find the classification rules. Moreover, in order to improve classification accuracy, new images that the system assigns to the wrong categories can be inserted into the training set to re-train the classification rules. To make re-training more efficient, the concept of incremental mining is applied in the system to preserve the information on frequent sequential patterns and negative borders from the previous training images, which avoids re-scanning the whole training data set to select the new classification attributes. The experimental results show that the proposed method achieves good accuracy for various kinds of images. Furthermore, compared with another related work, our method has better accuracy and requires fewer comparisons when searching classification rules.

14

Tiple, Pedro Santos. "Tool for discovering sequential patterns in financial markets." Master's thesis, 2014. http://hdl.handle.net/10362/14091.

Abstract:

The goal of this thesis is the study of a tool that can help analysts in finding sequential patterns. This tool will have a focus on financial markets. A study will be made on how new and relevant knowledge can be mined from real life information, potentially giving investors, market analysts, and economists new basis to make informed decisions. The Ramex Forum algorithm will be used as a basis for the tool, due to its ability to find sequential patterns in financial data. So that it further adapts to the needs of the thesis, a study of relevant improvements to the algorithm will be made. Another important aspect of this algorithm is the way that it displays the patterns found, even with good results it is difficult to find relevant patterns among all the studied samples without a proper result visualization component. As such, different combinations of parameterizations and ways to visualize data will be evaluated and their influence in the analysis of those patterns will be discussed. In order to properly evaluate the utility of this tool, case studies will be performed as a final test. Real information will be used to produce results and those will be evaluated in regards to their accuracy, interest, and relevance.

15

Lin, Kuei Ying, and 林桂英. "Mining Fuzzy Multiple-level Association Rules and Fuzzy Sequential Patterns from Quantitative Data." Thesis, 2001. http://ndltd.ncl.edu.tw/handle/83878981510147486751.

Abstract:

Master's degree
I-Shou University
Department of Information Engineering
89
Many researchers in database and machine learning fields are primarily interested in data mining because it offers opportunities to discover useful information and important relevant patterns in large databases. Most previous studies have shown how binary valued transaction data may be handled. Transaction data in real-world applications usually consist of quantitative values, so designing a sophisticated data-mining algorithm able to deal with various types of data presents a challenge to workers in this research field. This paper thus proposes two kinds of fuzzy mining algorithms, respectively for multiple-level association rules and sequential patterns, to extract knowledge implicit in transactions stored as quantitative values. The proposed fuzzy mining algorithms first transform quantitative values in transactions into linguistic terms, then filter them to find fuzzy association rules or sequential patterns by modifying the conventional mining algorithms. Each quantitative item uses only the linguistic term with the maximum cardinality or uses all possible linguistic terms in the mining processes. If only the linguistic terms with the maximum cardinalities are used, the number of fuzzy regions to be processed is the same as that of the original items. The algorithms therefore focus on the most important linguistic terms and reduce their time complexity. If all linguistic terms are used in the mining process, the derived set of rules or patterns is more complete, although computation is more complex. In addition, a web mining algorithm for fuzzy browsing patterns from the world wide web has also been proposed. The association rules and sequential patterns mined out thus exhibit important quantitative regularity in databases and can be used to provide some suggestions to appropriate supervisors.
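The first step described above, turning quantities into linguistic terms and keeping the term with the maximum cardinality for each item, can be sketched as follows. This is a minimal illustration rather than the thesis's algorithm: the three terms (Low, Middle, High), the membership breakpoints at 1, 6, and 11, and the names `fuzzify` and `max_cardinality_regions` are all assumptions made for the example.

```python
from collections import defaultdict


def fuzzify(quantity):
    """Map a quantity to membership degrees in three assumed linguistic terms."""
    # Assumed shapes: Low is 1 up to 1 and falls to 0 at 6; Middle peaks at 6;
    # High rises from 0 at 6 to 1 at 11 and stays at 1 beyond that.
    low = max(0.0, min(1.0, (6 - quantity) / 5))
    middle = max(0.0, 1 - abs(quantity - 6) / 5)
    high = max(0.0, min(1.0, (quantity - 6) / 5))
    return {"Low": low, "Middle": middle, "High": high}


def max_cardinality_regions(transactions):
    """For each item, sum the memberships over all transactions and keep the
    linguistic term with the largest total (its scalar cardinality)."""
    totals = defaultdict(lambda: defaultdict(float))
    for transaction in transactions:
        for item, quantity in transaction.items():
            for term, degree in fuzzify(quantity).items():
                totals[item][term] += degree
    return {item: max(terms.items(), key=lambda kv: kv[1])
            for item, terms in totals.items()}


# Hypothetical quantitative transactions: item -> purchased quantity.
transactions = [{"milk": 3, "bread": 8}, {"milk": 2, "bread": 10}, {"milk": 9}]
for item, (term, cardinality) in max_cardinality_regions(transactions).items():
    print(item, term, round(cardinality, 2))
```

With this data, milk is represented by its Low region and bread by its High region, so later mining steps handle one fuzzy region per original item. Keeping all terms instead would mean skipping the `max` step and carrying every (item, term) pair forward.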

16

SHAN, BO-YANG, and 單柏揚. "Using Sequential Pattern Mining to Enhance iMonsters Game Rules." Thesis, 2019. http://ndltd.ncl.edu.tw/handle/95bqtn.

Abstract:

Master's degree
Asia University
Department of M-Commerce and Multimedia Applications
107
In an era of rapid development in IoT, artificial intelligence, and big data, information security issues are on the rise and the importance of cyber security is gaining attention. Many studies that use game-based learning to teach cyber security concepts and literacy have achieved good results. For example, the "iMonsters" card game, developed by KDELab of Asia University, has been effective in teaching and is popular with players. According to player feedback, however, some problems remain. This study addresses these problems to achieve a better balance between gameplay, domain knowledge, and education. We first propose a game rule refining algorithm that uses sequential pattern mining to strengthen the iMonsters game rules. It revealed anomalies such as barriers in the game and differing interpretations of the game rules, so the algorithm can help us refine the rules. We then taught and tested the card game at the Asia University winter camp. Based on the results of pre-tests, post-tests, and questionnaires, we further modified the rules of the game and assessed the players' learning status. In addition, we applied the Internet security knowledge building algorithm proposed by KDELab to analyze collected real Internet security incidents and to modify the iMonsters card game rules when new incidents cannot be handled. Finally, a game rule evolution algorithm is proposed to modify the game rules to keep up with ever-changing cyber-attack techniques.

17

Lu, Jen-Chieh, and 呂仁傑. "Adapting Sequential-Pattern Mining to Discover Causal Rules on University Student Portfolios." Thesis, 2018. http://ndltd.ncl.edu.tw/handle/nsj9em.

Abstract:

Master's degree
National Central University
Department of Computer Science and Information Engineering
106
Institutional Research (IR) has been practiced around the world for many years. The main purpose of IR in a college or university is to inform campus decision-making and planning in areas such as admissions, curriculum assessment, enrollment management, clubs, and student life. Because of changes in educational institutions and in society, IR has come to cover many fields in recent years. This thesis addresses how to support assessment, strategy, and decision-making effectively. IR covers a very wide range of research topics; the thesis focuses on mining causal rules about students from factors including city, high school type, admissions, clubs, permits, and the graduation questionnaire survey. A suffix tree is an appropriate structure for finding rules from these elements. The main method for discovering the rules is to build a suffix tree over the factors arranged in chronological order. The main goal of the thesis is to mine the causal rules and provide a query interface that helps IR researchers make decisions accurately.

18

Wang, Linyan. "AV space for efficiently learning classification rules from large datasets /." 2006. http://gateway.proquest.com/openurl?url_ver=Z39.88-2004&res_dat=xri:pqdiss&rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation&rft_dat=xri:pqdiss:MR19748.

Abstract:

Thesis (M.Sc.)--York University, 2006. Graduate Programme in Computer Science.
Typescript. Includes bibliographical references (leaves 130-134). Also available on the Internet. MODE OF ACCESS via web browser by entering the following URL: http://gateway.proquest.com/openurl?url_ver=Z39.88-2004&res_dat=xri:pqdiss&rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation&rft_dat=xri:pqdiss:MR19748

19

RAILEAN, Ion. "Algorithmes et mesures dans l'exploration de données séquentielles." Phd thesis, 2012. http://tel.archives-ouvertes.fr/tel-00808168.

Abstract:

The increasing amount of information makes sequential data mining an important domain of research. A vast number of data mining models and approaches have been developed to extract interesting and useful patterns from data. Most models are used for strategic purposes, which leads to the use of the time parameter. However, the extensive field of data mining applications requires new models to be introduced. This thesis proposes models for temporal sequential data mining whose goal is forecasting. We focus our study on sequential temporal database analysis and on time-series data. In sequential database analysis, we propose several interestingness measures for rule selection and pattern extraction. Their goal is to favor rules and patterns whose time distance between itemsets is small. The extracted information is used to predict the user's future requests in a web log database, and it achieves higher performance than the other models compared. In time-series analysis, we propose a forecasting model based on neural networks, genetic algorithms, and the wavelet transform. We apply it to WiMAX network traffic and EUR/USD currency exchange data to compare its prediction performance with that of other existing models. Different ways of changing parameters adapted to a given situation, and the corresponding simulations, are presented. The proposed model is shown to outperform the existing ones in prediction on the time series used. As a whole, this thesis proposes forecasting models for different types of temporal sequential data with different characteristics and behaviour.

20

Lin, Ming-Yen, and 林明言. "Efficient Algorithms for Association Rule Mining and Sequential Pattern Mining." Thesis, 2003. http://ndltd.ncl.edu.tw/handle/m8z62p.

Abstract:

Doctoral degree
National Chiao Tung University
Department and Institute of Computer Science and Information Engineering
92
The amount of data collected by computerized applications around the world is increasing rapidly. The valuable information hidden in this vast data is attracting researchers from multiple disciplines to study effective approaches for deriving useful knowledge from it. Among the various data mining objectives, the mining of frequent patterns has been the focus of knowledge discovery in databases. This thesis investigates efficient algorithms for mining frequent patterns, including association rules and sequential patterns. We propose the LexMiner algorithm for frequent itemset discovery for association rules. To alleviate the drawbacks of hash-tree placement of candidates, some algorithms store candidate patterns according to the prefix order of itemsets. LexMiner uses lexicographic features and lexicographic comparisons to further speed up the kernel operation of mining algorithms. A memory indexing approach called MEMISP is proposed for fast sequential pattern mining using a find-then-index technique. MEMISP mines databases of any size, with respect to any support threshold, in just two passes of database scanning. MEMISP outperforms other algorithms in that neither candidate patterns nor intermediate databases are generated. Mining sequential patterns with time constraints, such as time gaps and a sliding time window, may improve the accuracy of mining results. However, the capability to mine time-constrained patterns was previously available only within the Apriori framework. Recent studies indicate that the pattern-growth methodology can speed up sequence mining. We integrate the constraints into a divide-and-conquer strategy of sub-database projection and propose the pattern-growth based DELISP algorithm, which outperforms other algorithms in mining time-constrained sequential patterns. In practice, knowledge discovery is an iterative process, so reducing the response time during user interactions for the desired outcome is crucial. The proposed KISP algorithm uses the knowledge acquired from each individual mining process, accumulates the counting information to facilitate efficient counting of patterns, and accelerates the whole interactive sequence mining process. Current approaches to sequential pattern mining usually assume that mining is performed on a static sequence database. However, databases change through updates, so discovered patterns might become invalid and new patterns could be created. Instead of re-mining from scratch, the proposed IncSP algorithm solves the incremental update problem through effective implicit merging and efficient separate counting over appended sequences. Patterns found in prior stages are incrementally updated rather than re-mined. Comprehensive experiments have been conducted to assess the performance of the proposed algorithms. The empirical results show that these algorithms outperform state-of-the-art algorithms with respect to various mining parameters and datasets of different characteristics. The scale-up experiments also verify that our algorithms successfully mine frequent patterns with good linear scalability.

21

Chen, Hung-Jen, and 陳宏任. "Algorithms for Negative Sequential Pattern Mining and Fuzzy Correlation Rule Mining." Thesis, 2008. http://ndltd.ncl.edu.tw/handle/35030898529324940135.

Abstract:

Doctoral degree
Tamkang University
Department of Computer Science and Information Engineering, Doctoral Program
96
Due to rapid developments in information technology and automatic data collection tools, a large amount of data has been collected and stored in various data repositories. To extract valuable information from these data is the key to improve business competition. Data mining offers ways to automatically find nontrivial, previously unknown, and potentially useful knowledge from large databases. Mining of frequent patterns plays an essential role in data mining. Many methods have been proposed for discovering various types of frequent patterns such as frequent itemsets, association rules, correlation rules, and sequential patterns. In this dissertation, three types of frequent patterns, namely, negative sequential patterns, negative fuzzy sequential patterns, and fuzzy correlation rules, have been introduced. We propose an algorithm for mining negative sequential patterns, which consider not only the occurrence of itemsets in transactions in databases but also their absence. In this algorithm, we have designed a candidate generation procedure employing the apriori principle to eliminate many redundant candidates during the mining task. Moreover, in this method, we also define a function based on the conditional probability theory to measure the interestingness of sequences in order to find more interesting negative sequential patterns. Additionally, most transaction data in real-world applications usually consist of quantitative values. In order to investigate various types of data in quantitative databases and then discover negative sequential patterns from such databases, we propose an algorithm, which combines fuzzy-set theory and negative sequential pattern concept, for mining negative fuzzy sequential patterns from quantitative databases. 
Furthermore, we propose a method for mining fuzzy correlation rules, which applies fuzzy correlation analysis to determine whether two sub-fuzzy itemsets in a fuzzy itemset are dependent, and then extract more interesting fuzzy correlation rules from quantitative databases. Experiments in the three proposed algorithms show that our algorithms can prune a lot of redundant candidates during the process of mining tasks and can effectively extract frequent patterns that are actually interesting.

22

Chiu, Chia-Wei, and 邱佳偉. "Mining Web Browsing Behavior by the Association Rule and the Sequential Pattern." Thesis, 2008. http://ndltd.ncl.edu.tw/handle/4yw577.

Abstract:

Master's degree
National Taipei University of Technology
Graduate Institute of Commerce Automation and Management
96
Rich connected links and information on the internet have changed enterprises’ ways to do business. Internet also enlarges the field of marketing and forms many creative business models, such as B2B, B2C, C2B, and C2C. Nowadays, enterprises can access more potential customers through broadcasting product and brand information by internet. As a result, how to build an automatic system to help enterprises find out valuable customers and understand customer’s online behaviors is an extremely important issue now. This research is to build a systematic mechanism to find out valuable customers and understand their online browsing and shopping behaviors. RFM model, the association rule and the sequential pattern method are adopted to mine the internet clickstream data of a B2C e-commerce website. The empirical results show our mechanism can well achieve our research purpose.

23

Tsai, Han-cheng, and 蔡涵丞. "A Frequent-Pattern tree Algorithm combine with Sequential Pattern for Discovering the Multi-level Characteristic Rule." Thesis, 2010. http://ndltd.ncl.edu.tw/handle/81469548286833468544.

Abstract:

Master's degree
Leader University
Graduate Institute of Digital Applications
98
Most researchers investigate effective methods of mining association rules in order to find the association rules among items in databases for the benefit of users. However, these studies pay attention only to the association rules of items. As the variety of items grows and technology develops quickly, the associations among items become weaker. Much information cannot easily be found from the items alone, even though traditional data mining methods are widely used. Sales usually focus on high-margin products rather than on what customers need. If decision makers want to understand the different characteristics of customers, they should know the dates and frequency of the customers' purchases. Therefore, decision makers must identify the association rules among customer characteristics, purchase dates, and items. This study integrates sequential patterns with the FPML algorithm into a method called FSP (Frequent Sequential Pattern). FSP is used to explore the associations among multi-level rules, purchase dates, and items. The data mining results show when customers are likely to buy the items, which can greatly enhance sales performance when used in marketing strategies.

24

An, Pao-Ying, and 安寶楹. "Mining Associative Sequential Rules for Image Classification." Thesis, 2002. http://ndltd.ncl.edu.tw/handle/96293014152058508425.

Abstract:

Master's degree
National Taiwan Normal University
Graduate Institute of Information and Computer Education
90
This thesis proposes a new image classification method based on mining associative sequential rules. First, the color blocks in an image are extracted, and their attribute values are recorded, including the area, x-position, and y-position of each color block. A color block with a specific color is defined as an image feature term. The extracted color blocks are sorted according to a color attribute to form a sequence of image feature terms, which is the data used to represent the characteristics of an image. An efficient sequential pattern mining algorithm is also provided. Frequent sequential patterns are mined from the sequences of image feature terms extracted from the training images to derive associative classification rules. Two data structures, the “bits index table” and the “appearing index table”, are designed to help mine frequent sequential patterns and classification rules quickly. Finally, the classification decision is based on multiple classification rules instead of a single rule. The experiments are performed on natural images and animal images obtained from the Corel Gallery CD. The results show that the average classification accuracy achieved by the proposed method is above 92%. In addition, the accuracy of our method is better than that of the related works.

25

Mooney, Carl Howard. "The discovery of interacting episodes and temporal rule determination in sequential pattern mining." 2006. http://catalogue.flinders.edu.au/local/adt/public/adt-SFU20070702.120306/index.html.

26

方乃騏. "Generalizing Sequential Pattern Discovery: Signed and Quantitative Sequential Patterns." Thesis, 1999. http://ndltd.ncl.edu.tw/handle/99840209472614470085.

Abstract:

Master's degree
National Sun Yat-sen University
Department of Information Management
87
Sequential pattern analysis uses customer identification information to provide a new type of knowledge (i.e., sequential patterns) that cannot be discovered by association rule analysis. However, sequential pattern analysis has some limitations. First, sequential patterns are anonymous rather than signed; that is, customer profiles are not incorporated in the analysis. Sequential pattern analysis may find a pattern such as purchasing a washer and then a dryer, but the characteristics of the customers who exhibit this particular purchase sequence remain unknown. Second, the quantities of items in a set of ordered transactions are not taken into account. Thus, knowledge such as “after purchasing several dehumidification cans (in the same or different transactions), a customer often purchases a dehumidifier” cannot be discovered by the sequential pattern analysis technique. The purpose of this thesis is to generalize sequential pattern analysis by incorporating customer profiles to mine signed sequential patterns and by incorporating the quantities of items purchased to mine quantitative sequential patterns. The research involves developing mining techniques for signed and quantitative sequential patterns and evaluating the performance of the proposed techniques by varying the number of customers, the number of items, the number of customer attributes, and the percentage of missing values.

27

Liao, Pei-Yu, and 廖珮妤. "Mining Fuzzy Time Sequential Patterns." Thesis, 2007. http://ndltd.ncl.edu.tw/handle/09899033589499186338.

Abstract:

Master's degree
Tamkang University
Department of Computer Science and Information Engineering, Master's Program
95
An important task in sequential pattern mining is to discover frequent sequential patterns in a sequence database. Conventional sequential patterns reveal only the order of items; they carry no information about the time intervals between successive items. In this paper, we propose an algorithm called fuzzy time sequential pattern mining (FTSP). We use hierarchical clustering to cluster the time intervals between successive itemsets, define a fuzzy number for each time cluster to compute the fuzzy support, and then mine the frequent fuzzy time sequential patterns. Fuzzy time sequential pattern mining reveals not only the order of items but also the time intervals between successive items.

28

Wu, Ching-Po, and 吳青坡. "Mining Constrained Temporal Sequential Patterns." Thesis, 2002. http://ndltd.ncl.edu.tw/handle/10783496287402210264.

Abstract:

Master's degree
National Taiwan University of Science and Technology
Department of Information Management
90
Sequential patterns are very useful for marketing, and many researchers have recently been working on mining them. However, three problems remain to be resolved. First, only the order of events is considered; the time interval between the occurrences of two events is not. Second, too many rules are generated, and most of them are not useful. Third, existing mining algorithms are inefficient because of the enormous search space of the mining problem. In this thesis, we propose solutions to these problems. First, we introduce the concept of time intervals into sequential patterns. For example, we may find a rule stating that if a user buys item A, then within 3 to 5 days he will buy item B. Second, we allow the user to specify constraints on sequential patterns, so that only rules that are interesting to the user are mined. Furthermore, we take advantage of the constraints and develop two efficient mining algorithms. The GAMTI algorithm is a genetic algorithm that finds the optimal time interval between any two related events, while the EVC algorithm applies the statistical concept of the coefficient of variation to find the optimal time interval. Both GAMTI and EVC use the constraints to efficiently find the rules that are interesting to users.
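The idea of attaching a time interval to a rule such as "A, then B within 3 to 5 days" can be illustrated with a small sketch. It is not the GAMTI or EVC algorithm, whose details the abstract does not give; it only collects the delays between two events and computes their coefficient of variation (standard deviation divided by mean). The data layout, a list of `(day, item)` pairs per customer sorted by day, and the function names are assumptions made for the example.

```python
from statistics import mean, pstdev


def gaps_between(sequences, first, second):
    """For each customer sequence of (day, item) pairs, record the delay from
    the earliest `first` to the earliest later `second`, if both occur."""
    gaps = []
    for sequence in sequences:
        start = next((day for day, item in sequence if item == first), None)
        if start is None:
            continue
        follow = next(
            (day for day, item in sequence if item == second and day > start),
            None,
        )
        if follow is not None:
            gaps.append(follow - start)
    return gaps


def coefficient_of_variation(values):
    """Population standard deviation relative to the mean."""
    return pstdev(values) / mean(values)


# Hypothetical purchase histories, one list per customer.
sequences = [
    [(1, "A"), (4, "B")],
    [(2, "A"), (7, "B")],
    [(0, "A"), (3, "C"), (4, "B")],
    [(5, "B"), (6, "A")],  # B precedes A here, so no gap is recorded
]
gaps = gaps_between(sequences, "A", "B")
print(min(gaps), max(gaps), round(coefficient_of_variation(gaps), 3))
```

A low coefficient of variation indicates that the delays cluster tightly around their mean, which is one plausible way to judge whether a candidate interval such as 3 to 5 days is informative.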

29

Yang, Shun-An, and 楊順安. "Mining Closed Multidimensional Sequential Patterns." Thesis, 2010. http://ndltd.ncl.edu.tw/handle/78213096241759731497.

Abstract:

Master's degree
Ming Chuan University
Department of Computer Science and Information Engineering, Master's Program
98
Sequential pattern mining can analyze the order in which most customers buy products. However, customers with different characteristics may have different buying behaviors, so we must consider which customer characteristics are associated with which purchase sequences. This information is called multidimensional sequential patterns. The mining process discovers a large number of patterns from a multidimensional sequence database, and decision makers cannot easily work out how to use so many patterns. Closed multidimensional sequential patterns are non-redundant multidimensional sequential patterns from which all multidimensional sequential patterns can be inferred. In this paper, we provide an efficient algorithm, CMSP, for mining closed multidimensional sequential patterns. During the mining process, three methods quickly determine whether a pattern is a closed multidimensional sequential pattern, so candidate closed multidimensional sequential patterns do not have to be deleted afterwards. We compare CMSP with CCMD and CIScombine in execution time. The experimental results show that our approach is very efficient.

30

Lan, Kuo-Lung, and 藍國隆. "Efficient Clustering Algorithms for Sequential Patterns." Thesis, 2005. http://ndltd.ncl.edu.tw/handle/97123256740724450082.

Abstract:

Master's degree
Yuan Ze University
Graduate Institute of Information Management
93
Finding maximal complete subgraphs (MCS), an NP-hard problem, can be used as a clustering algorithm for sequential patterns. Genetic Algorithms (GA) and Genetic Programming (GP) have been applied to the clustering problem to obtain approximately optimal solutions within a limited time. CLIQUES is the fastest algorithm for finding all MCSs in the current literature. Its time complexity is linear in the product of the number of nodes and the number of MCSs. However, the CLIQUES approach cannot provide solutions that consider the order of items, nor does it support efficient incremental updates. This article proposes approaches that satisfy these requirements and provides a method to improve the performance of the CLIQUES algorithm. The experimental results demonstrate the applicability of our approach to a variety of graphs and show that our method is faster than CLIQUES for most graphs.

31

CHENG, LI-WEI, and 鄭力瑋. "Mining Sequential Patterns in Data Stream." Thesis, 2017. http://ndltd.ncl.edu.tw/handle/56rjbu.

Abstract:

Master's degree
Ming Chuan University
Department of Computer Science and Information Engineering, Master's Program
105
Sequential pattern mining mainly aims to find the sequential purchasing behaviors of most customers in a database. For example, most customers buy product A first and then buy product B or product C. The purchasing behaviors of most customers can then be used to make decisions that raise profit. Sequential pattern mining is divided into static and dynamic mining, and dynamic sequential mining is now often used. Data arriving one by one in a data stream must be incorporated as updates. Because a dynamic data stream is updated immediately, many transactions keep joining the database. Before this work, mining sequential patterns in data streams was not efficient and used too much memory. We therefore aim to improve the method so that data stream updates are handled efficiently.

32

Tu, Kang, and 涂剛. "Efficient Algorithms of Mining Sequential Patterns." Thesis, 2010. http://ndltd.ncl.edu.tw/handle/87924260270503592545.

Abstract:

Master's degree
Southern Taiwan University of Science and Technology
Department of Information Management
98
With the growth of information technology and the Internet, the amount of data that an enterprise accumulates keeps getting bigger, and the data goes from being visible to being effectively invisible. Traditional search methods cannot extract useful information quickly and efficiently from such a large amount of data, so data mining is used to find the potentially useful information and knowledge. Association rules, clustering, classification, and sequential patterns are all popular data mining methods. This research is based on the Apriori algorithm: it modifies Apriori into a method for judging candidate itemsets and adds the concept of sequences. The method can discover the associations between the traveling and purchasing behaviors of customers and overcome the disadvantages of traditional methods. The algorithm is then redesigned so that it can generate all the frequent itemsets after the database is updated without scanning the original database again. The results provide very useful information when deciding how much of an item can be sold.

33

Shie, Bai-En, and 謝百恩. "Mining Sequential Patterns with Pattern Constraints." Thesis, 2006. http://ndltd.ncl.edu.tw/handle/4yw437.

Abstract:

Master's degree
Ming Chuan University
Department of Computer Science and Information Engineering, Master's Program
94
Sequential pattern mining finds the sequential behaviors that most customers frequently perform in a transaction database. These behaviors are called sequential patterns. Many papers have proposed algorithms for finding all sequential patterns. However, a new problem arises: users may need only certain sequential patterns, for example those that include certain items or behaviors. If users can specify the items or patterns they are interested in before the mining process, much execution time is saved and the sequential patterns found fit the users' needs. The items or patterns preset by users are called "pattern constraints." We propose an effective algorithm that finds all sequential patterns satisfying the constraints in a transaction database. In the experimental results chapter, we use a real dataset and a synthetic dataset to compare our algorithm with the SPIRIT(R) and Bit-String algorithms. The results show that although our algorithm used more memory than SPIRIT(R) during the mining process, it was faster. The results also show that our method both used less memory and ran faster than the Bit-String algorithm.

34

Yu, Shiau-Ping, and 余曉萍. "Development of Optimal Stopping Rules for Sequential Sampling Plan." Thesis, 2007. http://ndltd.ncl.edu.tw/handle/74935312944624766199.

Abstract:

Master's degree
National Cheng Kung University
Department of Statistics, Master's and Doctoral Program
95
During incoming and/or outgoing inspection of industrial products, the decision to accept or reject a lot is made according to the inspection or testing results for the key characteristics of the sample units. Additional costs, including labor and material costs as well as the loss from misjudgement, usually occur when Wald's sequential sampling plan is applied to destructive testing. Previous stopping rules for Wald's sequential sampling plan are normally determined empirically from rules of thumb, so practitioners cannot decide whether the sample number at which inspection or testing is terminated is economical. To effectively reduce the average sample number of the sequential sampling plan, the upper limit of the sample number is specified first, and the optimal stopping rule is then determined based on this specified sample number. Finally, a total cost function is established to assess the total loss of the proposed sequential sampling plan. The results show that the optimal stopping rule for the proposed sequential sampling plan can effectively reduce the average sample number and thus achieve a minimum total loss under reasonable type I and type II errors.
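For readers unfamiliar with Wald's sequential sampling plan, the sketch below shows the standard item-by-item acceptance and rejection lines for attribute inspection, together with an upper limit on the sample number. The line formulas are the textbook ones; the truncation rule applied at `n_max` (accept if the defect count is at or below the midline `s * n`) is only an illustrative assumption and is not the optimal stopping rule developed in the thesis. The parameter values and function names are likewise hypothetical.

```python
import math


def sprt_lines(p0, p1, alpha, beta):
    """Return (h1, h2, s) for Wald's plan: accept when defects <= -h1 + s*n,
    reject when defects >= h2 + s*n."""
    k = math.log(p1 * (1 - p0) / (p0 * (1 - p1)))
    h1 = math.log((1 - alpha) / beta) / k
    h2 = math.log((1 - beta) / alpha) / k
    s = math.log((1 - p0) / (1 - p1)) / k
    return h1, h2, s


def inspect(results, p0, p1, alpha, beta, n_max):
    """Inspect items one at a time (True means defective) and return a
    (decision, items_inspected) pair."""
    h1, h2, s = sprt_lines(p0, p1, alpha, beta)
    defects = 0
    n = 0
    for n, defective in enumerate(results[:n_max], start=1):
        defects += int(defective)
        if defects >= h2 + s * n:
            return "reject", n
        if defects <= -h1 + s * n:
            return "accept", n
    if n < n_max:
        return "continue", n  # no decision yet and the limit is not reached
    # Assumed truncation rule: compare the defect count with the midline.
    return ("accept" if defects <= s * n else "reject"), n


# Hypothetical plan: acceptable quality 1%, rejectable quality 6%,
# producer's risk 5%, consumer's risk 10%.
print(inspect([False] * 100, 0.01, 0.06, 0.05, 0.10, n_max=100))
print(inspect([False] * 30, 0.01, 0.06, 0.05, 0.10, n_max=30))
```

With these parameters a defect-free lot cannot be accepted before the 44th item, so lowering `n_max` to 30 forces an earlier decision through the truncation rule. That trade-off between sample number and decision risk is what a cost-based stopping rule has to balance.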

35

Yu, Ling. "Mining Maximal Sequential Patterns in Protein Databases." 2005. http://www.cetd.com.tw/ec/thesisdetail.aspx?etdun=U0001-2607200510480900.

36

Lin, Shi-Cheng, and 林師晟. "Maximal Sequential Patterns Mining with Timing Constraints." Thesis, 2010. http://ndltd.ncl.edu.tw/handle/77209305556748893287.

Abstract:

Master's degree
Tamkang University
Department of Information Management, Master's Program
98
The purpose of frequent sequential pattern mining is to find sequential patterns which occur more frequently than a given threshold. Normally these patterns are then transformed into previously-unknown useful and valuable information. Because of accumulated huge number of records in the database, frequent sequential pattern mining often takes a lot of time. Since most frequent sequential pattern mining algorithms do not have timing constraints, lots of frequent sequential patterns are found. It is difficult to decide which patterns among them are useful. Maximal frequent sequential pattern mining could obtain more compact patterns without losing any results obtained in frequent sequential pattern mining. However, most of these algorithms must complete k rounds to obtain maximal frequent sequential patterns with length k. The longer the maximal frequent sequential patterns, the more rounds the mining requires. The required mining time would be longer accordingly. We propose an algorithm which obtains maximal frequent sequential patterns with timing constraints. This algorithm can restrict the occurring time-interval of the obtained maximal sequential patterns. It could obtain maximal frequent sequential patterns with length k in less than k rounds. We demonstrate that the timing constraints could speed up the mining process. Finally, we apply our algorithm to a database of traffic flow records, and illustrate how to obtain maximal frequent sequential patterns with different timing meaning according to selected timing constraints.

37

Kao, Cheng-Li, and 高誠勵. "Mining Sequential Interaction Patterns in Social Networks." Thesis, 2011. http://ndltd.ncl.edu.tw/handle/59519368642549973067.

Abstract:

Master's
National Taiwan University
Graduate Institute of Information Management
99
With the advance of web technology, many social networks have developed rapidly in recent years, and a large number of interactions between users have been collected into databases. Mining interaction patterns in social networks can help us analyze users' interactions and behavior, improve the technologies for running social networks, and formulate marketing and advertising strategies. Therefore, in this thesis, we propose an efficient method, called MSIP (Maximal Sequential Interaction Patterns), to mine maximal interaction patterns in social network databases. The proposed algorithm consists of two phases. First, we scan the database to find all frequent patterns of length one (1-patterns) and generate the projected database for each frequent 1-pattern. Next, we recursively mine all frequent patterns in a depth-first search (DFS) manner until no more frequent patterns can be found. During the mining process, we employ three effective pruning strategies to prune unnecessary candidates and a closure checking scheme to remove non-maximal frequent patterns. Therefore, the proposed method can efficiently mine interaction patterns in social networks. The experimental results show that the MSIP algorithm outperforms the modified MSPX algorithm.
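
The two phases described above (find frequent 1-patterns and their projected databases, then grow patterns depth-first) follow the general prefix-projection scheme. The sketch below shows that generic scheme in Python for sequences of single items. It is not MSIP itself: it omits the pruning strategies and the closure check, and the function name `prefix_projection_mine` and the input format are assumptions for illustration.

```python
from collections import Counter


def prefix_projection_mine(sequences, min_support):
    """Return {pattern: support} for all frequent sequential patterns.

    `sequences` is an iterable of item sequences; support is the number of
    sequences that contain the pattern as a subsequence.
    """
    results = {}

    def mine(prefix, projected):
        # Count each item once per projected sequence.
        counts = Counter()
        for suffix in projected:
            counts.update(set(suffix))
        for item, support in counts.items():
            if support < min_support:
                continue
            new_prefix = prefix + (item,)
            results[new_prefix] = support
            # Project: keep what follows the first occurrence of `item`.
            new_projected = [s[s.index(item) + 1:] for s in projected if item in s]
            mine(new_prefix, new_projected)  # depth-first growth

    mine((), [tuple(s) for s in sequences])
    return results
```

Each recursive call works only on the suffixes that follow the current prefix, so the database is never rescanned in full once the first projection has been built.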

38

Lee, Yi-Tian, and 李宜靝. "The Periodical Intervals Analysis on Sequential Patterns." Thesis, 2005. http://ndltd.ncl.edu.tw/handle/21969389403664746841.

Abstract:

Master's
Tamkang University
Master's Program, Department of Computer Science and Information Engineering
93
When analyzing huge volumes of transaction data, we often use association rule mining and sequential pattern mining to discover customers' buying behaviors. However, sequential patterns alone make it hard to determine the time intervals between purchases of related items. In this paper, we develop a set of algorithms to analyze the periodic properties of time intervals over sequential patterns. First, we introduce the PDT/PDM algorithms to discover periodic distributions in common cases. Then, we extend them into the LPDT/LPDM algorithms to handle linear trend components of the curves. Finally, we combine these algorithms with the distribution properties of sequential patterns to form the PIM (Periodical Intervals Mining) algorithm. In our experiments, we use the PIM algorithm to analyze periodic distributions and, by comparing the periodic intervals, use them to identify the best choice of products from the sequential patterns.

39

Chen, Wei-ting, and 陳威廷. "Sequential Patterns Mining in Multiple Data Streams." Thesis, 2009. http://ndltd.ncl.edu.tw/handle/35162014108124967289.

Abstract:

Master's
Soochow University
Department of Information Management
97
Sequential pattern mining searches for the relative order of events, allowing users to make predictions from the discovered sequential patterns. The technique is widely applied in commercial transactions, meteorology, health care, and other fields. With the progress of IT in recent years, data changes rapidly, data volume grows explosively, and demand for real-time results increases, leading to the so-called data stream environment. In this environment, data cannot be fully stored, and the inadequacy of traditional mining techniques has led to the emergence of data stream mining. With this technology, even a system that cannot store the massive amount of data can provide users with real-time mining results. Multiple data streams are a branch of the data stream environment, and sequential pattern mining remains one of the important issues in their study. However, the previously proposed MILE algorithm mines in a one-time fashion and therefore cannot preserve previously mined sequential patterns when new data arrives. To address this problem, we propose the ICspan algorithm, which continues mining sequential patterns incrementally and obtains more accurate mining results. In addition, because the algorithm is restricted to mining closed sequential patterns, fewer sequential patterns are generated and recorded, which reduces memory usage and improves execution efficiency.

40

Ling, Yu, and 凌宇. "Mining Maximal Sequential Patterns in Protein Databases." Thesis, 2005. http://ndltd.ncl.edu.tw/handle/28917977808357764502.

Abstract:

Master's
National Taiwan University
Graduate Institute of Information Management
93
Because of the close relationship between sequential patterns and protein function, systematically mining significant sequential patterns in protein databases has become an important research topic. In this thesis, we propose a suffix-tree-based algorithm to discover patterns in protein databases. We use the occurrence information maintained in the suffix tree to mine closed frequent substrings, generate maximal frequent sequential patterns, and adjust the gaps within the patterns. To keep the generated patterns compact, we generate only maximal patterns rather than all patterns. The experimental results show that the proposed algorithm finds not only the patterns recorded in the PROSITE database but also other patterns worth further biological study, such as longer patterns and the classifier pattern set. In addition, the proposed algorithm produces better results in the experiment than Chang and Halgamuge's method.

41

Chang, Ho-Yi, and 張和逸. "Efficient Approaches for Mining Consecutive Sequential Patterns." Thesis, 2008. http://ndltd.ncl.edu.tw/handle/78324196880827314175.

Abstract:

Master's
Southern Taiwan University
Department of Information Management
96
Consecutive sequential patterns arise in many applications: air routes, shipping routes, and website browsing paths are all examples. However, few studies have addressed them. The TFA algorithm is one of the few algorithms that attempt to mine consecutive sequential patterns, and the purpose of this thesis is to remedy its defects. Although TFA is one of the most efficient algorithms, it has two problems. First, its performance degrades as transaction length grows. Second, it generates consecutive sub-sequences by decomposing duplicate transactions. We therefore first propose a new algorithm, CFP (Mining Consecutive Sequence Using Filtered and Pruning the mechanism), which improves on TFA. By applying a filtering and pruning mechanism at an early stage, CFP accumulates identical shortened transactions in the database. CFP is well suited to mining databases whose records are very long, and it does not generate consecutive sub-sequences by decomposing duplicate transactions. It scans the database only four times, regardless of the length of the frequent sub-sequences, and avoids generating unnecessary candidate consecutive sub-sequences during mining. Real-world databases are usually larger than main memory; to handle this, CFP divides a large database into many sub-databases and mines rules from them. CFP thus avoids wasting I/O time and is more efficient and practical in applications.
Although CFP is efficient, it still has one problem: it must inspect the same sub-sequence repeatedly in order to eliminate infrequent consecutive sub-sequences. We therefore propose another algorithm, CNP (Mining Consecutive Sequence Using New Filtered and Produce the mechanism), which uses a new filtering and producing mechanism to reduce the number of inspections needed when eliminating infrequent consecutive sub-sequences. Comprehensive experiments were conducted to assess the performance of the proposed algorithms. The results show that the proposed algorithms outperform previous algorithms.
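
To make the notion of a consecutive sequential pattern concrete, the following Python sketch counts contiguous sub-sequences by brute force and keeps those that meet a minimum support. It is a baseline for illustration only, not TFA, CFP, or CNP; the function name `frequent_consecutive` and the input format are assumptions.

```python
from collections import Counter


def frequent_consecutive(sequences, min_support):
    """Return {pattern: support} for contiguous sub-sequences that appear
    in at least `min_support` of the input sequences."""
    counts = Counter()
    for seq in sequences:
        seq = tuple(seq)
        # Every contiguous slice, counted once per sequence.
        seen = {seq[i:j]
                for i in range(len(seq))
                for j in range(i + 1, len(seq) + 1)}
        counts.update(seen)
    return {p: c for p, c in counts.items() if c >= min_support}
```

With routes such as `"ABCD"`, `"BCD"`, and `"ABD"`, the pair `("B", "D")` is not counted for the first two routes because its items are not adjacent there, which is the difference from ordinary (gapped) sequential patterns. The brute-force enumeration is quadratic in the length of each sequence, which is why long transactions are costly.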

42

Shu, Yu-Tzu, and 舒毓箎. "Mining Useful Sequential patterns from Medical Data." Thesis, 2004. http://ndltd.ncl.edu.tw/handle/92200773024082078708.

Abstract:

Master's
National Dong Hwa University
Department of Computer Science and Information Engineering
92
The goal of data mining is to find useful information in huge amounts of data. In this paper, we try to find useful sequential patterns in the reports of laboratory tests and physical examinations of cirrhosis patients. Using an available algorithm, we mine many interesting sequential patterns. We hope the mining results can improve the quality of medical services and help achieve the goal of preventive medicine.

43

Chen, Shih-Sheng, and 陳仕昇. "The Research of Mining Frequent Sequential Patterns." Thesis, 2003. http://ndltd.ncl.edu.tw/handle/00571165628123445673.

Abstract:

Doctoral
National Central University
Graduate Institute of Information Management
91
Mining sequential patterns in databases is an important issue with many applications in commercial and scientific domains. For example, finding patterns in DNA sequences and analyzing users' web site browsing patterns can help to discover important knowledge about genetic evolution and consumer behavior, respectively. Existing studies on finding sequential patterns fall into two categories: continuous and discontinuous patterns. In the first category, patterns are composed of consecutive elements of sequences. In the second category, patterns can be composed of elements separated by wild cards, each of which can stand for zero or more elements. Although many studies have been published on finding either kind of pattern, none can find both, nor can they find discontinuous patterns formed of several continuous sub-patterns. The dissertation defines hybrid patterns as the combination of continuous and discontinuous patterns and proposes a novel algorithm to mine them. The algorithm is as fast as PrefixSpan for mining sequential patterns. Algorithms such as PrefixSpan require the data volume to be small enough to fit in main memory to run at full speed. In the dissertation, we also propose a sampling-based approach to find discontinuous and continuous patterns. This approach has three advantages. First, like Apriori-like algorithms, it can mine frequent patterns from huge data sets, but it does not need to scan the database many times. Second, it is as efficient as pattern-growth algorithms such as PrefixSpan and does not need to compress the database into memory. Third, it can work with any known algorithm for mining discontinuous or continuous patterns. The algorithms developed in the dissertation are important because they can be applied to mine knowledge from the sequential data that are generated frequently in daily life.

44

Lin, Chia-Sheng, and 林佳生. "Mining Sequential Patterns with Multiple Minimum Supports." Thesis, 2003. http://ndltd.ncl.edu.tw/handle/85600037678334560317.

Abstract:

Master's
National Central University
Graduate Institute of Information Management
91
Sequential pattern mining has become increasingly important. Traditional sequential pattern mining algorithms use the same model: finding all sequential patterns that satisfy a single user-specified minimum support. However, using a single minimum support implies that all items in the data are of the same nature and/or have similar frequencies in the database, which is often not the case in real-life applications. In this paper, we first extend the traditional single minimum support for all sequential patterns to multiple item supports. Second, we develop an effective algorithm called MS-PrefixSpan. Its general idea is to use a conditional minimum support as a threshold to qualify items in each projected database as candidate length-1 sequential patterns. For each projected database, the conditional minimum support is gradually adjusted to reflect the actual minimum support of each maximal sequential pattern. In addition, to establish that MS-PrefixSpan finds all and only the maximal sequential patterns satisfying their own MSSP, we provide a theorem proving its correctness. Third, our experimental results show that MS-PrefixSpan can substantially reduce the execution time and the number of sequential patterns produced.
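
The abstract does not spell out how a pattern's threshold is derived from the item supports. One common convention for multiple minimum supports, assumed here purely for illustration, is that a pattern must meet the lowest minimum item support (MIS) among its items. The Python sketch below checks a pattern under that assumption; it is not MS-PrefixSpan, and `mis`, `support`, and `is_frequent` are hypothetical names.

```python
def is_subsequence(pattern, sequence):
    """True if `pattern` occurs in `sequence` in order (gaps allowed)."""
    it = iter(sequence)
    return all(item in it for item in pattern)


def support(pattern, sequences):
    """Fraction of sequences that contain `pattern`."""
    return sum(is_subsequence(pattern, s) for s in sequences) / len(sequences)


def is_frequent(pattern, sequences, mis):
    """Check `pattern` against the lowest MIS of its items (assumed rule)."""
    threshold = min(mis[item] for item in pattern)
    return support(pattern, sequences) >= threshold
```

Under this rule a pattern containing a rare item such as `"x"` can qualify with a low support, while patterns made only of common items still have to meet the higher threshold.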

45

Lin, Shao-Yuan, and 林少源. "Mining Sequential Patterns Using Graph Search Techniques." Thesis, 2002. http://ndltd.ncl.edu.tw/handle/12539847525602202744.

Abstract:

Master's
National Yunlin University of Science and Technology
Master's Program, Graduate Institute of Electronic and Information Engineering
90
Sequential pattern discovery has emerged as an important problem in data mining. In this thesis, we propose an effective algorithm, GST, for mining sequential patterns in a large transaction database. Unlike Apriori-like algorithms, the GST algorithm can find large k-sequences (k >= 3) out of order; that is, it can find large k-sequences without going directly through large (k-1)-sequences. As a result, our algorithm performs much better than the Apriori-like algorithms. In addition, we propose a method for finding new sequential patterns by scanning only the transactions added since the database was last updated. In several comprehensive experiments, the GST algorithm achieves a significant performance improvement over the Apriori-like algorithms. We also found that, as long as the ratio of items purchased in new transactions is not close to 100%, scanning only the new transactions is always much better than scanning the entire database.

46

Chang, Heng-Ke, and 張衡閣. "A New Method of Multi-Dimensional Sequential Rules Mining from Databases." Thesis, 2002. http://ndltd.ncl.edu.tw/handle/74021370957057807972.

Abstract:

碩士
朝陽科技大學
資訊管理系碩士班
90
Data Mining has become one of the fast growing areas of research in recent years. Besides association rules mining, researchers endeavor to develop mining methods with time factor considered. Popular research topics include customers buying patterns analysis, Internet surfing sequence analysis, trend analysis, and so on. When probing the customers buying sequential patterns, most developed mining methods require repeated database scans to generate candidate patterns, which are then checked to find frequent sequential patterns. It therefore deteriorates the performances of these methods. This paper presents a Frequent Pattern Adjacent Matrix (FPAM) to record intermediate length-2 patterns. After finding the frequent patterns, it only needs one more round of database scan to find all the sequential patterns by taking advantages of FPAM. Without generating unnecessary patterns, the proposed method is an efficient method for mining frequent sequential patterns from databases. However, the existence of patterns is often related to the circumstances or conditions. A circumstance has to be considered in different views. For example, when a customer buys a product, not only the priority of purchasing, but variables such as region, time, climate and customer category should be also taken into account. A more applicable sequential pattern to the real situation can therefore be mined. In this paper, we embedded FPAM as the algorithm for sequential patterns mining. Furthermore, we applied Rough Set Theory for multi-dimensional analysis. After a construction of Rough Set index structure, sequential and multi-dimensional patterns are combined to obtain multi-dimensional sequential patterns. With this kind of approach, when given frequent patterns, we can enhance the efficiency of data mining by rescanning the database only once.
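
The abstract says only that FPAM records intermediate length-2 patterns; its exact layout is not given. As a rough illustration of the idea, the Python sketch below builds a sparse "matrix" keyed by ordered item pairs, where each entry counts the sequences in which the first item occurs before the second. The function name `pair_matrix` and this representation are assumptions, not the thesis's data structure.

```python
from collections import defaultdict


def pair_matrix(sequences):
    """Return {(a, b): n}, where n is the number of sequences in which
    item `a` occurs somewhere before item `b`."""
    matrix = defaultdict(int)
    for seq in sequences:
        pairs = set()
        for i, a in enumerate(seq):
            for b in seq[i + 1:]:
                pairs.add((a, b))
        for pair in pairs:  # count each pair once per sequence
            matrix[pair] += 1
    return dict(matrix)
```

Entries below the minimum support can be discarded immediately, since no longer pattern containing that ordered pair can be frequent.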

47

Lee, Wang-Jung, and 李宛蓉. "A Hybrid of Sequential Rules and Collaborative Filtering for Product Recommendation." Thesis, 2006. http://ndltd.ncl.edu.tw/handle/10683024951188371694.

Abstract:

碩士
國立交通大學
資訊管理研究所
94
Customers’ purchase behavior may vary over time. Traditional Collaborative Filtering (CF) methods use similar customers’ purchase behavior to provide recommendations to the target customer, without considering customers’ purchase behavior over time. The sequential rule-based recommendation method mainly analyzes customers’ purchase behavior over time to extract sequential rules with the form: purchase behavior over past periods => purchase behavior at current period. If a target customer’s purchase behavior over past periods is similar to the conditional part of the rule, then the purchase behavior of the customer at current period is predicted to be the consequent part of the rule. Although the sequential rule method considers customers’ purchase sequences over time, it does not make use of the target customer’s purchase data at current period. This work proposes a novel hybrid recommendation method that combines sequential rule and CF methods. The proposed method uses customers’ RFM (Recency, Frequency, and Monetary) values to cluster customers into groups with similar RFM values. For each group of customers, sequential rules are extracted from purchase sequences of that group to make recommendations. In addition, a KNN-based CF method is adopted to provide recommendations based on the target customer’s purchase data at current period. The results of the two methods are combined to make final recommendations. The experimental result shows that the hybrid method performs better than other methods.
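
The abstract does not say how the two result lists are combined. One simple possibility, assumed here only for illustration, is a weighted sum of per-item scores from the sequential-rule recommender and the CF recommender. In the Python sketch below, `combine_recommendations`, `weight`, and `top_n` are hypothetical names and parameters.

```python
def combine_recommendations(rule_scores, cf_scores, weight=0.5, top_n=3):
    """Blend two {item: score} dicts and return the top-N item names.

    `weight` is the share given to the sequential-rule scores; an item
    missing from one source scores 0.0 there (assumed convention).
    """
    items = set(rule_scores) | set(cf_scores)
    combined = {
        item: weight * rule_scores.get(item, 0.0)
              + (1 - weight) * cf_scores.get(item, 0.0)
        for item in items
    }
    # Sort by descending score, breaking ties by item name.
    return sorted(combined, key=lambda item: (-combined[item], item))[:top_n]
```

Setting `weight` to 1.0 or 0.0 reduces the hybrid to either method on its own, which makes it easy to compare the three variants on the same data.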

48

Wu, Chien-Hsin, and 吳建興. "Document retrieval based on mining keywords sequential patterns." Thesis, 2010. http://ndltd.ncl.edu.tw/handle/24612895699444367999.

Abstract:

碩士
淡江大學
資訊工程學系碩士班
98
In this age of information explosion, internet users through today''s information-search feature, most quickly retrieved a large number of relevant information, but in all probability due to the poor and various impacts retrieval system discrimination factors exist, making users usually retrieved too much information. This study to the problem, is to “Document retrieval based on mining keywords sequential patterns” to purpose and solve in the topic. This information is often related to a user''s expectations far cry, although to be able to put together all of a sudden to search for a great deal of information, but it still cannot really reaches its data to search for the necessary efficiency. First, the study use keywords which user’s key in Successively select that not match the web site and decrease result. And using sequential patterns mining to strengthen and guess user what they want documents, according to this result to calculate the rank, speeding up looking for documents by user.The experiment shows, and most research sites the comparison is retrieved as a result, this method can be retrieved more simple and correct information.

49

Hao, Wei-Hua, and 郝維華. "Efficient Mining and Maintaining Algorithms for Sequential Patterns." Thesis, 2008. http://ndltd.ncl.edu.tw/handle/51482633830794967649.

Abstract:

博士
淡江大學
資訊工程學系博士班
96
Since the invention of characters, all kinds of records with numbers and words, increased dramatically in many domains. The invention of computer had trigger rapid data accumulation in science, education, e-Learning, business and supermarket, exponentially. Automation of information system has urged this situation and result in huge amount of data stored in databases worldwide. These electronic data is considered to be the mirror of the real world, which we can make full use it, properly. We try to discover interesting information or knowledge that conceived in databases via various methods, such as statistic, query, graphics and data mining, to further understand the world we lived in. A modern day information system is accountable for this purpose. In these diverse approaches, data mining has caught the eyes of many domain experts, and gain achievements. In the last decade, mining sequential patterns became one promising topic and arouse our interest. The essence of data mining is to dealing with huge amount of data, many previous researches focus on propose efficient algorithm with less search time and run time. Hence, this dissertation has focused on develop algorithms that mining sequential pattern efficiently, and to be maintainable. In our point of view there are two criteria to evaluate mining algorithm: scan times of database, volume of working space and searching space. As we already known the speed of accessing data from hard drive is slower than from main memory, usually by factor of two. This implies that the less scan times the better. Working space is required to host data during mining process. Searching space is the space to store the result, frequent sequence set or data model. The less space required by algorithm the more chance to fit all processed data into main memory. Consequently, both the efficiency and performance will be improved. 
With these in mind, all three algorithms been proposed in this dissertation have these three characters: scan database once, mining without candidates and mining full set of frequent sequences. First algorithm, FAL, is designed to fully utilize both the downward closure property and upward closure property to construct a lattice data model with maximal sequence representation. Second algorithm is FMCSP that inherit the legacy of FAL, but applied closed sequence concept instead of maximal sequence. Note that, closed sequence is the longest sequence in its equivalent class, it can shrink the size of searching space, and furthermore, with adjustable ability for user to set, or tune, the threshold of minimum support after the mining of data model had been constructed. Finally, algorithm MMSP has inherited the legacy of previous algorithms to deal with incremental sequence database. MMSP is capable to handle incremental data added into data model one by one and batch data without rerun whole database from scratch.

50

Yen, Chih-Yu, and 顏志祐. "A Procedure to Discover More Meaningful Sequential Patterns." Thesis, 2008. http://ndltd.ncl.edu.tw/handle/81102701480125650843.

Abstract:

碩士
淡江大學
資訊管理學系碩士班
96
Sequential pattern mining technique is developed to determine time-related behavior in sequence databases. Most of the previous proposed methods discover frequent subsequences as patterns but do not consider the confidence issue. Besides, although the discovered sequential patterns can reveal the order of events, but the time between events is not well determined. This dissertation presents, E-PrefixSpan, a new method for mining frequent and more confident association rules from sequential databases. The method is based on the PrefixSpan[20] algorithm. To take the advantage of the pattern-growth[21] mining approach and discover the time related sequential patterns, E-PrefixSpan records the time-intervals between items and creates projected databases to reduce the times of database scanning. Sequential pattern mining often generates a huge number of rules. To reduce the number of the correlated pattern without information loss, E-PrefixSpan applys the confidence pattern mining technique . The proposed approach is compared to existing sequential pattern mining methods to show how they complement each other to discover association rules. Our performance study shows that E-PrefixSpan is a valuable approach to condense the correlated patterns and provide additional time-interval information for sequential pattern.
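
To show what confidence and time-interval information add to a plain sequential pattern, the Python sketch below computes both for a single rule "a is later followed by b". It is a simplified illustration, not E-PrefixSpan; the function name `rule_stats`, the `(time, item)` input format, and the use of the first occurrence of each item are assumptions.

```python
def rule_stats(sequences, a, b):
    """Return (confidence, mean_gap) for the rule `a` => later `b`.

    `sequences` is a list of lists of (time, item) pairs sorted by time.
    Confidence is the share of sequences containing `a` in which `b`
    follows the first `a`; mean_gap averages the time from that `a` to
    the first later `b`. Returns None if no sequence contains `a`.
    """
    with_a = 0
    gaps = []
    for seq in sequences:
        t_a = next((t for t, item in seq if item == a), None)
        if t_a is None:
            continue
        with_a += 1
        t_b = next((t for t, item in seq if item == b and t > t_a), None)
        if t_b is not None:
            gaps.append(t_b - t_a)
    if with_a == 0:
        return None
    confidence = len(gaps) / with_a
    mean_gap = sum(gaps) / len(gaps) if gaps else None
    return confidence, mean_gap
```

Two rules with the same support can differ sharply in confidence or in typical gap, which is the extra information a frequency-only pattern leaves out.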
