To see the other types of publications on this topic, follow the link: String matching algorithms.

Dissertations / Theses on the topic 'String matching algorithms'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles

Select a source type:

Consult the top 50 dissertations / theses for your research on the topic 'String matching algorithms.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Press on it, and we will generate automatically the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as pdf and read online its abstract whenever available in the metadata.

Browse dissertations / theses on a wide variety of disciplines and organise your bibliography correctly.

1

Cheng, Lok-lam, and 鄭樂霖. "Approximate string matching in DNA sequences." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2003. http://hub.hku.hk/bib/B29350591.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

黎少斌 and Shiao-bun Lai. "Trading off time for space for the string matching problem." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 1996. http://hub.hku.hk/bib/B31214216.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Lai, Shiao-bun. "Trading off time for space for the string matching problem /." Hong Kong : University of Hong Kong, 1996. http://sunzi.lib.hku.hk/hkuto/record.jsp?B18061795.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Pockrandt, Christopher Maximilian [Verfasser]. "Approximate String Matching : Improving Data Structures and Algorithms / Christopher Maximilian Pockrandt." Berlin : Freie Universität Berlin, 2019. http://d-nb.info/1183675879/34.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Klaib, Ahmad. "Exact string matching algorithms for searching DNA and protein sequences and searching chemical databases." Thesis, University of Huddersfield, 2014. http://eprints.hud.ac.uk/id/eprint/24266/.

Full text
Abstract:
The enormous quantities of biological and chemical files and databases are likely to grow year on year, consequently giving rise to the need to develop string-matching algorithms capable of minimizing the searching response time. Being aware of this need, this thesis aims to develop string matching algorithms to search biological sequences and chemical structures by studying exact string matching algorithms in detail. As a result, this research developed a new classification of string matching algorithms containing eight categories according to the pre-processing function of algorithms and proposed five new string matching algorithms; BRBMH, BRQS, Odd and Even algorithm (OE), Random String Matching algorithm (RSMA) and Skip Shift New algorithm (SSN). The main purpose behind the proposed algorithms is to reduce the searching response time and the total number of comparisons. They are tested by comparing them with four well- known standard algorithms, Boyer Moore Horspool (BMH), Quick Search (QS), TVSBS and BRFS. This research applied all of the algorithms to sample data files by implementing three types of tests. The number of comparison tests showed a substantial difference in the number of comparisons our algorithms use compared to the non-hybrid algorithms such as QS and BMH. In addition, the tests showed considerable difference between our algorithms and other hybrid algorithm such as TVSBS and BRFS. For instance, the average elapsed search time tests showed that our algorithms presented better average elapsed search time than the BRFS, TVSBS, QS and BMH algorithms, while the average number of tests showed better number of attempts compared to BMH, QS, TVSBS and BRFS algorithms. A new contribution has been added by this research by using the fastest proposed algorithm, the SSN algorithm, to develop a chemical structure searching toolkit to search chemical structures in our local database. The new algorithms were paralleled using OpenMP and MPI parallel models and tested at the University of Science Malaysia (USM) on a Stealth Cluster with different number of threads and processors to improve the speed of searching pattern in the given text which, as we believe, is another contribution.
APA, Harvard, Vancouver, ISO, and other styles
6

Ãvila, Ricardo Lima Feitosa. "Emprego de tÃcnicas de prÃ-processamento textual e algoritmos de comparaÃÃo como suporte à correÃÃo de questÃes dissertativas: experimentos, anÃlises e contribuiÃÃes." Universidade Federal do CearÃ, 2013. http://www.teses.ufc.br/tde_busca/arquivo.php?codArquivo=10572.

Full text
Abstract:
CoordenaÃÃo de AperfeiÃoamento de Pessoal de NÃvel Superior
Esta dissertaÃÃo apresenta um estudo de tÃcnicas que podem ser empregadas como apoio para a correÃÃo de questÃes dissertativas com base na adaptaÃÃo de algoritmos de comparaÃÃo textual combinados a tÃcnicas de prÃ-processamento de textos. O principal desafio na concepÃÃo de uma ferramenta para este tipo de aplicaÃÃo à a ambiguidade da linguagem natural. Para analisar situaÃÃes de correÃÃo de questÃes subjetivas, foram efetuados testes com esses algoritmos, tendo-se desenvolvido uma ferramenta para tal propÃsito. Confrontando respostas de alunos ao padrÃo de resposta de questÃes propostas em provas subjetivas, foram analisados o desempenho individual dos algoritmos e de um conjunto de tÃcnicas de prÃ-processamento que sÃo encontrados na literatura, de maneira isolada e combinada. Buscando contornar situaÃÃes especÃficas de falso negativo e falso positivo, foram propostas algumas tÃcnicas auxiliares como contribuiÃÃo deste trabalho. ApÃs a anÃlise dos experimentos realizados, os resultados de Ãndice de similaridade entre respostas indicam o uso da soluÃÃo como suporte a correÃÃo de questÃes discursivas, podendo, ainda, ser aplicado na detecÃÃo de plÃgio e ser integrado a um ambiente virtual de ensino e aprendizagem.
This master thesis presents a study of techniques used as support for a correction of essay questions based in an adaptation of string-matching algorithms combined with text preprocessing techniques. The main challenge to design a tool like this is an ambiguity of natural language. To analyze a correction of subjective questions, tests were performed with these algorithms, and a tool have been developed for this purpose. Comparing student responses with response pattern of questions proposed in subjective tests, we analyzed the performance of individual algorithms and a set of pre-processing techniques that are found in the literature, in isolation and combined. Seeking to neutralize specific situations of false negative and false positive, some techniques have been proposed as auxiliary contribution of this work. After analyzing the experiments, the results of similarity index between responses indicate the use of the solution to support the correction of essay questions, and may also be applied in the detection of plagiarism and be integrated to a learning management system.
APA, Harvard, Vancouver, ISO, and other styles
7

Toth, Róbert. "Přibližné vyhledávání řetězců v předzpracovaných dokumentech." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2014. http://www.nusl.cz/ntk/nusl-236122.

Full text
Abstract:
This thesis deals with the problem of approximate string matching, also called string matching allowing errors. The thesis targets the area of offline algorithms, which allows very fast pattern matching thanks to index created during initial text preprocessing phase. Initially, we will define the problem itself and demonstrate variety of its applications, followed by short survey of different approaches to cope with this problem. Several existing algorithms based on suffix trees will be explained in detail and new hybrid algorithm will be proposed. Algorithms wil be implemented in C programming language and thoroughly compared in series of experiments with focus on newly presented algorithm.
APA, Harvard, Vancouver, ISO, and other styles
8

Neto, Domingos Soares. "Filtros para a busca e extração de padrões aproximados em cadeias biológicas." Universidade de São Paulo, 2008. http://www.teses.usp.br/teses/disponiveis/45/45134/tde-19102009-002745/.

Full text
Abstract:
Esta dissertação de mestrado aborda formulações computacionais e algoritmos para a busca e extração de padrões em cadeias biológicas. Em particular, o presente texto concentra-se nos dois problemas a seguir, considerando-os sob as distâncias de Hamming e Levenshtein: a) como determinar os locais nos quais um dado padrão ocorre de modo aproximado em uma cadeia fornecida; b) como extrair padrões que ocorram de modo aproximado em um número significativo de cadeias de um conjunto fornecido. O primeiro problema, para o qual já existem diversos algoritmos polinomiais, tem recebido muita atenção desde a década de 60, e ganhou novos ares com o advento da biologia computacional, nos idos dos anos 80, e com a popularização da Internet e seus mecanismos de busca: ambos os fenômenos trouxeram novos obstáculos a serem superados, em razão do grande volume de dados e das bastante justas restrições de tempo inerentes a essas aplicações. O segundo problema, de surgimento um pouco mais recente, é intrinsicamente desafiador, em razão de sua complexidade computacional, do tamanho das entradas tratadas nas aplicações mais comuns e de sua dificuldade de aproximação. Também é de chamar a atenção o seu grande potencial de aplicação. Neste trabalho são apresentadas formulações adequadas dos problemas abordados, assim como algoritmos e estruturas de dados essenciais ao seu estudo. Em especial, estudamos a extremamente versátil árvore dos sufixos, assim como uma de suas generalizações e sua estrutura irmã: o vetor dos sufixos. Grande parte do texto é dedicada aos filtros baseados em q-gramas para a busca aproximada de padrões e algumas de suas mais recentes variações. Estão cobertos os algoritmos bit-paralelos de Myers e Baeza-Yates-Gonnet para a busca de padrões; os algoritmos de Sagot para a extração de padrões; os algoritmos de filtragem de Ukkonen, Jokinen-Ukkonen, Burkhardt-Kärkkäinen, entre outros.
This thesis deals with computational formulations and algorithms for the extraction and search of patterns from biological strings. In particular, the present text focuses on the following problems, both considered under Hamming and Levenshtein distances: 1. How to find the positions where a given pattern approximatelly occurs in a given string; 2. How to extract patterns which approximatelly occurs in a certain number of strings from a given set. The first problem, for which there are many polinomial time algorithms, has been receiving a lot of attention since the 60s and entered a new era of discoveries with the advent of computational biology, in the 80s, and the widespread of the Internet and its search engines: both events brought new challenges to be faced by virtue of the large volume of data usually held by such applications and its time constraints. The second problem, much younger, is very challenging due to its computational complexity, approximation hardness and the size of the input data usually held by the most common applications. This problem is also very interesting due to its potential of application. In this work we show computational formulations, algorithms and data structures for those problems. We cover the bit-parallel algorithms of Myers, Baeza-Yates-Gonnet and the Sagots algorithms for patterns extraction. We also cover here the oustanding versatile suffix tree, its generalised version, and a similar data structure: the suffix array. A significant part of the present work focuses on q-gram based filters designed to solve the approximate pattern search problem. More precisely, we cover the filter algorithms of Ukkonen, Jokinen-Ukkonen and Burkhardt-Kärkkäinen, among others.
APA, Harvard, Vancouver, ISO, and other styles
9

Dubois, Simon. "Offline Approximate String Matching forInformation Retrieval : An experiment on technical documentation." Thesis, Tekniska Högskolan, Högskolan i Jönköping, JTH. Forskningsmiljö Informationsteknik, 2013. http://urn.kb.se/resolve?urn=urn:nbn:se:hj:diva-22566.

Full text
Abstract:
Approximate string matching consists in identifying strings as similar even ifthere is a number of mismatch between them. This technique is one of thesolutions to reduce the exact matching strictness in data comparison. In manycases it is useful to identify stream variation (e.g. audio) or word declension (e.g.prefix, suffix, plural). Approximate string matching can be used to score terms in InformationRetrieval (IR) systems. The benefit is to return results even if query terms doesnot exactly match indexed terms. However, as approximate string matchingalgorithms only consider characters (nor context neither meaning), there is noguarantee that additional matches are relevant matches. This paper presents the effects of some approximate string matchingalgorithms on search results in IR systems. An experimental research design hasbeen conducting to evaluate such effects from two perspectives. First, resultrelevance is analysed with precision and recall. Second, performance is measuredthanks to the execution time required to compute matches. Six approximate string matching algorithms are studied. Levenshtein andDamerau-Levenshtein computes edit distance between two terms. Soundex andMetaphone index terms based on their pronunciation. Jaccard similarity calculatesthe overlap coefficient between two strings. Tests are performed through IR scenarios regarding to different context,information need and search query designed to query on a technicaldocumentation related to software development (man pages from Ubuntu). Apurposive sample is selected to assess document relevance to IR scenarios andcompute IR metrics (precision, recall, F-Measure). Experiments reveal that all tested approximate matching methods increaserecall on average, but, except Metaphone, they also decrease precision. Soundexand Jaccard Similarity are not advised because they fail on too many IR scenarios.Highest recall is obtained by edit distance algorithms that are also the most timeconsuming. Because Levenshtein-Damerau has no significant improvementcompared to Levenshtein but costs much more time, the last one is recommendedfor use with a specialised documentation. Finally some other related recommendations are given to practitioners toimplement IR systems on technical documentation.
APA, Harvard, Vancouver, ISO, and other styles
10

Frey, Jeffrey Daniel. "Finding Song Melody Similarities Using a DNA String Matching Algorithm." Kent State University / OhioLINK, 2008. http://rave.ohiolink.edu/etdc/view?acc_num=kent1208961242.

Full text
APA, Harvard, Vancouver, ISO, and other styles
11

Momeninasab, Leila. "Design and Implementation of a Name Matching Algorithm for Persian Language." Thesis, Linköpings universitet, Interaktiva och kognitiva system, 2013. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-102210.

Full text
Abstract:
Name matching plays a vital and crucial role in many applications. They are for example used in information retrieval or deduplication systems to do comparisons among names to match them together or to find the names that refer to identical objects, persons, or companies. Since names in each application are subject to variations and errors that are unavoidable in any system and because of the importance of name matching, so far many algorithms have been developed to handle matching of names. These algorithms consider the name variations that may happen because of spelling, pattern or phonetic modifications. However most existing methods were developed for use with the English language and so cover the characteristics of this language. Up to now no specific one has been designed and implemented for the Persian language. The purpose of this thesis is to present a name matching algorithm for Persian. In this project, after consideration of all major algorithms in this area, we selected one of the basic methods for name matching that we then expanded to make it work particularly well for Persian names. This proposed algorithm, called Persian Edit Distance Algorithm or shortly PEDA, was built based on the characteristics of the Persian language and it compares Persian names with each other on three levels: phonetic similarity, character form similarity and keyboard distance, in order to give more accurate results for Persian names. The algorithm gets Persian names as its input and determines their similarity as a percentage in the output. In this thesis three series of experiments have been accomplished in order to evaluate the proposed algorithm. The f-measure average shows a value of 0.86 for the first series and a value of 0.80 for the second series results. The first series of experiments have been repeated with Levenshtein as well, and have 33.9% false negatives on average while PEDA has a false negative average of 6.4%. The third series of experiments shows that PEDA works well for one edit, two edits and three edits with true positive average values of 99%, 81%, and 69% respectively.
APA, Harvard, Vancouver, ISO, and other styles
12

Silva, Júnior José Bonifácio da. "Paralelização em CUDA do algoritmo Aho-Corasick utilizando as hierarquias de memórias da GPU e nova compactação da Tabela de Transcrição de Estados." Universidade Federal de Sergipe, 2017. https://ri.ufs.br/handle/riufs/3353.

Full text
Abstract:
The Intrusion Detection System (IDS) needs to compare the contents of all packets arriving at the network interface with a set of signatures for indicating possible attacks, a task that consumes much CPU processing time. In order to alleviate this problem, some researchers have tried to parallelize the IDS's comparison engine, transferring execution from the CPU to GPU. This This dissertation aims to parallelize the Brute Force and Aho-Corasick string matching algorithms and to propose a new compression of the State Transition Table of the Aho-Corasick algorithm in order to make it possible to use it in shared memory and accelerate the comparison of strings. The two algorithms were parallelized using the NVIDIA CUDA platform and executed in the GPU memories to allow a comparative analysis of the performance of these memories. Initially, the AC algorithm proved to be faster than the Brute Force algorithm and so it was followed for optimization. The AC algorithm was compressed and executed in parallel in shared memory, achieving a performance gain of 15% over other GPU memories and being 48 times faster than its serial version when testing with real network packets. When the tests were done with synthetic data (less random data) the gain reached 73% and the parallel algorithm was 56 times faster than its serial version. Thus, it can be seen that the use of compression in shared memory becomes a suitable solution to accelerate the processing of IDSs that need agility in the search for patterns.
Um Sistema de Detecção de Intrusão (IDS) necessita comparar o conteúdo de todos os pacotes que chegam na interface da rede com um conjunto de assinaturas que indicam possíveis ataques, tarefa esta que consome bastante tempo de processamento da CPU. Para amenizar esse problema, tem-se tentado paralelizar o motor de comparação dos IDSs transferindo sua execução da CPU para a GPU. Esta dissertação tem como objetivo fazer a paralelização dos algoritmos de comparação de strings Força-Bruta e Aho-Corasick e propor uma nova compactação da Tabela de Transição de Estados do algoritmo Aho-Corasick a fim de possibilitar o uso dela na memória compartilhada e acelerar a comparação de strings. Os dois algoritmos foram paralelizados utilizando a plataforma CUDA da NVIDIA e executados nas memórias da GPU a fim de possibilitar uma análise comparativa de desempenho dessas memórias. Inicialmente, o algoritmo AC mostrou-se mais veloz do que o algoritmo Força-Bruta e por isso seguiu-se para sua otimização. O algoritmo AC foi compactado e executado de forma paralela na memória compartilhada, alcançando um ganho de desempenho de 15% em relação às outras memórias da GPU e sendo 48 vezes mais rápido que sua versão na CPU quando os testes foram feitos com pacotes de redes reais. Já quando os testes foram feitos com dados sintéticos (dados menos aleatórios) o ganho chegou a 73% e o algoritmo paralelo chegou a ser 56 vezes mais rápido que sua versão serial. Com isso, pode-se perceber que o uso da compactação na memória compartilhada torna-se uma solução adequada para acelerar o processamento de IDSs que necessitem de agilidade na busca por padrões.
APA, Harvard, Vancouver, ISO, and other styles
13

Yahiaoui, Said. "Algorithmes et applications pour la coloration et les alliances dans les graphes." Thesis, Lyon 1, 2013. http://www.theses.fr/2013LYO10274.

Full text
Abstract:
Dans cette thèse, nous nous intéressons aux aspects algorithmiques et applications de deux problèmes de graphes, à savoir, la coloration et les alliances. La première partie concerne deux variantes de la coloration de graphes, la coloration Grundy et la coloration forte stricte. Nous commençons par l'étude du nombre Grundy des graphes réguliers. Nous donnons une condition fixe k, nous fournissons une condition nécessaire et suffisante pour que le nombre Grundy d'un graphe régulier soit au moins égal k. Nous caractérisons la classe des graphes cubiques (3-réguliers) pour laquelle le nombre Grundy est égal à 4, et nous présentons un algorithme linéaire pour déterminer le nombre Grundy d'un graphe cubique quelconque. Par ailleurs, en se basant sur la coloration forte stricte pour décomposer les arbres en petites composantes, nous présentons un nouvel algorithme pour l'appariement d'arbres étiquetés, non-ordonnés non-enracinés. Nous montrons que la distance calculée entre deux arbres est une pseudo-métrique. Nos expérimentations sur de larges bases synthétiques et des bases de données réelles confirment nos résultats analytiques et montrent que la distance proposée est précise et son algorithme est scalable. La seconde partie de cette thèse est consacrée aux alliances dans les graphes. Nous proposons un algorithme distribué autostabilisant pour la construction d'alliance offensive globale minimale dans un graphe arbitraire. Nous démontrons que cet algorithme converge sous le démon synchrone en temps linéaire. Ensuite, nous donnons le premier algorithme distribué autostabilisant pour le problème de l'alliance forte globale minimale dans un graphe quelconque. Nous prouvons que cet algorithme est polynomial sous le démon inéquitable distribué. Nous montrons par la suite, comment cet algorithme peut être adapté pour des généralisations du problème, comme la k-alliance forte et l'alliance forte pondérée. Enfin, en se basant sur les propriétés structurelles de l'alliance offensive, nous présentons une solution pour décentraliser le protocole de signalisation SIP. Ceci rend possible son déploiement dans un réseau mobile ad hoc
This thesis investigates the algorithmic aspects and applications of two graph problems, namely, colorings and alliances. In the first part, we focus on two variants of the proper vertex coloring, the Grundy coloring and the strict strong coloring. We start by the study of Grundy number for regular graphs. We give a sufficient condition for d-regular graphs with sufficiently large girth to have Grundy number equals d + 1. Then, using graph homomorphism, we obtain a necessary and sufficient condition for d-regular graphs to have Grundy number at least k. Moreover, we characterize cubic graphs (3-regular) for which the Grundy number is d + 1, and present a linear-time algorithm to determine the Grundy number of any arbitrary cubic graph. Subsequently, based on the strict strong coloring, we present an approach for the problem of matching labeled trees. Using this coloring, we propose a new algorithm to deterministically decompose a tree into small components. This leads to an efficient algorithm to measure an accurate distance between unrooted unordered labeled trees. The second part is devoted to the alliances in graphs. We first propose a linear-time self-stabilizing algorithm for the minimal global offensive alliance set problem, under the synchronous distributed scheduler. Then, we give the first self-stabilizing algorithm for the minimal global powerful alliance set problem in arbitrary graphs. Moreover, we show how this algorithm can be adapted to find the minimal global powerful k-alliance and the minimal weighted global powerful alliance sets. We prove that all these algorithms converge in polynomial-time under the unfair distributed scheduler. Finally, based on the structural properties of the offensive alliance, we propose a solution to decentralize the signaling protocol SIP. This enables SIP applications in mobile ad hoc networks
APA, Harvard, Vancouver, ISO, and other styles
14

Moradi, Arvin. "Smart Clustering System for Filtering and Cleaning User Generated Content : Creating a profanity filter for Truecaller." Thesis, KTH, Skolan för informations- och kommunikationsteknik (ICT), 2013. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-124408.

Full text
Abstract:
This thesis focuses on investigating and creating an application for filtering user-generated content. The method was to examine how profanity and racist expressions are used and manipulated to evade filtering processes in similar systems. Focus also went on to study different algorithms to get this process to be quick and efficient, i.e., to process as many names in the shortest amount of time possible. This is because the client needs to filter millions of new uploads every day. The result shows that the application detects profanity and manipulated profanity. Data from the customer’s database was also used for testing purposes, and the result showed that the application also works in practice. The performance test shows that the application has a fast execution time. We could see this by approximating it to a linear func-tion with respect to time and the number of names entered. The conclusion was that the filter works and discovers profanity not detected earlier. Future updates to strengthen the decision process could be to introduce a third-party service, or a web interface where you can manually control decisions. Execution time is good and shows that 10 million names can be pro-cessed in about 6 hours. In the future, one can parallelize queries to the database so that multiple names can be processed simultaneously.
Denna avhandling fokuserar på att utreda och skapa en applikation för filtrering av användargenererat innehåll. Metoden gick ut på att undersöka hur svordomar samt rasistiska uttryck används och manipuleras för att undgå filtrerings processer i liknande system. Fokus gick även ut på att studera olika algoritmer för att få denna process att vara snabb och effektiv, dvs kunna bearbeta så många namn på kortast möjliga tid. Detta beror på att kunden i detta sammanhang får in miljontals nya uppladdningar varje dag, som måste filtreras innan använding. Resultatet visar att applikationen upptäcker svordomar i olika former. Data från kundens databas användes också för test syfte, och resultatet visade att applikationen även fungerar i praktiken. Prestanda testet visar att applikationen har en snabb exekveringstid. Detta kunde vi se genom att estimera den till en linjär funktion med hänsyn till tid och antal namn som matats in. Slutsatsen blev att filtret fungerar och upptäcker svordomar som inte upptäckts tidigare i kundens databas. För att stärka besluten i processen kan man i framtida uppdateringar införa tredje parts tjänster, eller ett web interface där man manuelt kan styra beslut. Exekverings tiden är bra och visar att 10 miljoner namn kan bearbetas på cirka 6 timmar. I framtiden kan man parallellisera förfrågningarna till databasen så att flera namn kan bearbetas samtidigt.
APA, Harvard, Vancouver, ISO, and other styles
15

Alex, Ann Theja. "Local Alignment of Gradient Features for Face Photo and Face Sketch Recognition." University of Dayton / OhioLINK, 2012. http://rave.ohiolink.edu/etdc/view?acc_num=dayton1353372694.

Full text
APA, Harvard, Vancouver, ISO, and other styles
16

Martin, Benjamin. "Analyse de structures répétitives dans les séquences musicales." Thesis, Bordeaux 1, 2012. http://www.theses.fr/2012BOR14711/document.

Full text
Abstract:
Cette thèse rend compte de travaux portant sur l’inférence de structures répétitives à partir du signal audio à l’aide d’algorithmes du texte. Son objectif principal est de proposer et d’évaluer des algorithmes d’inférence à partir d’une étude formelle des notions de similarité et de répétition musicale.Nous présentons d’abord une méthode permettant d’obtenir une représentation séquentielle à partir du signal audio. Nous introduisons des outils d’alignement permettant d’estimer la similarité entre de telles séquences musicales, et évaluons l’application de ces outils pour l’identification automatique de reprises. Nous adaptons alors une technique d’indexation de séquences biologiques permettant une estimation efficace de la similarité musicale au sein de bases de données conséquentes.Nous introduisons ensuite plusieurs répétitions musicales caractéristiques et employons les outils d’alignement pour identifier ces répétitions. Une première structure, la répétition d’un segment choisi, est analysée et évaluée dans le cadre dela reconstruction de données manquantes. Une deuxième structure, la répétition majeure, est définie, analysée et évaluée par rapport à un ensemble d’annotations d’experts, puis en tant qu’alternative d’indexation pour l’identification de reprises.Nous présentons enfin la problématique d’inférence de structures répétitives telle qu’elle est traitée dans la littérature, et proposons notre propre formalisation du problème. Nous exposons alors notre modélisation et proposons un algorithme permettant d’identifier une hiérarchie de répétitions. Nous montrons la pertinence de notre méthode à travers plusieurs exemples et en l’évaluant par rapport à l’état de l’art
The work presented in this thesis deals with repetitive structure inference from audio signal using string matching techniques. It aims at proposing and evaluating inference algorithms from a formal study of notions of similarity and repetition in music.We first present a method for representing audio signals by symbolic strings. We introduce alignment tools enabling similarity estimation between such musical strings, and evaluate the application of these tools for automatic cover song identification. We further adapt a bioinformatics indexing technique to allow efficient assessments of music similarity in large-scale datasets. We then introduce several specific repetitive structures and use alignment tools to analyse these repetitions. A first structure, namely the repetition of a chosen segment, is retrieved and evaluated in the context of automatic assignment of missingaudio data. A second structure, namely the major repetition, is defined, retrieved and evaluated regarding expert annotations, and as an alternative indexing method for cover song identification.We finally present the problem of repetitive structure inference as addressed in literature, and propose our own problem statement. We further describe our model and propose an algorithm enabling the identification of a hierarchical music structure. We emphasize the relevance of our method through several examples and by comparing it to the state of the art
APA, Harvard, Vancouver, ISO, and other styles
17

Nosek, Ondřej. "Hardwarová akcelerace algoritmu pro hledání podobnosti dvou DNA řetězců." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2007. http://www.nusl.cz/ntk/nusl-236882.

Full text
Abstract:
Methods for aproximate string matching of various sequences used in bioinformatics are crucial part of development in this branch. Tasks are of very large time complexity and therefore we want create a hardware platform for acceleration of these computations. Goal of this work is to design a generalized architecture based on FPGA technology, which can work with various types of sequences. Designed acceleration card will use especially dynamic algorithms like Needleman-Wunsch and Smith-Waterman.
APA, Harvard, Vancouver, ISO, and other styles
18

Kuang, Hsueh Jui, and 薛睿光. "A Study on String-Matching Algorithms." Thesis, 2003. http://ndltd.ncl.edu.tw/handle/45412037298316276441.

Full text
Abstract:
碩士
大同大學
應用數學研究所
91
In this thesis, we give experimental results on the performance of several popular string-matching algorithms. Meanwhile the behaviors of the costs of some variant Boyer-Moore algorithms are investigated. The costs considered include the numbers of character inspections, and the total time used.
APA, Harvard, Vancouver, ISO, and other styles
19

Shih, Hung-Te, and 施鴻德. "A Survey of String-Matching Algorithms." Thesis, 2004. http://ndltd.ncl.edu.tw/handle/18193520008520957794.

Full text
Abstract:
碩士
逢甲大學
應用數學所
92
We study the algorithms for pattern matching, i.e., searching a query substring in a given string, which appears in the text processing, image processing, and other applications. The algorithms cover the Direct String-Matching algorithms, the Rabin-Karp algorithm, the Finite Automata, the Knuth-Morris-Pratt algorithm, the Boyer-Moore algorithm, etc. The time complexity, worst-case performance and the criteria for choosing these algorithms are also studied.
APA, Harvard, Vancouver, ISO, and other styles
20

Chen, Kuei-Hao, and 陳奎昊. "Improved Algorithms for Exact String Matching Problems." Thesis, 2013. http://ndltd.ncl.edu.tw/handle/64377810416197972394.

Full text
Abstract:
博士
國立暨南國際大學
資訊工程學系
101
In this dissertation, we consider two problems: the exact string matching problem and its variation, the exact circular string matching. The exact string matching problem is: Given two strings, a text $T$ of length $n$ and a pattern $P$ of length $m$, find all occurrences of $P$ in $T$. We propose a strategy to analyze the average performance of the reverse factor algorithm. The analysis is based on the assumption that the text is very long as compared to the length of the pattern, and each symbol in the text is drawn uniformly from a random source with $sigma$ symbols. Our analysis uses only elementary techniques in probability theory and avoids applying complicated combinatorics in stringology. We also propose a new algorithm for exact string matching problem under the name Improved BNDM algorithm. The Improved BNDM algorithm uses the $q$-gram filtering technique to speed up the performance of the Turbo BNDM algorithm. The time complexity of the Improved BNDM algorithm achieves $mathcal{O}(n)$ in the worst case and $mathcal{O}(nlog_sigma m/m)$ in the average case where $sigma$ is the alphabet size. It is optimal in both worst case and average case. Another problem we discuss in this dissertation is the exact circular string matching problem. Given a string $P=p_1p_2cdots p_m$, let a string $P^{(i)}=p_ip_{i+1}cdots p_mp_1cdots p_{i-1}$. The exact circular string matching problem is: Given two strings, $T$ of length $n$ and $P$ of length $m$, find all occurrences of $P^{(i)}$ in $T$ for $1leq i leq m$. We propose two algorithms that perform searching of a circular string in the text using bit-parallel technique. Our algorithms use only the composition of bitwise-logical operations and basic arithmetic operations, and apply this technique to solve the problem. These algorithms are given names CSBNDM and $ ext{CSBNDN}_q$, respectively. We give several experiments to verify that they have good performance for random strings and DNA sequence.
APA, Harvard, Vancouver, ISO, and other styles
21

Lu, Chia Wei, and 呂嘉維. "Efficient Exact and Approximate String Matching Algorithms." Thesis, 2014. http://ndltd.ncl.edu.tw/handle/49922113552090356808.

Full text
Abstract:
博士
國立清華大學
資訊工程學系
102
In this thesis, we first propose two algorithms for exact string matching problem, which aims to find all the positions i's in a given text where a given pattern occurs. Our algorithms find an optimal selective comparing order of the characters of the pattern so that we could have a better performance in the searching phase. To find the optimal comparing order, we adopt the branch and bound approach. Moreover, our proposed algorithm can be combined with other existing exact string matching algorithms to improve the searching efficiency. The experimental results show that our algorithms indeed have the smallest number of character comparisons and are also efficient in time as compared with other existing exact string matching algorithms. Second, we propose a new filtration algorithm, as well as a hybrid filtration strategy, to efficiently solve the approximate string matching problem (also called the k-difference problem), which aims to find all the positions i's in a given text such that there exists a substring of the text ending at position i whose edit distance from a given pattern is less than or equal to a given error bound k. Our experimental results on simulated datasets of DNA sequences show that when compared with other filtration algorithms, our filtration algorithm has better performance on the efficiency to filter out those positions of the text at which the pattern does not occur approximately. Moreover, our hybrid filtration strategy further improves the effectiveness of our filtration algorithm. Third, we propose a progressive approach to solve the DNA resequencing problem which is defined as follows: We are given an unknown DNA sequence X and a known reference sequence R. Our task is to see whether X and R are similar or not. The present popular approach is to break up X into subsequences by the next generation sequencing (NGS) technologies, called reads. We then map the reads of X onto R with a suitable error bound. However, if the similarity between X and R is not very high (<95%), there would be many reads unmapped, and we then cannot obtain the mutations inside the unmapped regions. One can use a large error bound to increase the number of reads mapped. But it is not a good solution because increasing error bound will also increase the probability of false positive mapping. Our approach uses a small error bound and to increase the number of reads mapped, our approach modifies R each time after the reads are mapped. Thus our approach is a progressive approach. Compared with other available tools, our approach allows us to be able to map more reads to the reference sequence. In our simulated experiments, we also show the high correctness of our mapping algorithm.
APA, Harvard, Vancouver, ISO, and other styles
22

Lu, Chia-Wei. "String Matching Algorithms Based upon the Uniqueness Property and an Approximate String Matching Algorithm Based upon the Candidate Elimination Method." 2008. http://www.cetd.com.tw/ec/thesisdetail.aspx?etdun=U0020-2406200815183100.

Full text
APA, Harvard, Vancouver, ISO, and other styles
23

Lu, Chia-Wei, and 呂嘉維. "String Matching Algorithms Based upon the Uniqueness Property and an Approximate String Matching Algorithm Based upon the Candidate Elimination Method." Thesis, 2008. http://ndltd.ncl.edu.tw/handle/17632385982720975496.

Full text
Abstract:
碩士
國立暨南國際大學
資訊工程學系
96
In this thesis, we consider two problems, exact string matching and approximate string matching problem. The exact string matching problem is to determine all of the locations of a pattern string P appearing in a text string T. We first point out a uniqueness property of a given pattern. We then propose four algorithms to solve the exact string matching problem based upon the uniqueness property. Experimental results showed that three of our algorithms are faster than the KMP and Boyer-Moore algorithms. The approximate string matching problem is defined as follows: Given a text string T, a pattern string P and an error bound k, find all substrings of T whose edit distances with P are smaller than or equal to k. We give a method to eliminate candidate locations in text T as there can be no substring S starting from those locations such that the edit distance between S and pattern P is smaller than or equal to k. Our method is simple to implement. Experimental results show that our method is effective, especially in the case when we perform natural language searching.
APA, Harvard, Vancouver, ISO, and other styles
24

Lin, Po-Ching, and 林柏青. "Accelerating String Matching Algorithms for Deep Packet Inspection." Thesis, 2008. http://ndltd.ncl.edu.tw/handle/05808898005943344191.

Full text
Abstract:
博士
國立交通大學
資訊科學與工程研究所
96
An increasing number of network devices inspect the packet content for various signatures of security violation. Accelerating the process, namely deep packet inspection (DPI), can be in two aspects: algorithm and packet flow. This work focuses on the former. We first review existing string matching algorithms for DPI to see what work has been done and what should be addressed. In the review, we indicate the pros and cons of each algorithm, and a scalable, efficient design to meet the different characteristics of the pattern set in various applications is still a challenge. We also profile the performance of string matching in practical applications for various numbers, lengths and character distributions of the patterns to realize the key factors in efficient string matching. The results shed light on which approach is appropriate for each DPI application. Considering the above study, we design a hardware-based string-matching engine that can exploit algorithmic heuristics for acceleration, and a hybrid software method for applications with a large pattern set, say anti-virus applications. However, it is insufficient to accelerate some DPI applications, such as Web filter, with string matching alone. We thus present a probabilistic approach, namely the early decision algorithm to accelerate the classification in Web filtering. This work features a number of observations and contributions. First, the review and profiling study several major DPI applications, including intrusion detection, anti-virus and Web filtering, unlike most existing work that focuses on intrusion detection. The study shows string matching is not so critical in intrusion detection, and its development for DPI should also cover other applications more. The profiling indicates memory access (thus also cache locality), verification frequency and shift distance of the search window are major factors that affect the performance. Second, the presented hardware architecture of string matching, namely Bloom Filter Accelerated Sub-linear Time (BFAST), can exploit algorithmic heuristics with Bloom filters to scan the content in sub-linear time. The bad-block heuristic in the BFAST has better performance than that from the conventional table due to the precise information kept in the Bloom filters. We also propose practical techniques to handle the worst case, and the theoretical method to achieve linear worst-case time. The simulation shows that the peak throughput of a single match engine can achieve up to 9.34 Gbps for 16,384 patterns with an eight-character block in the heuristics. A hybrid method for virus scanning is also presented since the hardware approach may be not applicable. We partition the patterns in ClamAV into long and short ones. An algorithm enhanced from the Wu-Manber (WM) algorithm, namely the Backward-Hashing algorithm (BH), is responsible for only long patterns to lengthen the average skip distance, while the Aho-Corasick algorithm scans for only short patterns to reduce the automaton sizes. The former utilizes the bad-block heuristic to exploit long shift distance and reduce the verification frequency, so it is much faster than the original WM implementation in ClamAV. The latter increases the AC performance by around 50% due to better cache locality. Besides speeding up string matching, we present a simple, but effective early decision algorithm to accelerate the filtering process by examining only part of the Web content. The algorithm can make the filtering decision, either to block or to pass the Web content, as soon as it is confident with a high probability that the content should belong to a banned or an allowed category. The experiments show the algorithm can examine only around one-fourth of Web content on average, while the accuracy remains fairly good. This algorithm can complement other Web filtering approaches to filter the Web content with high efficiency. Keywords: string matching, algorithm, deep packet inspection
APA, Harvard, Vancouver, ISO, and other styles
25

Huang, Lan-Ya. "An Exact String Matching Algorithms Using Hashing Scheme." 2008. http://www.cetd.com.tw/ec/thesisdetail.aspx?etdun=U0020-2406200814285000.

Full text
APA, Harvard, Vancouver, ISO, and other styles
26

Huang, Lan-Ya, and 黃蘭雅. "An Exact String Matching Algorithms Using Hashing Scheme." Thesis, 2008. http://ndltd.ncl.edu.tw/handle/16933370591311289241.

Full text
Abstract:
碩士
國立暨南國際大學
資訊管理學系
96
In this thesis, we consider how to solve the exact string matching problem. The exact string matching problem is to find all locations of a pattern string P in a text string T. In general, string matching algorithm problem work in linear time and linear space. The two well-known of them are Knuth-Morris-Pratt (KMP) algorithm and Boyer-Moore (BM) algorithm. We will use the hashing scheme to solve the exact string matching problem. Our method is simple to implement, and our algorithm work in constant space. Experimental shows that our algorithm is better than Brute Force algorithm and KMP algorithm.
APA, Harvard, Vancouver, ISO, and other styles
27

Chen, Hong-Chieh, and 陳宏杰. "A Review of String Matching Algorithms with Constant Space." Thesis, 2006. http://ndltd.ncl.edu.tw/handle/77479923915180499277.

Full text
Abstract:
碩士
國立暨南國際大學
資訊管理學系
94
The string matching problem is to find all occurrences of a given pattern string P in a text string T. It is a classical and important problem in computer science and two well-known string matching algorithms are the Knuth-Morris-Pratt (KMP) algorithm and the Boyer-Moore (BM) algorithm and both of them work in linear time and linear space. Hence, designing a string matching algorithm with linear time cost and simultaneously constant space are interesting and usually sophisticated in the string matching problem. In recent years, many constant space and linear time string matching algorithms were proposed and most of them are based on the KMP algorithm. In this thesis, we perform a review of three recent constant space string matching algorithms: the MaxSuffix-Matching algorithm, the Two-Way Pattern matching algorithm and the Sequential-Sampling scheme. We first discuss the KMP algorithm and introduce its preprocessing functions. We then introduce above three constant space algorithms respectively. Finally, we compare their differences between these three algorithms by several different subjects.
APA, Harvard, Vancouver, ISO, and other styles
28

Holloway, James Lee. "Algorithms for string matching with applications in molecular biology /." 1992. http://hdl.handle.net/1957/11518.

Full text
APA, Harvard, Vancouver, ISO, and other styles
29

Burkhardt, Stefan [Verfasser]. "Filter algorithms for approximate string matching / von Stefan Burkhardt." 2004. http://d-nb.info/972323015/34.

Full text
APA, Harvard, Vancouver, ISO, and other styles
30

Li, Zhi-Xiang, and 李志祥. "Profiling and Accelerating String Matching Algorithms in Three Content Security Applications." Thesis, 2005. http://ndltd.ncl.edu.tw/handle/24569832427268462905.

Full text
Abstract:
碩士
國立交通大學
資訊科學系所
93
Network content security has become a critical issue of the Internet. It is shown that the efficiency of string matching algorithms is essential to content processing. The performance of a string matching algorithm is sensitive to the number of patterns, the minimum length of the signature and the character set that the signatures are composed of. This work reviews and profiles some typical algorithms to understand which algorithm is suitable in which situation. The AC algorithm is suitable for LSP=1, the Modified-WM algorithm is suitable for LSP =2 when the pattern set size is smaller than 1,000, the FNPw2 algorithm is suitable for LSP=2 when the pattern set size is larger than 1,000, the Modified-WM algorithm is suitable for LSP=3 and the BG+ algorithm is suitable for LSP 4. Then, these algorithms are implemented on open-source content security applications to observe the performance in practice. The performance improvement is significant if the percentage of string matching processing on total execution time is great. For example, the novel method is five times faster than the original method on the experiment of ClamAV. In addition, these applications are fed with the real and synthetic data. The differences of performance between the real and synthetic data are also compared. The execution time for processing the real data is longer than that for processing the synthetic data due to the character set distribution. Finally, the practical design issues for string matching are also observed in this work.
APA, Harvard, Vancouver, ISO, and other styles
31

Huang, Chun-cheng, and 黃俊程. "High-Performance Parallel Location-Aware Algorithms for Approximate String Matching on GPUs." Thesis, 2017. http://ndltd.ncl.edu.tw/handle/5b596z.

Full text
Abstract:
碩士
國立臺灣師範大學
電機工程學系
105
Approximate string matching has been widely used in many applications, including deoxyribonucleic acid sequence searching, spell checking, text mining, and spam filters. The method is designed to find all locations of strings that approximately match a pattern in accordance with the number of insertion, deletion, and substitution operations. Among the proposed algorithms, the bit-parallel algorithms are considered to be the best and highly efficient algorithms. However, the traditional bit-parallel algorithms lacks the ability of identifying the start and end positions of a matched pattern. Furthermore, acceleration of the bit-parallel algorithms has become a crucial issue for processing big data nowadays. In this paper, we propose two kinds of parallel location-aware algorithms called data-segmented parallelism and high-degree parallelism as means to accelerate approximate string matching using graphic processing units. Experimental results show that the high-degree parallelism on GPUs achieves significant improvement in system and kernel throughputs compared to CPU counterparts. Compared to state-of-the-art approaches, the proposed high-degree parallelism achieves 11 to 105 times improvement. Finally, we develop a web service which allows users to perform approximate string matching on line and deliver all matched substrings with the start and end positions.
APA, Harvard, Vancouver, ISO, and other styles
32

Shih-hsin, Liang, and 梁仕炘. "A K-modular String Matching Algorithm." Thesis, 2010. http://ndltd.ncl.edu.tw/handle/19890631579529954123.

Full text
Abstract:
碩士
國立暨南國際大學
資訊工程學系
98
Lots of string matching algorithms has been proposed. Among them, Boyer-Moore is fast but difficult to analyze its time complexity. The naive, or brute force, algorithm is the simplest string matching method. Its time complexity is O(m*n) when searching with a pattern of length m in a text of length n. Although Morris Pratt (MP) and Knuth-Morris-Pratt (KMP) algorithm method are very simple, and their time complexity is O(n), but every character must be checked at lease once. Colussi method, the one most peculiar method, divides its process into two step. The first step, is by using Knuth-Morris-Pratt (KMP) prefix function table, scans from left to right and leaps some characters. The second step searches from right to left on the characters missed by the first step. By this method, it searches faster than KMP method, but the time analysis is not so simple as KMP’s. Therefore we try to find a method which is simple and is easy to analyse. Our method also have two step similar with Colussi’s. The first step is to compare character from left to right every k characters, then the second step, also starting from the left side, is to check the un-compared characters. This method is simple and easy to analyze as KMP’s, but the number of comparisons is lower than KMP’s.
APA, Harvard, Vancouver, ISO, and other styles
33

Liu, Kuei-Wen. "Some Results on Exact String Matching Algorithm." 2008. http://www.cetd.com.tw/ec/thesisdetail.aspx?etdun=U0020-2406200814333600.

Full text
APA, Harvard, Vancouver, ISO, and other styles
34

Hsu, Yu-Ling, and 徐玉玲. "An Efficient Real Scaled String Matching Algorithm." Thesis, 2008. http://ndltd.ncl.edu.tw/handle/44140988949103322076.

Full text
Abstract:
碩士
國立暨南國際大學
資訊工程學系
96
Text searching is used on a wide variety of technologies, such as computer science, multimedia library, web search, etc. This thesis proposes how to find all possible patterns that appear under the real scales in text. On our algorithm, we consider two cases of real scales. One of them contains the scale range is greater than or equal to 1. The other case contains the range is less than or equal to 1. By using the Real Scaled Indexing Tree (RSIT), we can determine whether a substring under every scale can match the text or not. Assume that the lengths of text and pattern length are n and m, respectively. The time complexity is O(n^3).
APA, Harvard, Vancouver, ISO, and other styles
35

Liu, Kuei-Wen, and 劉貴文. "Some Results on Exact String Matching Algorithm." Thesis, 2008. http://ndltd.ncl.edu.tw/handle/14472736499752714951.

Full text
Abstract:
碩士
國立暨南國際大學
資訊工程學系
96
In this paper, we introduce two algorithms for stringology. One is to solve the exact string matching problem and the other is to find the string cycle (period of a string). In the exact string matching problem, we are given two strings T=t1t2...tn and P=p1p2...pm. We are asked to find all occurrences of P in T. Our searching method is based upon a single character rule. Consider a location i in P. Suppose that pi is aligned with tj and pi <> tj. We then must move P in such a way that some pk=tj will be aligned with tj. In the thesis, we propose a method to find the optimal location i. We also modified a string matching approach, named wide window approach, which divides the text into n/m overlapping windows of size 2m-1. In the windows, the approach attempts m possible occurrence positions in parallel. It firstly searches pattern suffixes from middle to right with the modified convolution method, shifts the window directly when it fails, otherwise, scans the corresponding prefixes backward with the modified convolution method again. For the period of string problem, we are given a string T and we are asked to find the periods of T. Our algorithm is based upon the bit parallel approach.
APA, Harvard, Vancouver, ISO, and other styles
36

Lo, Shiao Wen. "A Review of String Matching Algorithm without Tables." 2007. http://www.cetd.com.tw/ec/thesisdetail.aspx?etdun=U0020-2706200714045300.

Full text
APA, Harvard, Vancouver, ISO, and other styles
37

Lo, Shiao-Wen, and 羅少文. "A Review of String Matching Algoritms without Tables." Thesis, 2007. http://ndltd.ncl.edu.tw/handle/19999616183376766508.

Full text
Abstract:
碩士
國立暨南國際大學
資訊工程學系
95
General string matching algorithms for string matching problem work in linear time and linear space. The two well-known famous of them are Knuth-Morris-Pratt (KMP) algorithm and Boyer-Moore (BM) algorithm. However, some string matching algorithms presented in recent years work in linear time and just need a small additional memory space. If such algorithms can work in linear time and simultaneously constant space, we call those algorithms as constant space string matching algorithms. In this thesis, we perform a review about constant space string matching algorithms. We will introduce five constant space string matching algorithms: the Constant-space matching for a specific pattern, the MaxSuffix Matching algorithm, the Two Way Matching algorithm, Crochemore-Perrin algorithm and Galil-Seiferas algorithm. Finally, we also discuss the results about them.
APA, Harvard, Vancouver, ISO, and other styles
38

Chou, Chih-Chieh, and 周智杰. "A Fast String Matching Algorithm Based On Network Processor." Thesis, 2004. http://ndltd.ncl.edu.tw/handle/25687177452147145028.

Full text
Abstract:
碩士
國立清華大學
資訊工程學系
92
With the development of high-speed network, uni-processor incapable of affording a large number of traffic can not satisfy what is required by high performance network equipments. It can help improve the performances. However, most network equipments such as network intrusion detection or protection systems need to inspect the packet content and compare with its own signatures, and react appropriately. Thus, the need for a faster algorithm for multi-pattern searching becomes more and more urgent. It is the most crucial factor concerned with the network performance.   Take Snort [18], a popular open-source network intrusion detection system as an example, it uses the algorithm called “A fast Algorithm For Multi-Pattern Searching” proposed by Sun Wu, and Udi Manber 1994(denoted as WuM) [6]. The WuM algorithm can compare the input text with the whole patterns concurrently, but the length of the shortest pattern (denoted as LSP) can not be less than the block-size usually equal to 2. If LSP is less than the block-size, in snort, it compares the input text with its own signatures sequentially using the BM algorithm which is proposed by Boyer R. S., and J. S. Moore 1977 [1]. Consequently, the throughput is limited by matching patterns, and has the poor performance.   The purpose of the thesis is to improve the performance of WuM algorithm, and to handle the length of the shortest pattern less than the block size. Therefore, we can use the only one algorithm to perform multi-pattern searching even the LSP is equal to 1. We concentrate on typical searches rather than on worst-case behavior. It makes sense in most network devices which need to compare incoming packets to its own patterns. Malicious packets in network are always less than legal packets.   To verify the performance of FWM, we use the signatures defined by snort as the patterns, and use the packets downloaded from DEFCON [17] as the input to run the simulation. Finally, we got 15% - 25% performance improvement.
APA, Harvard, Vancouver, ISO, and other styles
39

Shieh, Yi-Kung. "An Improved Approximate String Matching Algorithm Based upon the Boyer-Moore Algorithm." 2008. http://www.cetd.com.tw/ec/thesisdetail.aspx?etdun=U0020-2406200816121600.

Full text
APA, Harvard, Vancouver, ISO, and other styles
40

Shieh, Y. K., and 謝一功. "An Improved Approximate String Matching Algorithm Based upon the Boyer-Moore Algorithm." Thesis, 2008. http://ndltd.ncl.edu.tw/handle/66121462520038468689.

Full text
Abstract:
碩士
國立暨南國際大學
資訊工程學系
96
In this thesis, we discuss the exact string matching problem and approximate string matching problem. We avoid using a brute-force algorithm to solve the string matching problem. We improve the Bad Character Rule of Boyer and Moore Algorithm and compare it with the Horspool Algorithm. And we find that the improved method include the information of the Horspool Algorithm. Therefore, we combine the improved method with the Horspool Algorithm. The combinative method has a better efficiency in searching phase.
APA, Harvard, Vancouver, ISO, and other styles
41

Chu, Jia-Han, and 朱家漢. "Efficient Pattern Matching Algorithms for Variable-Length and Tolerant Strings." Thesis, 2004. http://ndltd.ncl.edu.tw/handle/07175382319267136974.

Full text
Abstract:
碩士
國立臺灣海洋大學
資訊工程學系
92
In this thesis, a novel algorithm for searching common segments in multiple DNA or animo acid sequences is designed. To improve efficiency in pattern searching, combination of hashing encoding, quick sorting and ladderlike stepping and/or interval jumping techniques are applied. Since multiple sequence alignment of DNA or animo acid sequences from the giant genomic database is usually time consuming, we develop a three-phase methodology to search common sub-strings and reduce the time complexity in the comparison. In the first coding phase, DNA or animo acid sequences are transformed into a numerical space set. Subsequently, the quick sort algorithms are employed in the second sorting stage to reorder the encoded data. In the last searching phase, ladderlike stepping and interval jumping rules are proposed to increase efficiencies of numerical comparison. In addition, uniform and bitwise interval segmentation techniques are applied prior to interval jumping procedures. The segmenting methodologies are designed according to the length of searching pattern, and the proposed ladderlike searching algorithms provide improved performance. Experimental results show that the algorithms are capable of reducing time complexity from O(mLi(Li-m+1)+mLj(Lj-m+1)) to O(|Ii|+|Ij|). In order to provide the right pair comparing order for multiple DNA or animo acid sequences, several strategies based on statistics are designed for several classified sequences. In addition to the analysis of comparing order, based on voting mechanism, the representation and tolerant features of sub-string among sequences are also taken into consideration in this thesis. Keywords: Stepping, Interval Jumping, Hashing Encoding, Voting Algorithm
APA, Harvard, Vancouver, ISO, and other styles
42

Yen, Chen-Cheng. "A Combination of Berry Ravindran Algorithm and Two Way Algorithm for Exact String Matching." 2007. http://www.cetd.com.tw/ec/thesisdetail.aspx?etdun=U0020-2406200814134200.

Full text
APA, Harvard, Vancouver, ISO, and other styles
43

Yen, Chen-Cheng, and 顏成哲. "A Combination of Berry Ravindran Algorithm and Two Way Algorithm for Exact String Matching." Thesis, 2008. http://ndltd.ncl.edu.tw/handle/43753477020737174183.

Full text
Abstract:
碩士
國立暨南國際大學
資訊工程學系
96
The exact string matching problem is defined as follows: We are given a text and a pattern , . Our purpose is to find all occurrences of in . It is a classical and important problem in computer science. In this thesis, we propose a new approach to solve this problem. Two Way algorithm and the Berry Ravindran algorithm are good algorithms which find all occurrences of in , but some problems still exist. If we consider the situation that the last two text characters of the window with size m are the same with that of , Berry Ravindran algorithm is not efficient in this situation. If a mismatch often occurs within v part of , Two Way algorithm is not efficient in this situation. So we combine Two Way algorithm and Berry Ravindran algorithm to slove this problem Finally; we discuss the results of Two Way algorithm, Berry Ravindran algorithm and our approach.
APA, Harvard, Vancouver, ISO, and other styles
44

Chiu, Chi-Chang, and 邱啟彰. "Design a Two-Way Fast String-Matching Algorithm for Intrusion Detection System." Thesis, 2008. http://ndltd.ncl.edu.tw/handle/86518049918252358632.

Full text
Abstract:
碩士
義守大學
資訊工程學系碩士班
96
As proliferation of Internet applications increases, security becomes a serious problem within network solutions. Intrusion detection systems (IDS) have become widely recognized as the most effective ways for identifying and thwarting all kinds of known network attacks. Because most of the known attacks can be represented with strings or combinations of multiple substrings, string matching is one of the most critical components in IDS. String matching must check every byte of every packet to see if it matches one of a set of ten thousand suspicious strings. As a result, string matching has become the bottleneck in IDS as network speeds grow into the tens of gigabits/second. An efficient string matching algorithms are therefore important for identifying these packets at the line rate. In this study, we propose a two-way parallel structure to further improve the performance of the Aho-Corasick-based string matching algorithm. The proposed string matching algorithm will be implemented by modifying the source code of Snort. Our results showed that two-way Aho-Corasick-based string matching algorithm is superior to other algorithms, especially in detecting network packets with large data payload. Besides, multiway parallel structure can be developed based on the concept of this two way parallel structure, and then be expected to apply to a multiple Gbps intrusion detection system.
APA, Harvard, Vancouver, ISO, and other styles
45

陳建麟. "A Parallel String Matching Algorithm for High Speed Network Intrusion Detection System." Thesis, 2007. http://ndltd.ncl.edu.tw/handle/57773162720790226759.

Full text
APA, Harvard, Vancouver, ISO, and other styles
46

Yang, Chao-Yuan, and 楊朝淵. "A New Algorithm for Solving the String Matching with k-Mismatches Problem." Thesis, 2005. http://ndltd.ncl.edu.tw/handle/53970156684807658094.

Full text
Abstract:
碩士
國立暨南國際大學
資訊工程學系
93
String matching with k-mismatches problem is defined as follows. Given a pattern P with length m, a text T with length n and an integer k, string matching with k-mismatches problem wants to find all occurrences of P on T with k maximal number of mismatches allowable. In this thesis, we propose a new algorithm to solve the problem. The algorithm uses the Kangaroo method mentioned in the paper by Galil and Giancarlo and that by Landau and Vishkin. Our algorithm and the Kangaroo Method are distinct from the number of steps to find mismatches. Each step in our algorithm may find two new mismatches or at least one new mismatch in some case. We can compare our algorithm to the Kangaroo Method. Each step in the Kangaroo Method only finds a mismatch. Another difference between our algorithm and the Kangaroo method is the number of positions on the text which we need to verify if those positions are occurrences of the pattern with at most k mismatches. Our algorithm would consider that some positions on the text are impossible to be occurrences of the pattern and then we don’t need to waste any time to check whether those locations are matches with at most k mismatches.
APA, Harvard, Vancouver, ISO, and other styles
47

Liao, Chung-Yu, and 廖重淯. "A Novel Parallel Dual-Character String Matching Algorithm on Graphical Processing Units." Thesis, 2017. http://ndltd.ncl.edu.tw/handle/mn6gb5.

Full text
Abstract:
碩士
國立臺灣師範大學
電機工程學系
105
Aho-Corasick algorithm has been widely used in network intrusion detection system to inspect network packets against thousands of attack patterns. To improve the performance of network intrusion detection systems, many variations of Aho-Corasick algorithm are proposed to accelerate multiple string matching on GPUs or dedicated hardware. One of the proposed variations is to increase the number of characters that are processed per cycle. However, increasing the number of characters processed per cycle will encounter two major problems. The first problem is the input alignment problem while the second problem is the large increase of memory required for storing the state transition table. The two problems cause the multi-character approach become less feasible. In this thesis, we propose a novel parallel dual-character string matching algorithm on graphical processing units. In order to solve the two major problems, the proposed algorithm presents a new state machine to solve the input alignment problem, and compresses the state transition table using perfect hashing to solve the memory explosion problem. The experimental results show that the proposed algorithm is superior to the state-of-the-art approaches in terms of performance and memory requirements.
APA, Harvard, Vancouver, ISO, and other styles
48

Liao, Kuei-Hui, and 廖桂慧. "Solving the Exact String Matching Problem by Using the 2-Substring Algorithm." Thesis, 2007. http://ndltd.ncl.edu.tw/handle/46723272181241712677.

Full text
Abstract:
碩士
國立暨南國際大學
生物醫學科技研究所
95
String matching is a very important component of many problems, such as data compression, search engine, speech recognition, virus detection, computational biology, and so on. There are many efficient method proposed to solve the string matching problem. For example, KMP algorithm、Boyer-and-Moore algorithm. In this thesis, we proposed a method to solve the exact string matching problem. We proposed a rule, called the 2-substring rule, which avoids the brute force method and can be used to solve the problem. We know the time complexity of KMP algorithm is better than Boyer-and-Moore algorithm. But the practice of Boyer-and-Moore algorithm is faster than KMP algorithm. We implement our method in practice. Our method not only is faster than KMP algorithm, Boyer and Moore algorithm, Horspool algorithm and Quick Search algorithm but also is earlier to implement.
APA, Harvard, Vancouver, ISO, and other styles
49

Lin, Sheng-Yuan, and 林聖淵. "A High Performance Parallel Algorithm for Approximate String Matching on Multi-core Processor." Thesis, 2013. http://ndltd.ncl.edu.tw/handle/96412144907123635479.

Full text
APA, Harvard, Vancouver, ISO, and other styles
50

Zheng, Yi-Jun, and 鄭伊君. "A sub-linear time string matching algorithm with Bloom filters acceleration: Design, Implementation and Evaluation." Thesis, 2006. http://ndltd.ncl.edu.tw/handle/25684296002946637638.

Full text
Abstract:
碩士
國立交通大學
資訊科學與工程研究所
94
Many network security applications heavily rely on string matching to detect malicious intrusions, viruses, spam, and so on. A software-based implementation may not meet the performance requirement of high-speed applications due to intensive computation and frequent memory accesses. A hardware solution to take advantage of hardware parallelism is a promising trend to inspect the packet payload at line rate. In this work, we propose an innovative memory-based architecture using Bloom filters to realize a sub-linear time algorithm that can effectively process multiple characters simultaneously. The two key ideas to realize the sub-linear time algorithm in this architecture are (1) replacing the slow table lookup in the external memory with simultaneous queries to several Bloom filters and (2) designing a non-blocking verification interface to keep the worst-case performance in linear time. The proposed architecture is verified in both behavior simulation in C and timing simulation in HDL. The simulation result shows that the throughput is nearly 10 Gbps for Windows executable files and 600 Mbps in the worst case.
APA, Harvard, Vancouver, ISO, and other styles
We offer discounts on all premium plans for authors whose works are included in thematic literature selections. Contact us to get a unique promo code!

To the bibliography