Dissertations / Theses on the topic 'String matching algorithms'
Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles
Consult the top 50 dissertations / theses for your research on the topic 'String matching algorithms.'
Next to every source in the list of references, there is an 'Add to bibliography' button. Press on it, and we will generate automatically the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.
You can also download the full text of the academic publication as pdf and read online its abstract whenever available in the metadata.
Browse dissertations / theses on a wide variety of disciplines and organise your bibliography correctly.
Cheng, Lok-lam, and 鄭樂霖. "Approximate string matching in DNA sequences." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2003. http://hub.hku.hk/bib/B29350591.
Full text黎少斌 and Shiao-bun Lai. "Trading off time for space for the string matching problem." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 1996. http://hub.hku.hk/bib/B31214216.
Full textLai, Shiao-bun. "Trading off time for space for the string matching problem /." Hong Kong : University of Hong Kong, 1996. http://sunzi.lib.hku.hk/hkuto/record.jsp?B18061795.
Full textPockrandt, Christopher Maximilian [Verfasser]. "Approximate String Matching : Improving Data Structures and Algorithms / Christopher Maximilian Pockrandt." Berlin : Freie Universität Berlin, 2019. http://d-nb.info/1183675879/34.
Full textKlaib, Ahmad. "Exact string matching algorithms for searching DNA and protein sequences and searching chemical databases." Thesis, University of Huddersfield, 2014. http://eprints.hud.ac.uk/id/eprint/24266/.
Full textÃvila, Ricardo Lima Feitosa. "Emprego de tÃcnicas de prÃ-processamento textual e algoritmos de comparaÃÃo como suporte à correÃÃo de questÃes dissertativas: experimentos, anÃlises e contribuiÃÃes." Universidade Federal do CearÃ, 2013. http://www.teses.ufc.br/tde_busca/arquivo.php?codArquivo=10572.
Full textEsta dissertaÃÃo apresenta um estudo de tÃcnicas que podem ser empregadas como apoio para a correÃÃo de questÃes dissertativas com base na adaptaÃÃo de algoritmos de comparaÃÃo textual combinados a tÃcnicas de prÃ-processamento de textos. O principal desafio na concepÃÃo de uma ferramenta para este tipo de aplicaÃÃo à a ambiguidade da linguagem natural. Para analisar situaÃÃes de correÃÃo de questÃes subjetivas, foram efetuados testes com esses algoritmos, tendo-se desenvolvido uma ferramenta para tal propÃsito. Confrontando respostas de alunos ao padrÃo de resposta de questÃes propostas em provas subjetivas, foram analisados o desempenho individual dos algoritmos e de um conjunto de tÃcnicas de prÃ-processamento que sÃo encontrados na literatura, de maneira isolada e combinada. Buscando contornar situaÃÃes especÃficas de falso negativo e falso positivo, foram propostas algumas tÃcnicas auxiliares como contribuiÃÃo deste trabalho. ApÃs a anÃlise dos experimentos realizados, os resultados de Ãndice de similaridade entre respostas indicam o uso da soluÃÃo como suporte a correÃÃo de questÃes discursivas, podendo, ainda, ser aplicado na detecÃÃo de plÃgio e ser integrado a um ambiente virtual de ensino e aprendizagem.
This master thesis presents a study of techniques used as support for a correction of essay questions based in an adaptation of string-matching algorithms combined with text preprocessing techniques. The main challenge to design a tool like this is an ambiguity of natural language. To analyze a correction of subjective questions, tests were performed with these algorithms, and a tool have been developed for this purpose. Comparing student responses with response pattern of questions proposed in subjective tests, we analyzed the performance of individual algorithms and a set of pre-processing techniques that are found in the literature, in isolation and combined. Seeking to neutralize specific situations of false negative and false positive, some techniques have been proposed as auxiliary contribution of this work. After analyzing the experiments, the results of similarity index between responses indicate the use of the solution to support the correction of essay questions, and may also be applied in the detection of plagiarism and be integrated to a learning management system.
Toth, Róbert. "Přibližné vyhledávání řetězců v předzpracovaných dokumentech." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2014. http://www.nusl.cz/ntk/nusl-236122.
Full textNeto, Domingos Soares. "Filtros para a busca e extração de padrões aproximados em cadeias biológicas." Universidade de São Paulo, 2008. http://www.teses.usp.br/teses/disponiveis/45/45134/tde-19102009-002745/.
Full textThis thesis deals with computational formulations and algorithms for the extraction and search of patterns from biological strings. In particular, the present text focuses on the following problems, both considered under Hamming and Levenshtein distances: 1. How to find the positions where a given pattern approximatelly occurs in a given string; 2. How to extract patterns which approximatelly occurs in a certain number of strings from a given set. The first problem, for which there are many polinomial time algorithms, has been receiving a lot of attention since the 60s and entered a new era of discoveries with the advent of computational biology, in the 80s, and the widespread of the Internet and its search engines: both events brought new challenges to be faced by virtue of the large volume of data usually held by such applications and its time constraints. The second problem, much younger, is very challenging due to its computational complexity, approximation hardness and the size of the input data usually held by the most common applications. This problem is also very interesting due to its potential of application. In this work we show computational formulations, algorithms and data structures for those problems. We cover the bit-parallel algorithms of Myers, Baeza-Yates-Gonnet and the Sagots algorithms for patterns extraction. We also cover here the oustanding versatile suffix tree, its generalised version, and a similar data structure: the suffix array. A significant part of the present work focuses on q-gram based filters designed to solve the approximate pattern search problem. More precisely, we cover the filter algorithms of Ukkonen, Jokinen-Ukkonen and Burkhardt-Kärkkäinen, among others.
Dubois, Simon. "Offline Approximate String Matching forInformation Retrieval : An experiment on technical documentation." Thesis, Tekniska Högskolan, Högskolan i Jönköping, JTH. Forskningsmiljö Informationsteknik, 2013. http://urn.kb.se/resolve?urn=urn:nbn:se:hj:diva-22566.
Full textFrey, Jeffrey Daniel. "Finding Song Melody Similarities Using a DNA String Matching Algorithm." Kent State University / OhioLINK, 2008. http://rave.ohiolink.edu/etdc/view?acc_num=kent1208961242.
Full textMomeninasab, Leila. "Design and Implementation of a Name Matching Algorithm for Persian Language." Thesis, Linköpings universitet, Interaktiva och kognitiva system, 2013. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-102210.
Full textSilva, Júnior José Bonifácio da. "Paralelização em CUDA do algoritmo Aho-Corasick utilizando as hierarquias de memórias da GPU e nova compactação da Tabela de Transcrição de Estados." Universidade Federal de Sergipe, 2017. https://ri.ufs.br/handle/riufs/3353.
Full textUm Sistema de Detecção de Intrusão (IDS) necessita comparar o conteúdo de todos os pacotes que chegam na interface da rede com um conjunto de assinaturas que indicam possíveis ataques, tarefa esta que consome bastante tempo de processamento da CPU. Para amenizar esse problema, tem-se tentado paralelizar o motor de comparação dos IDSs transferindo sua execução da CPU para a GPU. Esta dissertação tem como objetivo fazer a paralelização dos algoritmos de comparação de strings Força-Bruta e Aho-Corasick e propor uma nova compactação da Tabela de Transição de Estados do algoritmo Aho-Corasick a fim de possibilitar o uso dela na memória compartilhada e acelerar a comparação de strings. Os dois algoritmos foram paralelizados utilizando a plataforma CUDA da NVIDIA e executados nas memórias da GPU a fim de possibilitar uma análise comparativa de desempenho dessas memórias. Inicialmente, o algoritmo AC mostrou-se mais veloz do que o algoritmo Força-Bruta e por isso seguiu-se para sua otimização. O algoritmo AC foi compactado e executado de forma paralela na memória compartilhada, alcançando um ganho de desempenho de 15% em relação às outras memórias da GPU e sendo 48 vezes mais rápido que sua versão na CPU quando os testes foram feitos com pacotes de redes reais. Já quando os testes foram feitos com dados sintéticos (dados menos aleatórios) o ganho chegou a 73% e o algoritmo paralelo chegou a ser 56 vezes mais rápido que sua versão serial. Com isso, pode-se perceber que o uso da compactação na memória compartilhada torna-se uma solução adequada para acelerar o processamento de IDSs que necessitem de agilidade na busca por padrões.
Yahiaoui, Said. "Algorithmes et applications pour la coloration et les alliances dans les graphes." Thesis, Lyon 1, 2013. http://www.theses.fr/2013LYO10274.
Full textThis thesis investigates the algorithmic aspects and applications of two graph problems, namely, colorings and alliances. In the first part, we focus on two variants of the proper vertex coloring, the Grundy coloring and the strict strong coloring. We start by the study of Grundy number for regular graphs. We give a sufficient condition for d-regular graphs with sufficiently large girth to have Grundy number equals d + 1. Then, using graph homomorphism, we obtain a necessary and sufficient condition for d-regular graphs to have Grundy number at least k. Moreover, we characterize cubic graphs (3-regular) for which the Grundy number is d + 1, and present a linear-time algorithm to determine the Grundy number of any arbitrary cubic graph. Subsequently, based on the strict strong coloring, we present an approach for the problem of matching labeled trees. Using this coloring, we propose a new algorithm to deterministically decompose a tree into small components. This leads to an efficient algorithm to measure an accurate distance between unrooted unordered labeled trees. The second part is devoted to the alliances in graphs. We first propose a linear-time self-stabilizing algorithm for the minimal global offensive alliance set problem, under the synchronous distributed scheduler. Then, we give the first self-stabilizing algorithm for the minimal global powerful alliance set problem in arbitrary graphs. Moreover, we show how this algorithm can be adapted to find the minimal global powerful k-alliance and the minimal weighted global powerful alliance sets. We prove that all these algorithms converge in polynomial-time under the unfair distributed scheduler. Finally, based on the structural properties of the offensive alliance, we propose a solution to decentralize the signaling protocol SIP. This enables SIP applications in mobile ad hoc networks
Moradi, Arvin. "Smart Clustering System for Filtering and Cleaning User Generated Content : Creating a profanity filter for Truecaller." Thesis, KTH, Skolan för informations- och kommunikationsteknik (ICT), 2013. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-124408.
Full textDenna avhandling fokuserar på att utreda och skapa en applikation för filtrering av användargenererat innehåll. Metoden gick ut på att undersöka hur svordomar samt rasistiska uttryck används och manipuleras för att undgå filtrerings processer i liknande system. Fokus gick även ut på att studera olika algoritmer för att få denna process att vara snabb och effektiv, dvs kunna bearbeta så många namn på kortast möjliga tid. Detta beror på att kunden i detta sammanhang får in miljontals nya uppladdningar varje dag, som måste filtreras innan använding. Resultatet visar att applikationen upptäcker svordomar i olika former. Data från kundens databas användes också för test syfte, och resultatet visade att applikationen även fungerar i praktiken. Prestanda testet visar att applikationen har en snabb exekveringstid. Detta kunde vi se genom att estimera den till en linjär funktion med hänsyn till tid och antal namn som matats in. Slutsatsen blev att filtret fungerar och upptäcker svordomar som inte upptäckts tidigare i kundens databas. För att stärka besluten i processen kan man i framtida uppdateringar införa tredje parts tjänster, eller ett web interface där man manuelt kan styra beslut. Exekverings tiden är bra och visar att 10 miljoner namn kan bearbetas på cirka 6 timmar. I framtiden kan man parallellisera förfrågningarna till databasen så att flera namn kan bearbetas samtidigt.
Alex, Ann Theja. "Local Alignment of Gradient Features for Face Photo and Face Sketch Recognition." University of Dayton / OhioLINK, 2012. http://rave.ohiolink.edu/etdc/view?acc_num=dayton1353372694.
Full textMartin, Benjamin. "Analyse de structures répétitives dans les séquences musicales." Thesis, Bordeaux 1, 2012. http://www.theses.fr/2012BOR14711/document.
Full textThe work presented in this thesis deals with repetitive structure inference from audio signal using string matching techniques. It aims at proposing and evaluating inference algorithms from a formal study of notions of similarity and repetition in music.We first present a method for representing audio signals by symbolic strings. We introduce alignment tools enabling similarity estimation between such musical strings, and evaluate the application of these tools for automatic cover song identification. We further adapt a bioinformatics indexing technique to allow efficient assessments of music similarity in large-scale datasets. We then introduce several specific repetitive structures and use alignment tools to analyse these repetitions. A first structure, namely the repetition of a chosen segment, is retrieved and evaluated in the context of automatic assignment of missingaudio data. A second structure, namely the major repetition, is defined, retrieved and evaluated regarding expert annotations, and as an alternative indexing method for cover song identification.We finally present the problem of repetitive structure inference as addressed in literature, and propose our own problem statement. We further describe our model and propose an algorithm enabling the identification of a hierarchical music structure. We emphasize the relevance of our method through several examples and by comparing it to the state of the art
Nosek, Ondřej. "Hardwarová akcelerace algoritmu pro hledání podobnosti dvou DNA řetězců." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2007. http://www.nusl.cz/ntk/nusl-236882.
Full textKuang, Hsueh Jui, and 薛睿光. "A Study on String-Matching Algorithms." Thesis, 2003. http://ndltd.ncl.edu.tw/handle/45412037298316276441.
Full text大同大學
應用數學研究所
91
In this thesis, we give experimental results on the performance of several popular string-matching algorithms. Meanwhile the behaviors of the costs of some variant Boyer-Moore algorithms are investigated. The costs considered include the numbers of character inspections, and the total time used.
Shih, Hung-Te, and 施鴻德. "A Survey of String-Matching Algorithms." Thesis, 2004. http://ndltd.ncl.edu.tw/handle/18193520008520957794.
Full text逢甲大學
應用數學所
92
We study the algorithms for pattern matching, i.e., searching a query substring in a given string, which appears in the text processing, image processing, and other applications. The algorithms cover the Direct String-Matching algorithms, the Rabin-Karp algorithm, the Finite Automata, the Knuth-Morris-Pratt algorithm, the Boyer-Moore algorithm, etc. The time complexity, worst-case performance and the criteria for choosing these algorithms are also studied.
Chen, Kuei-Hao, and 陳奎昊. "Improved Algorithms for Exact String Matching Problems." Thesis, 2013. http://ndltd.ncl.edu.tw/handle/64377810416197972394.
Full text國立暨南國際大學
資訊工程學系
101
In this dissertation, we consider two problems: the exact string matching problem and its variation, the exact circular string matching. The exact string matching problem is: Given two strings, a text $T$ of length $n$ and a pattern $P$ of length $m$, find all occurrences of $P$ in $T$. We propose a strategy to analyze the average performance of the reverse factor algorithm. The analysis is based on the assumption that the text is very long as compared to the length of the pattern, and each symbol in the text is drawn uniformly from a random source with $sigma$ symbols. Our analysis uses only elementary techniques in probability theory and avoids applying complicated combinatorics in stringology. We also propose a new algorithm for exact string matching problem under the name Improved BNDM algorithm. The Improved BNDM algorithm uses the $q$-gram filtering technique to speed up the performance of the Turbo BNDM algorithm. The time complexity of the Improved BNDM algorithm achieves $mathcal{O}(n)$ in the worst case and $mathcal{O}(nlog_sigma m/m)$ in the average case where $sigma$ is the alphabet size. It is optimal in both worst case and average case. Another problem we discuss in this dissertation is the exact circular string matching problem. Given a string $P=p_1p_2cdots p_m$, let a string $P^{(i)}=p_ip_{i+1}cdots p_mp_1cdots p_{i-1}$. The exact circular string matching problem is: Given two strings, $T$ of length $n$ and $P$ of length $m$, find all occurrences of $P^{(i)}$ in $T$ for $1leq i leq m$. We propose two algorithms that perform searching of a circular string in the text using bit-parallel technique. Our algorithms use only the composition of bitwise-logical operations and basic arithmetic operations, and apply this technique to solve the problem. These algorithms are given names CSBNDM and $ ext{CSBNDN}_q$, respectively. We give several experiments to verify that they have good performance for random strings and DNA sequence.
Lu, Chia Wei, and 呂嘉維. "Efficient Exact and Approximate String Matching Algorithms." Thesis, 2014. http://ndltd.ncl.edu.tw/handle/49922113552090356808.
Full text國立清華大學
資訊工程學系
102
In this thesis, we first propose two algorithms for exact string matching problem, which aims to find all the positions i's in a given text where a given pattern occurs. Our algorithms find an optimal selective comparing order of the characters of the pattern so that we could have a better performance in the searching phase. To find the optimal comparing order, we adopt the branch and bound approach. Moreover, our proposed algorithm can be combined with other existing exact string matching algorithms to improve the searching efficiency. The experimental results show that our algorithms indeed have the smallest number of character comparisons and are also efficient in time as compared with other existing exact string matching algorithms. Second, we propose a new filtration algorithm, as well as a hybrid filtration strategy, to efficiently solve the approximate string matching problem (also called the k-difference problem), which aims to find all the positions i's in a given text such that there exists a substring of the text ending at position i whose edit distance from a given pattern is less than or equal to a given error bound k. Our experimental results on simulated datasets of DNA sequences show that when compared with other filtration algorithms, our filtration algorithm has better performance on the efficiency to filter out those positions of the text at which the pattern does not occur approximately. Moreover, our hybrid filtration strategy further improves the effectiveness of our filtration algorithm. Third, we propose a progressive approach to solve the DNA resequencing problem which is defined as follows: We are given an unknown DNA sequence X and a known reference sequence R. Our task is to see whether X and R are similar or not. The present popular approach is to break up X into subsequences by the next generation sequencing (NGS) technologies, called reads. We then map the reads of X onto R with a suitable error bound. However, if the similarity between X and R is not very high (<95%), there would be many reads unmapped, and we then cannot obtain the mutations inside the unmapped regions. One can use a large error bound to increase the number of reads mapped. But it is not a good solution because increasing error bound will also increase the probability of false positive mapping. Our approach uses a small error bound and to increase the number of reads mapped, our approach modifies R each time after the reads are mapped. Thus our approach is a progressive approach. Compared with other available tools, our approach allows us to be able to map more reads to the reference sequence. In our simulated experiments, we also show the high correctness of our mapping algorithm.
Lu, Chia-Wei. "String Matching Algorithms Based upon the Uniqueness Property and an Approximate String Matching Algorithm Based upon the Candidate Elimination Method." 2008. http://www.cetd.com.tw/ec/thesisdetail.aspx?etdun=U0020-2406200815183100.
Full textLu, Chia-Wei, and 呂嘉維. "String Matching Algorithms Based upon the Uniqueness Property and an Approximate String Matching Algorithm Based upon the Candidate Elimination Method." Thesis, 2008. http://ndltd.ncl.edu.tw/handle/17632385982720975496.
Full text國立暨南國際大學
資訊工程學系
96
In this thesis, we consider two problems, exact string matching and approximate string matching problem. The exact string matching problem is to determine all of the locations of a pattern string P appearing in a text string T. We first point out a uniqueness property of a given pattern. We then propose four algorithms to solve the exact string matching problem based upon the uniqueness property. Experimental results showed that three of our algorithms are faster than the KMP and Boyer-Moore algorithms. The approximate string matching problem is defined as follows: Given a text string T, a pattern string P and an error bound k, find all substrings of T whose edit distances with P are smaller than or equal to k. We give a method to eliminate candidate locations in text T as there can be no substring S starting from those locations such that the edit distance between S and pattern P is smaller than or equal to k. Our method is simple to implement. Experimental results show that our method is effective, especially in the case when we perform natural language searching.
Lin, Po-Ching, and 林柏青. "Accelerating String Matching Algorithms for Deep Packet Inspection." Thesis, 2008. http://ndltd.ncl.edu.tw/handle/05808898005943344191.
Full text國立交通大學
資訊科學與工程研究所
96
An increasing number of network devices inspect the packet content for various signatures of security violation. Accelerating the process, namely deep packet inspection (DPI), can be in two aspects: algorithm and packet flow. This work focuses on the former. We first review existing string matching algorithms for DPI to see what work has been done and what should be addressed. In the review, we indicate the pros and cons of each algorithm, and a scalable, efficient design to meet the different characteristics of the pattern set in various applications is still a challenge. We also profile the performance of string matching in practical applications for various numbers, lengths and character distributions of the patterns to realize the key factors in efficient string matching. The results shed light on which approach is appropriate for each DPI application. Considering the above study, we design a hardware-based string-matching engine that can exploit algorithmic heuristics for acceleration, and a hybrid software method for applications with a large pattern set, say anti-virus applications. However, it is insufficient to accelerate some DPI applications, such as Web filter, with string matching alone. We thus present a probabilistic approach, namely the early decision algorithm to accelerate the classification in Web filtering. This work features a number of observations and contributions. First, the review and profiling study several major DPI applications, including intrusion detection, anti-virus and Web filtering, unlike most existing work that focuses on intrusion detection. The study shows string matching is not so critical in intrusion detection, and its development for DPI should also cover other applications more. The profiling indicates memory access (thus also cache locality), verification frequency and shift distance of the search window are major factors that affect the performance. Second, the presented hardware architecture of string matching, namely Bloom Filter Accelerated Sub-linear Time (BFAST), can exploit algorithmic heuristics with Bloom filters to scan the content in sub-linear time. The bad-block heuristic in the BFAST has better performance than that from the conventional table due to the precise information kept in the Bloom filters. We also propose practical techniques to handle the worst case, and the theoretical method to achieve linear worst-case time. The simulation shows that the peak throughput of a single match engine can achieve up to 9.34 Gbps for 16,384 patterns with an eight-character block in the heuristics. A hybrid method for virus scanning is also presented since the hardware approach may be not applicable. We partition the patterns in ClamAV into long and short ones. An algorithm enhanced from the Wu-Manber (WM) algorithm, namely the Backward-Hashing algorithm (BH), is responsible for only long patterns to lengthen the average skip distance, while the Aho-Corasick algorithm scans for only short patterns to reduce the automaton sizes. The former utilizes the bad-block heuristic to exploit long shift distance and reduce the verification frequency, so it is much faster than the original WM implementation in ClamAV. The latter increases the AC performance by around 50% due to better cache locality. Besides speeding up string matching, we present a simple, but effective early decision algorithm to accelerate the filtering process by examining only part of the Web content. The algorithm can make the filtering decision, either to block or to pass the Web content, as soon as it is confident with a high probability that the content should belong to a banned or an allowed category. The experiments show the algorithm can examine only around one-fourth of Web content on average, while the accuracy remains fairly good. This algorithm can complement other Web filtering approaches to filter the Web content with high efficiency. Keywords: string matching, algorithm, deep packet inspection
Huang, Lan-Ya. "An Exact String Matching Algorithms Using Hashing Scheme." 2008. http://www.cetd.com.tw/ec/thesisdetail.aspx?etdun=U0020-2406200814285000.
Full textHuang, Lan-Ya, and 黃蘭雅. "An Exact String Matching Algorithms Using Hashing Scheme." Thesis, 2008. http://ndltd.ncl.edu.tw/handle/16933370591311289241.
Full text國立暨南國際大學
資訊管理學系
96
In this thesis, we consider how to solve the exact string matching problem. The exact string matching problem is to find all locations of a pattern string P in a text string T. In general, string matching algorithm problem work in linear time and linear space. The two well-known of them are Knuth-Morris-Pratt (KMP) algorithm and Boyer-Moore (BM) algorithm. We will use the hashing scheme to solve the exact string matching problem. Our method is simple to implement, and our algorithm work in constant space. Experimental shows that our algorithm is better than Brute Force algorithm and KMP algorithm.
Chen, Hong-Chieh, and 陳宏杰. "A Review of String Matching Algorithms with Constant Space." Thesis, 2006. http://ndltd.ncl.edu.tw/handle/77479923915180499277.
Full text國立暨南國際大學
資訊管理學系
94
The string matching problem is to find all occurrences of a given pattern string P in a text string T. It is a classical and important problem in computer science and two well-known string matching algorithms are the Knuth-Morris-Pratt (KMP) algorithm and the Boyer-Moore (BM) algorithm and both of them work in linear time and linear space. Hence, designing a string matching algorithm with linear time cost and simultaneously constant space are interesting and usually sophisticated in the string matching problem. In recent years, many constant space and linear time string matching algorithms were proposed and most of them are based on the KMP algorithm. In this thesis, we perform a review of three recent constant space string matching algorithms: the MaxSuffix-Matching algorithm, the Two-Way Pattern matching algorithm and the Sequential-Sampling scheme. We first discuss the KMP algorithm and introduce its preprocessing functions. We then introduce above three constant space algorithms respectively. Finally, we compare their differences between these three algorithms by several different subjects.
Holloway, James Lee. "Algorithms for string matching with applications in molecular biology /." 1992. http://hdl.handle.net/1957/11518.
Full textBurkhardt, Stefan [Verfasser]. "Filter algorithms for approximate string matching / von Stefan Burkhardt." 2004. http://d-nb.info/972323015/34.
Full textLi, Zhi-Xiang, and 李志祥. "Profiling and Accelerating String Matching Algorithms in Three Content Security Applications." Thesis, 2005. http://ndltd.ncl.edu.tw/handle/24569832427268462905.
Full text國立交通大學
資訊科學系所
93
Network content security has become a critical issue of the Internet. It is shown that the efficiency of string matching algorithms is essential to content processing. The performance of a string matching algorithm is sensitive to the number of patterns, the minimum length of the signature and the character set that the signatures are composed of. This work reviews and profiles some typical algorithms to understand which algorithm is suitable in which situation. The AC algorithm is suitable for LSP=1, the Modified-WM algorithm is suitable for LSP =2 when the pattern set size is smaller than 1,000, the FNPw2 algorithm is suitable for LSP=2 when the pattern set size is larger than 1,000, the Modified-WM algorithm is suitable for LSP=3 and the BG+ algorithm is suitable for LSP 4. Then, these algorithms are implemented on open-source content security applications to observe the performance in practice. The performance improvement is significant if the percentage of string matching processing on total execution time is great. For example, the novel method is five times faster than the original method on the experiment of ClamAV. In addition, these applications are fed with the real and synthetic data. The differences of performance between the real and synthetic data are also compared. The execution time for processing the real data is longer than that for processing the synthetic data due to the character set distribution. Finally, the practical design issues for string matching are also observed in this work.
Huang, Chun-cheng, and 黃俊程. "High-Performance Parallel Location-Aware Algorithms for Approximate String Matching on GPUs." Thesis, 2017. http://ndltd.ncl.edu.tw/handle/5b596z.
Full text國立臺灣師範大學
電機工程學系
105
Approximate string matching has been widely used in many applications, including deoxyribonucleic acid sequence searching, spell checking, text mining, and spam filters. The method is designed to find all locations of strings that approximately match a pattern in accordance with the number of insertion, deletion, and substitution operations. Among the proposed algorithms, the bit-parallel algorithms are considered to be the best and highly efficient algorithms. However, the traditional bit-parallel algorithms lacks the ability of identifying the start and end positions of a matched pattern. Furthermore, acceleration of the bit-parallel algorithms has become a crucial issue for processing big data nowadays. In this paper, we propose two kinds of parallel location-aware algorithms called data-segmented parallelism and high-degree parallelism as means to accelerate approximate string matching using graphic processing units. Experimental results show that the high-degree parallelism on GPUs achieves significant improvement in system and kernel throughputs compared to CPU counterparts. Compared to state-of-the-art approaches, the proposed high-degree parallelism achieves 11 to 105 times improvement. Finally, we develop a web service which allows users to perform approximate string matching on line and deliver all matched substrings with the start and end positions.
Shih-hsin, Liang, and 梁仕炘. "A K-modular String Matching Algorithm." Thesis, 2010. http://ndltd.ncl.edu.tw/handle/19890631579529954123.
Full text國立暨南國際大學
資訊工程學系
98
Lots of string matching algorithms has been proposed. Among them, Boyer-Moore is fast but difficult to analyze its time complexity. The naive, or brute force, algorithm is the simplest string matching method. Its time complexity is O(m*n) when searching with a pattern of length m in a text of length n. Although Morris Pratt (MP) and Knuth-Morris-Pratt (KMP) algorithm method are very simple, and their time complexity is O(n), but every character must be checked at lease once. Colussi method, the one most peculiar method, divides its process into two step. The first step, is by using Knuth-Morris-Pratt (KMP) prefix function table, scans from left to right and leaps some characters. The second step searches from right to left on the characters missed by the first step. By this method, it searches faster than KMP method, but the time analysis is not so simple as KMP’s. Therefore we try to find a method which is simple and is easy to analyse. Our method also have two step similar with Colussi’s. The first step is to compare character from left to right every k characters, then the second step, also starting from the left side, is to check the un-compared characters. This method is simple and easy to analyze as KMP’s, but the number of comparisons is lower than KMP’s.
Liu, Kuei-Wen. "Some Results on Exact String Matching Algorithm." 2008. http://www.cetd.com.tw/ec/thesisdetail.aspx?etdun=U0020-2406200814333600.
Full textHsu, Yu-Ling, and 徐玉玲. "An Efficient Real Scaled String Matching Algorithm." Thesis, 2008. http://ndltd.ncl.edu.tw/handle/44140988949103322076.
Full text國立暨南國際大學
資訊工程學系
96
Text searching is used on a wide variety of technologies, such as computer science, multimedia library, web search, etc. This thesis proposes how to find all possible patterns that appear under the real scales in text. On our algorithm, we consider two cases of real scales. One of them contains the scale range is greater than or equal to 1. The other case contains the range is less than or equal to 1. By using the Real Scaled Indexing Tree (RSIT), we can determine whether a substring under every scale can match the text or not. Assume that the lengths of text and pattern length are n and m, respectively. The time complexity is O(n^3).
Liu, Kuei-Wen, and 劉貴文. "Some Results on Exact String Matching Algorithm." Thesis, 2008. http://ndltd.ncl.edu.tw/handle/14472736499752714951.
Full text國立暨南國際大學
資訊工程學系
96
In this paper, we introduce two algorithms for stringology. One is to solve the exact string matching problem and the other is to find the string cycle (period of a string). In the exact string matching problem, we are given two strings T=t1t2...tn and P=p1p2...pm. We are asked to find all occurrences of P in T. Our searching method is based upon a single character rule. Consider a location i in P. Suppose that pi is aligned with tj and pi <> tj. We then must move P in such a way that some pk=tj will be aligned with tj. In the thesis, we propose a method to find the optimal location i. We also modified a string matching approach, named wide window approach, which divides the text into n/m overlapping windows of size 2m-1. In the windows, the approach attempts m possible occurrence positions in parallel. It firstly searches pattern suffixes from middle to right with the modified convolution method, shifts the window directly when it fails, otherwise, scans the corresponding prefixes backward with the modified convolution method again. For the period of string problem, we are given a string T and we are asked to find the periods of T. Our algorithm is based upon the bit parallel approach.
Lo, Shiao Wen. "A Review of String Matching Algorithm without Tables." 2007. http://www.cetd.com.tw/ec/thesisdetail.aspx?etdun=U0020-2706200714045300.
Full textLo, Shiao-Wen, and 羅少文. "A Review of String Matching Algoritms without Tables." Thesis, 2007. http://ndltd.ncl.edu.tw/handle/19999616183376766508.
Full text國立暨南國際大學
資訊工程學系
95
General string matching algorithms for string matching problem work in linear time and linear space. The two well-known famous of them are Knuth-Morris-Pratt (KMP) algorithm and Boyer-Moore (BM) algorithm. However, some string matching algorithms presented in recent years work in linear time and just need a small additional memory space. If such algorithms can work in linear time and simultaneously constant space, we call those algorithms as constant space string matching algorithms. In this thesis, we perform a review about constant space string matching algorithms. We will introduce five constant space string matching algorithms: the Constant-space matching for a specific pattern, the MaxSuffix Matching algorithm, the Two Way Matching algorithm, Crochemore-Perrin algorithm and Galil-Seiferas algorithm. Finally, we also discuss the results about them.
Chou, Chih-Chieh, and 周智杰. "A Fast String Matching Algorithm Based On Network Processor." Thesis, 2004. http://ndltd.ncl.edu.tw/handle/25687177452147145028.
Full text國立清華大學
資訊工程學系
92
With the development of high-speed network, uni-processor incapable of affording a large number of traffic can not satisfy what is required by high performance network equipments. It can help improve the performances. However, most network equipments such as network intrusion detection or protection systems need to inspect the packet content and compare with its own signatures, and react appropriately. Thus, the need for a faster algorithm for multi-pattern searching becomes more and more urgent. It is the most crucial factor concerned with the network performance. Take Snort [18], a popular open-source network intrusion detection system as an example, it uses the algorithm called “A fast Algorithm For Multi-Pattern Searching” proposed by Sun Wu, and Udi Manber 1994(denoted as WuM) [6]. The WuM algorithm can compare the input text with the whole patterns concurrently, but the length of the shortest pattern (denoted as LSP) can not be less than the block-size usually equal to 2. If LSP is less than the block-size, in snort, it compares the input text with its own signatures sequentially using the BM algorithm which is proposed by Boyer R. S., and J. S. Moore 1977 [1]. Consequently, the throughput is limited by matching patterns, and has the poor performance. The purpose of the thesis is to improve the performance of WuM algorithm, and to handle the length of the shortest pattern less than the block size. Therefore, we can use the only one algorithm to perform multi-pattern searching even the LSP is equal to 1. We concentrate on typical searches rather than on worst-case behavior. It makes sense in most network devices which need to compare incoming packets to its own patterns. Malicious packets in network are always less than legal packets. To verify the performance of FWM, we use the signatures defined by snort as the patterns, and use the packets downloaded from DEFCON [17] as the input to run the simulation. Finally, we got 15% - 25% performance improvement.
Shieh, Yi-Kung. "An Improved Approximate String Matching Algorithm Based upon the Boyer-Moore Algorithm." 2008. http://www.cetd.com.tw/ec/thesisdetail.aspx?etdun=U0020-2406200816121600.
Full textShieh, Y. K., and 謝一功. "An Improved Approximate String Matching Algorithm Based upon the Boyer-Moore Algorithm." Thesis, 2008. http://ndltd.ncl.edu.tw/handle/66121462520038468689.
Full text國立暨南國際大學
資訊工程學系
96
In this thesis, we discuss the exact string matching problem and approximate string matching problem. We avoid using a brute-force algorithm to solve the string matching problem. We improve the Bad Character Rule of Boyer and Moore Algorithm and compare it with the Horspool Algorithm. And we find that the improved method include the information of the Horspool Algorithm. Therefore, we combine the improved method with the Horspool Algorithm. The combinative method has a better efficiency in searching phase.
Chu, Jia-Han, and 朱家漢. "Efficient Pattern Matching Algorithms for Variable-Length and Tolerant Strings." Thesis, 2004. http://ndltd.ncl.edu.tw/handle/07175382319267136974.
Full text國立臺灣海洋大學
資訊工程學系
92
In this thesis, a novel algorithm for searching common segments in multiple DNA or animo acid sequences is designed. To improve efficiency in pattern searching, combination of hashing encoding, quick sorting and ladderlike stepping and/or interval jumping techniques are applied. Since multiple sequence alignment of DNA or animo acid sequences from the giant genomic database is usually time consuming, we develop a three-phase methodology to search common sub-strings and reduce the time complexity in the comparison. In the first coding phase, DNA or animo acid sequences are transformed into a numerical space set. Subsequently, the quick sort algorithms are employed in the second sorting stage to reorder the encoded data. In the last searching phase, ladderlike stepping and interval jumping rules are proposed to increase efficiencies of numerical comparison. In addition, uniform and bitwise interval segmentation techniques are applied prior to interval jumping procedures. The segmenting methodologies are designed according to the length of searching pattern, and the proposed ladderlike searching algorithms provide improved performance. Experimental results show that the algorithms are capable of reducing time complexity from O(mLi(Li-m+1)+mLj(Lj-m+1)) to O(|Ii|+|Ij|). In order to provide the right pair comparing order for multiple DNA or animo acid sequences, several strategies based on statistics are designed for several classified sequences. In addition to the analysis of comparing order, based on voting mechanism, the representation and tolerant features of sub-string among sequences are also taken into consideration in this thesis. Keywords: Stepping, Interval Jumping, Hashing Encoding, Voting Algorithm
Yen, Chen-Cheng. "A Combination of Berry Ravindran Algorithm and Two Way Algorithm for Exact String Matching." 2007. http://www.cetd.com.tw/ec/thesisdetail.aspx?etdun=U0020-2406200814134200.
Full textYen, Chen-Cheng, and 顏成哲. "A Combination of Berry Ravindran Algorithm and Two Way Algorithm for Exact String Matching." Thesis, 2008. http://ndltd.ncl.edu.tw/handle/43753477020737174183.
Full text國立暨南國際大學
資訊工程學系
96
The exact string matching problem is defined as follows: We are given a text and a pattern , . Our purpose is to find all occurrences of in . It is a classical and important problem in computer science. In this thesis, we propose a new approach to solve this problem. Two Way algorithm and the Berry Ravindran algorithm are good algorithms which find all occurrences of in , but some problems still exist. If we consider the situation that the last two text characters of the window with size m are the same with that of , Berry Ravindran algorithm is not efficient in this situation. If a mismatch often occurs within v part of , Two Way algorithm is not efficient in this situation. So we combine Two Way algorithm and Berry Ravindran algorithm to slove this problem Finally; we discuss the results of Two Way algorithm, Berry Ravindran algorithm and our approach.
Chiu, Chi-Chang, and 邱啟彰. "Design a Two-Way Fast String-Matching Algorithm for Intrusion Detection System." Thesis, 2008. http://ndltd.ncl.edu.tw/handle/86518049918252358632.
Full text義守大學
資訊工程學系碩士班
96
As proliferation of Internet applications increases, security becomes a serious problem within network solutions. Intrusion detection systems (IDS) have become widely recognized as the most effective ways for identifying and thwarting all kinds of known network attacks. Because most of the known attacks can be represented with strings or combinations of multiple substrings, string matching is one of the most critical components in IDS. String matching must check every byte of every packet to see if it matches one of a set of ten thousand suspicious strings. As a result, string matching has become the bottleneck in IDS as network speeds grow into the tens of gigabits/second. An efficient string matching algorithms are therefore important for identifying these packets at the line rate. In this study, we propose a two-way parallel structure to further improve the performance of the Aho-Corasick-based string matching algorithm. The proposed string matching algorithm will be implemented by modifying the source code of Snort. Our results showed that two-way Aho-Corasick-based string matching algorithm is superior to other algorithms, especially in detecting network packets with large data payload. Besides, multiway parallel structure can be developed based on the concept of this two way parallel structure, and then be expected to apply to a multiple Gbps intrusion detection system.
陳建麟. "A Parallel String Matching Algorithm for High Speed Network Intrusion Detection System." Thesis, 2007. http://ndltd.ncl.edu.tw/handle/57773162720790226759.
Full textYang, Chao-Yuan, and 楊朝淵. "A New Algorithm for Solving the String Matching with k-Mismatches Problem." Thesis, 2005. http://ndltd.ncl.edu.tw/handle/53970156684807658094.
Full text國立暨南國際大學
資訊工程學系
93
String matching with k-mismatches problem is defined as follows. Given a pattern P with length m, a text T with length n and an integer k, string matching with k-mismatches problem wants to find all occurrences of P on T with k maximal number of mismatches allowable. In this thesis, we propose a new algorithm to solve the problem. The algorithm uses the Kangaroo method mentioned in the paper by Galil and Giancarlo and that by Landau and Vishkin. Our algorithm and the Kangaroo Method are distinct from the number of steps to find mismatches. Each step in our algorithm may find two new mismatches or at least one new mismatch in some case. We can compare our algorithm to the Kangaroo Method. Each step in the Kangaroo Method only finds a mismatch. Another difference between our algorithm and the Kangaroo method is the number of positions on the text which we need to verify if those positions are occurrences of the pattern with at most k mismatches. Our algorithm would consider that some positions on the text are impossible to be occurrences of the pattern and then we don’t need to waste any time to check whether those locations are matches with at most k mismatches.
Liao, Chung-Yu, and 廖重淯. "A Novel Parallel Dual-Character String Matching Algorithm on Graphical Processing Units." Thesis, 2017. http://ndltd.ncl.edu.tw/handle/mn6gb5.
Full text國立臺灣師範大學
電機工程學系
105
Aho-Corasick algorithm has been widely used in network intrusion detection system to inspect network packets against thousands of attack patterns. To improve the performance of network intrusion detection systems, many variations of Aho-Corasick algorithm are proposed to accelerate multiple string matching on GPUs or dedicated hardware. One of the proposed variations is to increase the number of characters that are processed per cycle. However, increasing the number of characters processed per cycle will encounter two major problems. The first problem is the input alignment problem while the second problem is the large increase of memory required for storing the state transition table. The two problems cause the multi-character approach become less feasible. In this thesis, we propose a novel parallel dual-character string matching algorithm on graphical processing units. In order to solve the two major problems, the proposed algorithm presents a new state machine to solve the input alignment problem, and compresses the state transition table using perfect hashing to solve the memory explosion problem. The experimental results show that the proposed algorithm is superior to the state-of-the-art approaches in terms of performance and memory requirements.
Liao, Kuei-Hui, and 廖桂慧. "Solving the Exact String Matching Problem by Using the 2-Substring Algorithm." Thesis, 2007. http://ndltd.ncl.edu.tw/handle/46723272181241712677.
Full text國立暨南國際大學
生物醫學科技研究所
95
String matching is a very important component of many problems, such as data compression, search engine, speech recognition, virus detection, computational biology, and so on. There are many efficient method proposed to solve the string matching problem. For example, KMP algorithm、Boyer-and-Moore algorithm. In this thesis, we proposed a method to solve the exact string matching problem. We proposed a rule, called the 2-substring rule, which avoids the brute force method and can be used to solve the problem. We know the time complexity of KMP algorithm is better than Boyer-and-Moore algorithm. But the practice of Boyer-and-Moore algorithm is faster than KMP algorithm. We implement our method in practice. Our method not only is faster than KMP algorithm, Boyer and Moore algorithm, Horspool algorithm and Quick Search algorithm but also is earlier to implement.
Lin, Sheng-Yuan, and 林聖淵. "A High Performance Parallel Algorithm for Approximate String Matching on Multi-core Processor." Thesis, 2013. http://ndltd.ncl.edu.tw/handle/96412144907123635479.
Full textZheng, Yi-Jun, and 鄭伊君. "A sub-linear time string matching algorithm with Bloom filters acceleration: Design, Implementation and Evaluation." Thesis, 2006. http://ndltd.ncl.edu.tw/handle/25684296002946637638.
Full text國立交通大學
資訊科學與工程研究所
94
Many network security applications heavily rely on string matching to detect malicious intrusions, viruses, spam, and so on. A software-based implementation may not meet the performance requirement of high-speed applications due to intensive computation and frequent memory accesses. A hardware solution to take advantage of hardware parallelism is a promising trend to inspect the packet payload at line rate. In this work, we propose an innovative memory-based architecture using Bloom filters to realize a sub-linear time algorithm that can effectively process multiple characters simultaneously. The two key ideas to realize the sub-linear time algorithm in this architecture are (1) replacing the slow table lookup in the external memory with simultaneous queries to several Bloom filters and (2) designing a non-blocking verification interface to keep the worst-case performance in linear time. The proposed architecture is verified in both behavior simulation in C and timing simulation in HDL. The simulation result shows that the throughput is nearly 10 Gbps for Windows executable files and 600 Mbps in the worst case.