Dissertations / Theses on the topic 'Keyword spotting'
Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles
Consult the top 50 dissertations / theses for your research on the topic 'Keyword spotting.'
Browse dissertations / theses on a wide variety of disciplines and organise your bibliography correctly.
Skácel, Miroslav. "Query-by-Example Keyword Spotting." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2015. http://www.nusl.cz/ntk/nusl-234939.
Full text
Sunde Valfridsson, Jonas. "Query By Example Keyword Spotting." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-299743.
Full text
Voice interfaces have grown in popularity, and with them an interest in open-vocabulary keyword spotting. In this thesis we focus on a specific form of open-vocabulary keyword spotting, namely query-by-example keyword spotting. Three types of query-by-example methods are described and evaluated: sequence distances, speech-to-phonemes, and deep distance learning. Evaluation is done on constructed tasks designed to measure a range of aspects of the methods. The Google Speech Commands dataset is also used in the evaluation to make the results more comparable with existing work. The results indicate that deep distance learning is the most promising approach, except in environments where resources are very limited; there, sequence distances may be of interest. The speech-to-phonemes methods show shortcomings in the usage evaluation.
Ling, Yong. "Keyword spotting in continuous speech utterances." Thesis, National Library of Canada = Bibliothèque nationale du Canada, 1999. http://www.collectionscanada.ca/obj/s4/f2/dsk1/tape7/PQDD_0024/MQ50822.pdf.
Full text
Ling, Yong, 1973. "Keyword spotting in continuous speech utterances." Thesis, McGill University, 1999. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=21595.
Full textPuigcerver, I. Pérez Joan. "A Probabilistic Formulation of Keyword Spotting." Doctoral thesis, Universitat Politècnica de València, 2019. http://hdl.handle.net/10251/116834.
Full text
[EN] Keyword Spotting, applied to handwritten text documents, aims to retrieve the documents, or parts of them, that are relevant for a query, given by the user, within a large collection of documents. The topic has gained a large interest in the last 20 years among Pattern Recognition researchers, as well as digital libraries and archives. This thesis first defines the goal of Keyword Spotting from a Decision Theory perspective. Then, the problem is tackled following a probabilistic formulation. More precisely, Keyword Spotting is presented as a particular instance of Information Retrieval, where the content of the documents is unknown, but can be modeled by a probability distribution. In addition, the thesis also proves that, under the correct probability distributions, the framework provides the optimal solution under many of the evaluation measures traditionally used in the field. Later, different statistical models are used to represent the probability distribution over the content of the documents. These models, Hidden Markov Models or Recurrent Neural Networks, are estimated from training data, and the corresponding distributions over the transcripts of the images can be efficiently represented using Weighted Finite State Transducers. In order to make the framework practical for large collections of documents, this thesis presents several algorithms to build probabilistic word indexes, using both lexicon-based and lexicon-free models. These indexes are very similar to the ones used by traditional search engines. Furthermore, we study the relationship between the presented formulation and other seminal approaches in the field of Keyword Spotting, highlighting some limitations of the latter. Finally, all the contributions are evaluated experimentally, not only on standard academic benchmarks, but also on collections including tens of thousands of pages of historical manuscripts. The results show that the proposed framework and algorithms allow very accurate and very fast Keyword Spotting systems to be built, with a solid underlying theory.
Puigcerver I Pérez, J. (2018). A Probabilistic Formulation of Keyword Spotting [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/116834
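As an illustration of how the probabilistic word indexes described in this abstract might be queried, the following Python sketch builds a toy index of (page, bounding box, relevance probability) entries and scores a page by the probability that it contains the query at least once. The index contents, the spot-level independence assumption, and all names are illustrative only; they are not taken from the thesis.

```python
# Toy probabilistic word index: word -> list of (page_id, bounding_box, probability).
# In the framework described above, the probabilities would come from the posterior
# over transcripts given the page image; here they are invented for illustration.
index = {
    "keyword": [("page_001", (120, 40, 310, 80), 0.93),
                ("page_007", (55, 410, 260, 450), 0.41)],
}

def search(query):
    """Return candidate spots sorted by descending relevance probability."""
    return sorted(index.get(query, []), key=lambda spot: spot[2], reverse=True)

def page_relevance(query, page_id):
    """Probability that a page contains the query at least once,
    assuming the individual spots are independent: 1 - prod(1 - p_i)."""
    miss = 1.0
    for pid, _box, p in index.get(query, []):
        if pid == page_id:
            miss *= (1.0 - p)
    return 1.0 - miss

print(search("keyword"))
print(page_relevance("keyword", "page_001"))
```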
Wang, Miaorong. "Algorithms and low power hardware for keyword spotting." Thesis, Massachusetts Institute of Technology, 2018. http://hdl.handle.net/1721.1/118035.
Full text
Cataloged from PDF version of thesis.
Includes bibliographical references (pages 73-76).
Keyword spotting (KWS) is widely used in mobile devices to provide a hands-free interface. It continuously listens to all sound signals, detects specific keywords and triggers the downstream system. The key design target of a KWS system is to achieve high classification accuracy for the specified keywords and low power consumption while doing real-time processing of speech data. An algorithm based on a convolutional neural network (CNN) delivers high accuracy with a small model size that can be stored in on-chip memory. However, state-of-the-art NN accelerators either target complex tasks using large CNN models, e.g., AlexNet, or support limited neural network (NN) architectures that deliver lower classification accuracy for KWS. This thesis takes an algorithm-and-hardware co-design approach to implement a low power NN accelerator for the KWS system that is able to process CNNs with flexible structures. On the algorithm side, we propose a weight tuning method that tweaks the bits of the weights to lower the switching activity in the weight network-on-chip (NoC) and multipliers. The algorithm takes in 2's complement 8-bit original weights and outputs sign-magnitude 8-bit tuned weights. In our experiment, a 60.96% reduction in the toggle count of the weights is achieved with a 0.75% loss in accuracy. On the hardware side, we implement a processing element (PE) to efficiently process the tuned weights. It takes in sign-magnitude weights and input activations, and multiplies them with an unsigned multiplier. An XOR gate is used to generate the sign bit of the product. The sign-magnitude product is converted back to 2's complement representation and accumulated using an adder-and-subtractor; the sign bit of the product is used as a carry bit to do the conversion. Compared to a PE that processes the original 2's complement weights, around 35% power reduction is observed. Finally, this thesis presents a CNN accelerator that consumes 1.2 mW when doing real-time processing of speech data with an accuracy of around 87.3% on the Google Speech Commands dataset [34].
by Miaorong Wang.
S.M.
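As a reading aid for the sign-magnitude arithmetic described in Wang's abstract above, here is a minimal Python sketch of one multiply-accumulate step: unsigned multiplication of the magnitudes, an XOR of the sign bits to form the product sign, and conversion back to two's complement before accumulation. Treating both weights and activations as 8-bit sign-magnitude values is an assumption made for the illustration; it is not the thesis's hardware implementation.

```python
def sm_decode(sm, bits=8):
    """Interpret an unsigned integer as a sign-magnitude value: MSB is the sign."""
    sign = (sm >> (bits - 1)) & 1
    mag = sm & ((1 << (bits - 1)) - 1)
    return sign, mag

def mac_sign_magnitude(acc, weight_sm, act_sm, bits=8):
    """One multiply-accumulate step in the style the abstract describes:
    unsigned multiply of the magnitudes, XOR of the sign bits for the product
    sign, then conversion to two's complement (negation is bit inversion plus
    a carry of 1, which is where the sign bit acts as the carry) before
    adding to the accumulator."""
    w_sign, w_mag = sm_decode(weight_sm, bits)
    a_sign, a_mag = sm_decode(act_sm, bits)
    prod_mag = w_mag * a_mag           # unsigned multiplier
    prod_sign = w_sign ^ a_sign        # XOR gate
    prod = -prod_mag if prod_sign else prod_mag
    return acc + prod

# Example: weight -3 (sign-magnitude 0b10000011) times activation +5.
print(mac_sign_magnitude(0, 0b10000011, 0b00000101))  # -> -15
```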
Friesch, Pius. "Generating Training Data for Keyword Spotting given Few Samples." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-254960.
Full text
Speech recognition systems generally need a large amount of training data with varied voice and recording conditions to give robust results. In the specific case of keyword spotting, where only short commands are recognized instead of large vocabularies, resource-intensive data collection has to be done for each keyword individually. In recent years, neural methods in speech synthesis and voice conversion have made great progress and generate speech that is realistic to the human ear. In this work we investigate the possibility of using such methods to generate training data for keyword spotting. In detail, we want to evaluate whether the generated training data really is realistic or only sounds that way, and whether a model trained on these generated examples generalizes well to real speech. We evaluated three methods for neural speech synthesis and voice conversion: (1) Speaker Adaptive VoiceLoop, (2) Factorized Hierarchical Variational Autoencoder (FHVAE), (3) Vector Quantised-Variational AutoEncoder (VQVAE). These three methods are used either to generate training data from text (speech synthesis) or to enrich an existing dataset by simulating several different speakers through voice conversion, and are evaluated in a keyword spotting system. The performance of the models is compared against a baseline based on traditional signal processing in which the pitch and tempo of the original training data are varied. The experiments show that the neural network methods can give up to a 20% relative accuracy improvement on the validation set compared with the original training data. The baseline method based on signal processing gives results at least twice as good. This seems to indicate that the use of speech synthesis or multi-speaker voice conversion does not produce sufficiently varied or representative training data.
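The signal-processing baseline mentioned in this abstract (varying the pitch and tempo of the original recordings) can be approximated in a few lines of Python. The sketch below assumes the librosa library, a 16 kHz sample rate, and a hypothetical keyword recording; it is illustrative and not code from the thesis.

```python
import librosa

def augment_pitch_tempo(wav_path, pitch_steps=(-2, -1, 1, 2), rates=(0.9, 1.1)):
    """Generate pitch- and tempo-perturbed copies of one keyword recording,
    in the spirit of the signal-processing baseline described above."""
    y, sr = librosa.load(wav_path, sr=16000)
    variants = []
    for n in pitch_steps:                       # shift pitch by n semitones
        variants.append(librosa.effects.pitch_shift(y, sr=sr, n_steps=n))
    for r in rates:                             # stretch or compress tempo
        variants.append(librosa.effects.time_stretch(y, rate=r))
    return sr, variants
```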
Zhang, Yaodong Ph D. Massachusetts Institute of Technology. "Unsupervised spoken keyword spotting and learning of acoustically meaningful units." Thesis, Massachusetts Institute of Technology, 2009. http://hdl.handle.net/1721.1/54655.
Full text
Cataloged from PDF version of thesis.
Includes bibliographical references (p. 103-106).
The problem of keyword spotting in audio data has been explored for many years. Typically, researchers use supervised methods to train statistical models to detect keyword instances. However, such supervised methods require large quantities of annotated data that are unlikely to be available for the majority of languages in the world. This thesis addresses this lack-of-annotation problem and presents two completely unsupervised spoken keyword spotting systems that do not require any transcribed data. In the first system, a Gaussian Mixture Model is trained to label speech frames with a Gaussian posteriorgram, without any transcription information. Given several spoken samples of a keyword, segmental dynamic time warping is used to compare the Gaussian posteriorgrams between keyword samples and test utterances. The keyword detection result is then obtained by ranking the distortion scores of all the test utterances. In the second system, to avoid the need for spoken samples, a Joint-Multigram model is used to build a mapping from the keyword text samples to the Gaussian component indices. A keyword instance in the test data can be detected by calculating the similarity score of the Gaussian component index sequences between keyword samples and test utterances. The two proposed systems are evaluated on the TIMIT and MIT Lecture corpora. The results demonstrate the viability and effectiveness of the two systems. Furthermore, encouraged by the success of using unsupervised methods to perform keyword spotting, we present some preliminary investigation on the unsupervised detection of acoustically meaningful units in speech.
by Yaodong Zhang.
S.M.
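To make the first system's pipeline concrete, the following Python sketch computes a Gaussian posteriorgram with a GMM trained on unlabeled frames and compares two posteriorgrams with plain DTW. It is a simplified stand-in for the segmental DTW used in the thesis; the -log inner-product local distance and all parameter choices are assumptions made for illustration.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def train_frame_gmm(unlabeled_frames, n_components=64):
    """Train the unsupervised GMM on pooled, untranscribed feature frames."""
    return GaussianMixture(n_components=n_components,
                           covariance_type="diag").fit(unlabeled_frames)

def gaussian_posteriorgram(frames, gmm):
    """Posterior probability of each GMM component for every frame
    (frames: [T, D] feature matrix) -> [T, n_components]."""
    return gmm.predict_proba(frames)

def dtw_distance(P, Q):
    """Plain DTW between two posteriorgrams with a -log inner-product local
    distance; a simplified stand-in for segmental DTW."""
    local = -np.log(np.maximum(P @ Q.T, 1e-10))   # [T, S] frame-pair costs
    T, S = local.shape
    acc = np.full((T + 1, S + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, T + 1):
        for j in range(1, S + 1):
            acc[i, j] = local[i - 1, j - 1] + min(acc[i - 1, j],
                                                  acc[i, j - 1],
                                                  acc[i - 1, j - 1])
    return acc[T, S] / (T + S)   # length-normalized distortion score
```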
Narasimhan, Karthik Rajagopal. "Morphological segmentation : an unsupervised method and application to Keyword Spotting." Thesis, Massachusetts Institute of Technology, 2014. http://hdl.handle.net/1721.1/90139.
Full text
Cataloged from PDF version of thesis.
Includes bibliographical references (pages 41-44).
The contributions of this thesis are twofold. First, we present a new unsupervised algorithm for morphological segmentation that utilizes pseudo-semantic information, in addition to orthographic cues. We make use of the semantic signals from continuous word vectors, trained on huge corpora of raw text data. We formulate a log-linear model that is simple and can be used to perform fast, efficient inference on new words. We evaluate our model on a standard morphological segmentation dataset, and obtain large performance gains of up to 18.4% over an existing state-of-the-art system, Morfessor. Second, we explore the impact of morphological segmentation on the speech recognition task of Keyword Spotting (KWS). Despite potential benefits, state-of-the-art KWS systems do not use morphological information. In this thesis, we augment a KWS system with sub-word units derived by multiple segmentation algorithms, including supervised and unsupervised morphological segmentations, along with phonetic and syllabic segmentations. Our experiments demonstrate that morphemes improve the overall performance of KWS systems. Syllabic units, however, rival the performance of morphological units when used in KWS. By combining morphological and syllabic segmentations, we demonstrate substantial performance gains.
by Karthik Rajagopal Narasimhan.
S.M. in Computer Science and Engineering
Thambiratnam, Albert J. K. "Acoustic keyword spotting in speech with applications to data mining." Thesis, Queensland University of Technology, 2005. https://eprints.qut.edu.au/37254/1/Albert_Thambiratnam_Thesis.pdf.
Full textKarmacharya, Piush. "Design of Keyword Spotting System Based on Segmental Time Warping of Quantized Features." Master's thesis, Temple University Libraries, 2012. http://cdm16002.contentdm.oclc.org/cdm/ref/collection/p245801coll10/id/205192.
Full text
M.S.E.E.
Keyword Spotting in general means identifying a keyword in a verbal or written document. In this research, a novel approach to designing a simple spoken Keyword Spotting/Recognition system based on Template Matching is proposed, which is different from the Hidden Markov Model based systems that are most widely used today. The system can be used equally efficiently for any language as it does not rely on an underlying language model or grammatical constraints. The proposed method for keyword spotting is based on a modified version of classical Dynamic Time Warping, which has been a primary method for measuring the similarity between two sequences varying in time. For processing, a speech signal is divided into small stationary frames. Each frame is represented in terms of a quantized feature vector. Both the keyword and the speech utterance are represented in terms of 1-dimensional codebook indices. The utterance is divided into segments, and the warped distance is computed for each segment and compared against the test keyword. A distortion score for each segment is computed as a likelihood measure of the keyword. The proposed algorithm is designed to take advantage of multiple instances of the test keyword (if available) by merging the scores for all keywords used. The training method for the proposed system is completely unsupervised, i.e., it requires neither a language model nor a phoneme model for keyword spotting. Prior unsupervised training algorithms were based on computing Gaussian posteriorgrams, making the training process complex, but the proposed algorithm requires minimal training data, and the system can also be trained to perform in a different environment (language, noise level, recording medium, etc.) by re-training the original cluster on additional data. Techniques for designing a model keyword from multiple instances of the test keyword are discussed. System performance over variations of different parameters, such as the number of clusters and the number of instances of the keyword available, was studied in order to optimize the speed and accuracy of the system. The system performance was evaluated for fourteen different keywords from the CallHome and Switchboard speech corpora. Results varied for different keywords, and a maximum accuracy of 90% was obtained, which is comparable to other methods using the same time warping algorithms on Gaussian posteriorgrams. Results are compared for different parameter variations with suggestions of possible improvements.
Temple University--Theses
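For readers unfamiliar with the quantized-feature front end described above, the following Python sketch trains a VQ codebook on pooled MFCC frames, maps frames to 1-dimensional codebook indices, and precomputes a codeword distance table that can serve as the local cost of a warping step (the warping itself is analogous to the DTW sketch given earlier in this list). The use of scikit-learn's KMeans and all parameter values are assumptions for illustration, not details from the thesis.

```python
import numpy as np
from sklearn.cluster import KMeans

def train_codebook(feature_frames, n_codewords=64, seed=0):
    """Learn a VQ codebook from pooled MFCC frames (unsupervised)."""
    km = KMeans(n_clusters=n_codewords, n_init=10, random_state=seed)
    km.fit(feature_frames)
    return km

def quantize(frames, codebook):
    """Map each frame to a 1-dimensional codebook index."""
    return codebook.predict(frames)

def codeword_distance_table(codebook):
    """Pairwise Euclidean distances between codewords, usable as the local
    cost when warping two index sequences against each other."""
    C = codebook.cluster_centers_
    diff = C[:, None, :] - C[None, :, :]
    return np.sqrt((diff ** 2).sum(-1))
```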
James, David Anthony. "The application of classical information retrieval techniques to spoken documents." Thesis, University of Cambridge, 1995. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.361412.
Full text
Chen, I.-Fan. "Resource-dependent acoustic and language modeling for spoken keyword search." Diss., Georgia Institute of Technology, 2015. http://hdl.handle.net/1853/54919.
Full textManos, Alexandros Sterios. "A study on out-of-vocabulary word modelling for a segment-based keyword spotting system." Thesis, Massachusetts Institute of Technology, 1996. http://hdl.handle.net/1721.1/39394.
Full text
Includes bibliographical references (leaves 93-95).
by Alexandros Sterios Manos.
M.S.
Anifowose, Olakunle. "DESIGN OF A KEYWORD SPOTTING SYSTEM USING MODIFIED CROSS-CORRELATION IN THE TIME AND THE MFCC DOMAIN." Master's thesis, Temple University Libraries, 2012. http://cdm16002.contentdm.oclc.org/cdm/ref/collection/p245801coll10/id/205117.
Full text
M.S.E.E.
A Keyword Spotting System (KWS) is a system that recognizes predefined keywords in spoken utterances or written documents. The objective is to obtain the highest possible keyword detection rate without increasing the number of false detections in a system. The common approach to keyword spotting is the use of a Hidden Markov Model (HMM). These are usually complex systems that require training speech data. The typical HMM approach uses garbage templates or HMM models to match non-keyword speech and non-speech sounds. The purpose of this research is to design a simple Keyword Spotting System. The system is designed to spot English words and should be easily adaptable to other languages. There are many challenges in designing a keyword spotting system, such as variations in speech (pitch, loudness, timbre) that make recognition difficult; there can be wide variations in utterances even from the same speaker. In this research, the use of cross-correlation as an alternative means for detecting keywords in an utterance was investigated. This research also involves the modeling of a global keyword using a quantized dynamic time warping algorithm, which can function effectively with multiple speakers. The global keyword is an aggregation of the features from several occurrences of the same keyword. This research also investigates the effect of pitch normalization on keyword detection. The use of cross-correlation as a method for keyword spotting was investigated in both the time and the MFCC domain. In the time domain, the global keyword was cross-correlated with a pitch-normalized utterance. A zero-lag ratio (the ratio of the power around the zero lag of the cross-correlation to the power in the rest of the signal) was computed for each speech frame, and a threshold was then used to determine if the keyword is present. For the MFCC domain, the MFCC features of each keyword were computed, normalized and cross-correlated with the normalized MFCC features of portions of the utterance of the same size as the keyword. Cross-correlation of the MFCC features of the keyword with those of each portion of the utterance yields a single value between 0 and 1; the portion with the highest value is usually the location of the keyword. Results in the time domain varied from keyword to keyword: some words showed a 60% hit rate, while the average over various keywords from the CallHome database was 41%. Cross-correlation of the keywords and utterances in the MFCC domain yielded a 66% hit rate in tests conducted on all the different keywords in the CallHome and Switchboard corpora. The system accuracy is keyword dependent, with some keywords having an 85% hit rate.
Temple University--Theses
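As a sketch of the MFCC-domain matching described in this abstract, the Python code below slides a keyword-sized window across the utterance's MFCC matrix and computes a normalized correlation for each position; the highest-scoring position is the candidate keyword location. The mean removal and normalization details are assumptions for illustration (so the scores here lie in [-1, 1] rather than [0, 1]), and this is not the thesis's implementation.

```python
import numpy as np
import librosa

def mfcc_features(y, sr, n_mfcc=13):
    """MFCC matrix of shape [n_mfcc, T] for a signal."""
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)

def sliding_mfcc_correlation(keyword_mfcc, utterance_mfcc):
    """Normalized correlation between the keyword template and every
    keyword-sized window of the utterance; one score per window start."""
    K = keyword_mfcc.shape[1]
    kw = keyword_mfcc - keyword_mfcc.mean()
    kw = kw / (np.linalg.norm(kw) + 1e-10)
    scores = []
    for start in range(utterance_mfcc.shape[1] - K + 1):
        win = utterance_mfcc[:, start:start + K]
        win = win - win.mean()
        win = win / (np.linalg.norm(win) + 1e-10)
        scores.append(float((kw * win).sum()))
    return np.array(scores)   # argmax gives the candidate keyword position
```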
Zemánek, Tomáš. "Detekce klíčových slov v mluvené řeči." Master's thesis, Vysoké učení technické v Brně. Fakulta strojního inženýrství, 2011. http://www.nusl.cz/ntk/nusl-229642.
Full textThomas, Simon. "Extraction d'information dans des documents manuscrits non contraints : application au traitement automatique des courriers entrants manuscrits." Rouen, 2012. http://www.theses.fr/2012ROUES048.
Full text
Despite the advent of the digital era, large numbers of handwritten documents continue to be exchanged, forcing companies and administrations to cope with the processing of masses of documents. Automatic processing of these documents requires access to an unknown but relevant part of their content, and implies taking into account three key points: the segmentation of the document into relevant entities, their recognition, and the rejection of irrelevant entities. Contrary to traditional approaches (full document reading or keyword detection), all processes are parallelized, leading to an information extraction approach. The first contribution of the present work is the design of a generic text line model for information extraction purposes and the implementation of a complete system based on Hidden Markov Models (HMM) constrained by this model. In one pass, the recognition module seeks to discriminate relevant information, characterized by a set of alphabetic, numeric or alphanumeric queries, from the irrelevant information, characterized by a filler model. A second contribution concerns the improvement of the local frame discrimination by using a deep neural network. This allows one to infer high-level representations for the frames and thus automate the feature extraction process. The result is a complete, generic and industrially viable system, responding to emerging needs in the field of automatic reading of handwritten documents: the extraction of complex information in unconstrained documents.
Khan, Wasiq. "A Novel Approach for Continuous Speech Tracking and Dynamic Time Warping. Adaptive Framing Based Continuous Speech Similarity Measure and Dynamic Time Warping using Kalman Filter and Dynamic State Model." Thesis, University of Bradford, 2014. http://hdl.handle.net/10454/14802.
Full textHe, Jeannie, and Matthew Norström. "Utvärdering av Part-of-Speech tagging som metod för identifiering av nyckelord i dialog." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-263692.
Full text
This study presents Part-of-Speech tagging as a method for keyword spotting, as well as market research for a conversational robot to lead a language café. The results are evaluated using the answers from 30 anonymous native Swedish speakers. The results show that the method is plausible and could be implemented in a conversational robot to increase its understanding of the spoken language in a language café. The market research indicates that there is a market for the conversational robot. The conversational robot needs, however, improvements to successfully become a substitute for human language teachers in language cafés.
Tomec, Martin. "Optimalizace rozpoznávání řeči pro mobilní zařízení." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2010. http://www.nusl.cz/ntk/nusl-237132.
Full text
Karanasou, Panagiota. "Phonemic variability and confusability in pronunciation modeling for automatic speech recognition." PhD thesis, Université Paris Sud - Paris XI, 2013. http://tel.archives-ouvertes.fr/tel-00843589.
Full textWallace, Roy Geoffrey. "Fast and accurate phonetic spoken term detection." Thesis, Queensland University of Technology, 2010. https://eprints.qut.edu.au/39610/1/Roy_Wallace_Thesis.pdf.
Full text
Thomas, S. "Extraction d'information dans des documents manuscrits non contraints : application au traitement automatique des courriers entrants manuscrits." PhD thesis, Université de Rouen, 2012. http://tel.archives-ouvertes.fr/tel-00863502.
Full textZhezhela, Oleksandr. "Vizualizace výstupu z řečových technologií pro potřeby kontaktních center." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2014. http://www.nusl.cz/ntk/nusl-236041.
Full text
Kúšik, Lukáš. "Electronic Flight Bag." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2021. http://www.nusl.cz/ntk/nusl-449169.
Full text
Liu, You-Te, and 劉佑德. "Keyword Spotting for Multi-keyword Sentences." Thesis, 2001. http://ndltd.ncl.edu.tw/handle/69825552494630758074.
Full text
Qiu, Jian-Hong, and 邱建宏. "The Adaptive Keyword Spotting System." Thesis, 2002. http://ndltd.ncl.edu.tw/handle/55713266507207501323.
Full text
Jong, Huey Jen, and 鐘慧真. "Improvement of Keyword Spotting Method." Thesis, 1999. http://ndltd.ncl.edu.tw/handle/60239372937583640693.
Full text
Pai, Yu-sheng, and 白育昇. "A system for Keyword Spotting." Thesis, 2009. http://ndltd.ncl.edu.tw/handle/14150552162566606147.
Full text國立中央大學
通訊工程研究所碩士在職專班
97
The goal of this thesis is to study speech recognition techniques and to develop a speech keyword spotting system that can work on any operating system and is portable and easy to use. The system consists of three parts: the voice data reading program and the keyword spotting program run on Microsoft Windows XP SP, with Borland C++ Builder 5 as the development platform, while the speech keyword recognition program is developed with HTK 3.3 and runs on Linux Fedora 5. We use HTK to build HMM acoustic models for the 411 Mandarin syllables, composed from 21 initials and 36 finals, using 6 HMM states and 17 mixtures. On the training speech, this model reaches a detection rate of 92% with a false alarm rate under 13%. In the experiment with practical keyword speech input, the detection rate and false alarm rate each differ from the training results by no more than 3%, with a detection rate of 89% and a false alarm rate under 16%. Finally, we use this model to build a speech keyword spotting system and design a human interface program for operators, so that the system is easy to use.
Haji, Mehdi. "Arbitrary Keyword Spotting in Handwritten Documents." Thesis, 2012. http://spectrum.library.concordia.ca/973970/1/Haji_PhD_S2012.pdf.
Full text
Chang, Wen-Hai, and 姜文海. "Keyword Spotting By Diverse NonKeyword Models." Thesis, 1996. http://ndltd.ncl.edu.tw/handle/85257955390148289153.
Full text
Chen, Shing-Huai, and 陳芯暉. "Continuous Speech Keyword Spotting Using Phoneme Concatenation." Thesis, 1995. http://ndltd.ncl.edu.tw/handle/57886611966490955864.
Full text國立成功大學
資訊及電子工程研究所
83
In this thesis, a Continuous Speech Keyword Spotting System Using Phoneme Template Concatenation is described. The function of this system is to extract the keywords of a sentence from continuous speech input, and users can define their own keyword or nonkeyword database without retraining the system. In the training procedure, we collect 176 monosyllables; they are segmented into consonants and vowels, trained with a Bayesian network, and saved into the reference database. In the recognition procedure, we use the One-Stage dynamic programming algorithm as the main skeleton of the system. In order to increase the speed and the accuracy of recognition, we propose several useful methods. Lastly, we normalize the accumulated distortion to decide the best result. In our experiments, we collect 30 place names in the south of Taiwan as the keywords and 20 probable inquiry words as the nonkeywords to simulate the system. Experimental results show that the average accuracy is 86.3%.
陳世民. "Applying consensus hypothesis to the keyword spotting." Thesis, 2004. http://ndltd.ncl.edu.tw/handle/35655018330966085376.
Full text
Wang, Yi-Lii, and 王怡理. "A Design of Mandarin Keyword Spotting System." Thesis, 2003. http://ndltd.ncl.edu.tw/handle/50857546657271436378.
Full text國立中山大學
電機工程學系研究所
91
A Mandarin keyword spotting system based on LPC, VQ, discrete-time HMMs and the Viterbi algorithm is proposed in this thesis. Joined with a dialogue system, this keyword spotting platform is further refined into a prototype of a Taiwan Railway natural-language reservation system. In the reservation process, five questions are asked by the computer-dialogue attendant: name and ID number, departure station, destination station, train type and number of tickets, and time schedule. Following the customer's speech confirmation, electronic tickets can be correctly issued and printed within 90 seconds in a laboratory environment.
Cheng, Han Min, and 鄭漢銘. "A Study On The Keyword Spotting System." Thesis, 1996. http://ndltd.ncl.edu.tw/handle/92022534881446108652.
Full text
Chen, Yen-An, and 陳彥安. "A Hierarchical Keyword Spotting Method for Continuous Speech." Thesis, 2006. http://ndltd.ncl.edu.tw/handle/33279403155913093840.
Full text國立東華大學
資訊工程學系
94
The performance of filler-based keyword spotting methods depends heavily on the efficacy of the built filler (garbage) model, and training an effective filler model is not a trivial task. Therefore, the goal of this study is to propose an improved alternative to filler-based keyword spotting methods. This thesis proposes a hierarchical keyword spotting (HKWS) method that replaces the keyword spotting process with a syllable-level spotting process followed by a word-level composition process. The first step of the syllable-level spotting is to pinpoint all possible segments in the input speech signal; the second step is to prune the syllable segments with lower scores through a syllable verification process. The word-level composition process concatenates the verified syllable segments into the target keyword under a predefined, reasonable condition for concatenation. The proposed approach achieves both domain independence and vocabulary independence, and is therefore suitable for customization to different applications. Several experiments are also conducted to verify the effectiveness of the proposed HKWS method.
Hsu, Chih-wen, and 徐志文. "A Study on Mandarin Keyword Spotting and Utterance Verification." Thesis, 2000. http://ndltd.ncl.edu.tw/handle/01075663132747706823.
Full textTsai, Yan-Hsing, and 蔡炎興. "A System for Keyword Spotting and Speaker Recognition." Thesis, 2003. http://ndltd.ncl.edu.tw/handle/12571122850830163508.
Full text
Lin, Chia-Hsien, and 林家賢. "Mandarin Keyword Spotting by Searching the Syllable Lattices." Thesis, 2000. http://ndltd.ncl.edu.tw/handle/56911575721497804150.
Full text
Li, Bo-Yi, and 李柏毅. "Apply Emotional Speech Recognition to Keyword Spotting System." Thesis, 2010. http://ndltd.ncl.edu.tw/handle/36897528516588830798.
Full text崑山科技大學
數位生活科技研究所
98
This study proposes an emotional speech classification and speech recognition system built on a collected emotional speech corpus and the TCC300 speech corpus. We exploit a one-pass search algorithm to complete the recognition of emotion and speech, using MFCC features and a hidden Markov model-based recognition architecture. In emotional speech recognition, we assume that the short-time speech emotion may vary; therefore, after the one-pass search, the emotion that accumulates the longest duration as the most probable one is taken as the result. To evaluate the performance of the emotional speech classification, we randomly generate ten sets of training and test data. Chi-square testing is adopted to examine the performance trends among the different experimental data sets and the confusion between different emotional speech classes. After the emotional speech classification, speech recognition follows to extract the keywords for the spoken news query. The system lists appropriate spoken news items ready for playback according to the recognized speech emotion and the keywords.
Chou, Che-Hsuan, and 周哲玄. "Implementation and Comparison of Keyword Spotting for Taiwanese." Thesis, 2012. http://ndltd.ncl.edu.tw/handle/31220962632445120532.
Full text國立清華大學
資訊工程學系
100
This thesis focuses on improving the performance of a Taiwanese keyword spotting system by integrating speech assessment and pitch contour classification. In the first part of this research, we use different methods to implement a Taiwanese keyword spotting system; in the second part, we improve the system through validation using speech assessment and pitch contour classification. In the first part, two methods are adopted to implement the keyword spotting system. The first method uses hidden Markov models, while the second uses the phone mismatching method. The phone mismatching method can be further characterized into three types of algorithm: penalty matrix (PM), confusion matrix (CM) and Levenshtein distance (LD). We then perform speech assessment and pitch contour classification to validate the candidate keywords selected by these two methods and refine the results. A threshold is used for each of these two methods, and a decision tree is used to make the final decision. Experimental results show that the HMM method achieves an equal error rate (ERR) of 46.5%. The ERR reduces to 26.5% after the HMM method is incorporated with speech assessment validation, and further reduces to 24.7% after being incorporated with pitch contour classification. In the phone mismatch experiment, PM, CM, and LD achieve ERRs of 39.4%, 34.0%, and 42.2% respectively. After being incorporated with speech assessment validation, the ERRs reduce to 34.6% for PM and 28.4% for CM; after being incorporated with pitch contour classification, they further reduce to 33.7% for PM and 27.3% for CM. This shows that the validation technique using speech assessment and pitch contour classification can improve the performance of Taiwanese keyword spotting.
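The Levenshtein variant (LD) of the phone mismatching method mentioned above can be illustrated with a standard edit-distance computation between a keyword's canonical phone sequence and a decoded phone sequence; the penalty-matrix (PM) and confusion-matrix (CM) variants would replace the unit costs with learned weights. The Python sketch below is illustrative and not the thesis's implementation.

```python
def levenshtein(ref_phones, hyp_phones):
    """Edit distance between two phone sequences; a smaller value means a
    closer match between the decoded phones and a keyword's canonical phones."""
    m, n = len(ref_phones), len(hyp_phones)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if ref_phones[i - 1] == hyp_phones[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[m][n]

# Example: compare a hypothesized phone string with a keyword's phones.
print(levenshtein("t a i w a n".split(), "t a i u a n".split()))  # -> 1
```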
Wu, Kang-Lin, and 吳岡霖. "Application of Speech Keyword Spotting in Train Schedule Querying." Thesis, 2008. http://ndltd.ncl.edu.tw/handle/7739p5.
Full text
Wu, Ching-Leung, and 武景龍. "A Keyword Spotting System using Syllable/Word Confusion Modeling." Thesis, 2001. http://ndltd.ncl.edu.tw/handle/94151614845197171844.
Full text國立交通大學
電信工程系
89
In this thesis, a new keyword spotting system that uses a reward function to discriminate keywords from background fillers is proposed. It first defines a confusion measure between a Mandarin syllable pair as the HMM recognition score difference between the correct and the alternate hypothesis, and a Gaussian pdf is used to model the confusion measure of each syllable pair. Under the assumption of independence between adjacent syllables, the confusion measure of a word pair becomes the sum over the corresponding syllable pairs. Finally, the confusion measure of the word pair is used to decide the reward function in the keyword spotting system. In the proposed system, the Top-N syllable lattice is first found by an HMM syllable recognizer, and keyword candidates are found from the syllable lattice together with the most probable filler model. The reward function of each keyword candidate can then be decided from the confusion measure between the keyword-filler word pair and the desired keyword recognition rate.
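A minimal sketch of the confusion measure described above: the syllable-pair measure is the HMM score difference between the correct and the alternate hypothesis, its distribution is modeled with a Gaussian pdf, and, assuming adjacent syllables are independent, the word-pair measure is the sum over its syllable pairs. The Python below is illustrative; the numbers and function names are invented, not taken from the thesis.

```python
import numpy as np

def confusion_measure(score_correct, score_alternate):
    """Syllable-pair confusion measure: HMM recognition score difference
    between the correct and the alternate syllable hypothesis."""
    return score_correct - score_alternate

def fit_confusion_pdf(training_differences):
    """The thesis models this measure with a Gaussian pdf; here we simply
    estimate its mean and standard deviation from training-data differences."""
    d = np.asarray(training_differences, dtype=float)
    return d.mean(), d.std()

def word_pair_confusion(syllable_scores):
    """Assuming adjacent syllables are independent, the word-pair confusion
    measure is the sum of its syllable-pair measures.
    syllable_scores: list of (score_correct, score_alternate) pairs."""
    return sum(confusion_measure(c, a) for c, a in syllable_scores)

# Example: a two-syllable keyword hypothesis versus a competing filler.
print(fit_confusion_pdf([2.0, 3.5, 1.0, 4.5]))                 # -> (2.75, ...)
print(word_pair_confusion([(-51.0, -54.0), (-47.0, -48.0)]))   # -> 4.0
```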
Hou, Cheng-Kuan, and 侯政寬. "A Design and Applications of Mandarin Keyword Spotting System." Thesis, 2003. http://ndltd.ncl.edu.tw/handle/25291815334814226760.
Full text國立中山大學
通訊工程研究所
91
A Mandarin keyword spotting system based on MFCCs, discrete-time HMMs and the Viterbi algorithm with DTW is proposed in this thesis. Joined with a dialogue system, this keyword spotting platform is further refined into a prototype of a natural-speech patient registration system for Kaohsiung Veterans General Hospital. After the ID number is asked for by the computer-dialogue attendant in the registration process, the user can finish all relevant work in one sentence. Functions for searching for clinical doctors and for making and canceling registrations are all built into this system. In a laboratory environment, the correct rate of this speaker-independent patient registration system can reach 97%, and the whole registration process can be completed within 75 seconds.
Huang, Kuo-Chang, and 黃國彰. "A Study on Utterance Verification in Chinese Keyword Spotting." Thesis, 1997. http://ndltd.ncl.edu.tw/handle/74510893636928654215.
Full text國立中央大學
電機工程學系
85
In this thesis, we search for the best utterance verification technique for use in a keyword spotting system. The work is divided into three parts. In the first part, we focus on increasing the recognition rate of a keyword spotting system without rejection capability. From the experimental data, we propose a decision rule that can raise recognition from Top-3 to Top-1; experimental results show that the recognition rate increases by about 6%-7% after using the decision rule. We also speed up the system's recognition by about 30 times through a speed-up rule. In the second part, we carry out utterance verification experiments based on the likelihood score ratio, with an anti-keyword model used as the alternative model; using this alternative model, better recognition results are obtained. In addition, another approach is proposed to generate the anti-keyword model without retraining on the training data. In the final part of this thesis, we integrate the first two parts to build an utterance verification system with multiple stages (a two-stage recognition system). The advantage is that the second-stage utterance verification can correct the keyword path to the optimal path when the correct keyword path does not appear as the Top-1 path in the first stage. It is found that this multiple-path utterance verification system performs well.
Chung, Chin-Chu, and 鐘進竹. "A Mandarin Keyword Spotting System Assisted with Tone Recognition." Thesis, 2011. http://ndltd.ncl.edu.tw/handle/95216143773838972639.
Full text國立交通大學
電機學院電信學程
99
Most of today's Mandarin speech recognition systems use the 411 base syllables (disregarding tone information) as the recognition unit, and most of them can be recognized correctly with the help of a language model. However, in the case of keyword spotting, keywords are always named entities, such as person names, location names and company names. Those keywords are usually only two characters in length and easily confused with each other, so it is important to recognize words with tone information. In this thesis, a two-stage keyword spotting system is used. RAPT (A Robust Algorithm for Pitch Tracking) is applied to get the pitch contour in the feature extraction phase of the original system. The likelihood scores derived from Top-10 keyword recognition are added to the scores from the second-stage MLP tone recognizer, and the Top-10 results are then reordered to get better recognized answers. In this thesis, the keyword spotting system is built for a specific set of keyword phrases: 341 company names (1074 including aliases) in Hsinchu Science Park. The keyword recognition rate is 94.54% without tone recognition, which increases to 95.32% with the second-stage tone recognizer, an error reduction rate of 14.3%.
Hung, Yu-Chun, and 洪毓淳. "Robust Multi-keyword Spotting of Telephone Speech Using Stochastic Matching." Thesis, 1998. http://ndltd.ncl.edu.tw/handle/54754631912556353075.
Full text國立成功大學
資訊工程學系
86
Automatic speech recognition over telephone networks can offer various services for users, and developing a robust telephone speech recognition system is one of the primary research topics in automatic speech recognition. When a speech recognition system is applied over telephone networks, the acoustic mismatch between the training and testing environments always causes performance degradation. In this thesis, a two-level codebook-based stochastic matching is proposed to deal with this problem; in addition, some bias removal algorithms are also described and compared. Keyword-based speech recognition systems have been developed for domain-specific speech understanding applications and are capable of spotting single or multiple keywords embedded in non-keyword speech and background noise. In this thesis, a multi-keyword spotting method that exploits the relationship between keywords is proposed to improve the spotting rate. Finally, we integrate the two-level codebook-based stochastic matching, which deals with the acoustic mismatch problem, into the multi-keyword spotting system with 1275 keywords over telephone networks. In the experimental results, the top-1 recognition accuracy is improved from 49.33% to 75.50%.
Tsai, Yeong-Chi, and 蔡永琪. "Sub-Syllable Unit Based Keyword Spotting of Large Vocabulary Recognition." Thesis, 1995. http://ndltd.ncl.edu.tw/handle/14569386828196271167.
Full text國立中央大學
電機工程研究所
83
In this thesis, we propose a technique for a continuous speech word-spotting system based on sub-syllable units. A connected word recognition algorithm has been implemented, and we discuss techniques for dealing with both non-keyword speech models and keyword speech models in training. An HMM-based connected word recognition system is used to find the best sequence of keywords and non-keywords matching the actual input. We work on a 35-keyword telephone-network task to simulate an automatic, speaker-independent recognition system for a phone operator.
Deng, Cun-Zhi, and 鄧存智. "Voice Activity Detection and Keyword Spotting System on Embedded Platform." Thesis, 2007. http://ndltd.ncl.edu.tw/handle/39049943250279766666.
Full text國立交通大學
電信工程系所
96
In this thesis, an embedded isolated Mandarin word recognition system and keyword spotting were implemented on a Windows CE-based mobile device. The speech recognition system used an HMM recognizer with a partitioned Laplacian observation probability. The system was carefully tuned in order to get near real-time response without performance degradation. For the isolated Mandarin word recognition system, the response time is about 1.2 times real time and a 95.8% word recognition rate can be achieved for a 1000-word application; for keyword spotting, the response time is 1.8 times real time with a 91.6% recognition rate for 776 words.
Chan, Feng-Mao, and 詹豐懋. "New keyword Spotting Method using Confusion Measure between 411 Monosyllables." Thesis, 1999. http://ndltd.ncl.edu.tw/handle/62287907574810638393.
Full text國立交通大學
電信工程系
87
In this thesis, a new keyword spotting method using a confusion measure is proposed. The confusion measure of the 411 monosyllables is first found from the pdf of the recognition score of each model. By properly choosing the missing-error probability, the confusion penalties can be found for the monosyllable recognizer. The confusion penalties are applied to the keyword spotting system, and the recognition rate of the keyword recognition system is improved. Furthermore, a tone recognizer is added to the keyword spotting system, and the confusion penalties of the 5 tones are also applied. The performance of the proposed method is examined by simulations using a real telephone-speech database. The new method improves the recognition rate of the one-keyword system from 71.55% to 75.34% and that of the multi-keyword system from 72.76% to 75.00%. With tone recognition, recognition rates of 80.86% for the one-keyword system and 79.65% for the multi-keyword system are achieved.