To see the other types of publications on this topic, follow the link: Pattern recognition, speech recognition.

Dissertations / Theses on the topic 'Pattern recognition, speech recognition'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the top 50 dissertations / theses for your research on the topic 'Pattern recognition, speech recognition.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online, whenever these are available in the metadata.

Browse dissertations / theses in a wide variety of disciplines and organise your bibliography correctly.

1

Milner, Benjamin Peter. "Speech recognition in adverse environments." Thesis, University of East Anglia, 1994. https://ueaeprints.uea.ac.uk/2907/.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Long, Christopher J. "Wavelet methods in speech recognition." Thesis, Loughborough University, 1999. https://dspace.lboro.ac.uk/2134/14108.

Full text
Abstract:
In this thesis, novel wavelet techniques are developed to improve parametrization of speech signals prior to classification. It is shown that non-linear operations carried out in the wavelet domain improve the performance of a speech classifier and consistently outperform classical Fourier methods. This is because of the localised nature of the wavelet, which captures correspondingly well-localised time-frequency features within the speech signal. Furthermore, by taking advantage of the approximation ability of wavelets, efficient representation of the non-stationarity inherent in speech can be achieved in a relatively small number of expansion coefficients. This is an attractive option when faced with the so-called 'Curse of Dimensionality' problem of multivariate classifiers such as Linear Discriminant Analysis (LDA) or Artificial Neural Networks (ANNs). Conventional time-frequency analysis methods such as the Discrete Fourier Transform either miss irregular signal structures and transients due to spectral smearing or require a large number of coefficients to represent such characteristics efficiently. Wavelet theory offers an alternative insight in the representation of these types of signals. As an extension to the standard wavelet transform, adaptive libraries of wavelet and cosine packets are introduced which increase the flexibility of the transform. This approach is observed to be yet more suitable for the highly variable nature of speech signals in that it results in a time-frequency sampled grid that is well adapted to irregularities and transients. They result in a corresponding reduction in the misclassification rate of the recognition system. However, this is necessarily at the expense of added computing time. Finally, a framework based on adaptive time-frequency libraries is developed which invokes the final classifier to choose the nature of the resolution for a given classification problem. 
The classifier then performs dimensionality reduction on the transformed signal by choosing the top few features based on their discriminant power. This approach is compared and contrasted to an existing discriminant wavelet feature extractor. The overall conclusions of the thesis are that wavelets and their relatives are capable of extracting useful features for speech classification problems. The use of adaptive wavelet transforms provides the flexibility within which powerful feature extractors can be designed for these types of application.
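The parametrisation the abstract describes can be sketched in miniature: a fixed Haar decomposition (a simple stand-in for the adaptive wavelet and cosine packet libraries the thesis develops) whose largest-magnitude coefficients form a compact feature vector for a downstream classifier. All names here are illustrative.

```python
import numpy as np

def haar_dwt(x):
    """One level of the Haar wavelet transform: approximation and detail."""
    x = np.asarray(x, dtype=float)
    even, odd = x[0::2], x[1::2]
    approx = (even + odd) / np.sqrt(2.0)
    detail = (even - odd) / np.sqrt(2.0)
    return approx, detail

def wavelet_features(x, levels=3, k=8):
    """Decompose x, pool all coefficients, keep the k largest by magnitude."""
    coeffs = []
    approx = np.asarray(x, dtype=float)
    for _ in range(levels):
        approx, detail = haar_dwt(approx)
        coeffs.append(detail)
    coeffs.append(approx)
    flat = np.concatenate(coeffs)
    top = np.sort(np.argsort(np.abs(flat))[-k:])   # indices of k largest coefficients
    return flat[top]

frame = np.sin(2 * np.pi * 5 * np.linspace(0, 1, 64, endpoint=False))
feats = wavelet_features(frame, levels=3, k=8)
print(feats.shape)  # (8,)
```

Keeping only the dominant coefficients is one simple way to sidestep the 'Curse of Dimensionality' the abstract mentions before handing features to LDA or an ANN.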
APA, Harvard, Vancouver, ISO, and other styles
3

Stewart, Darryl William. "Syllable based continuous speech recognition." Thesis, Queen's University Belfast, 2000. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.325993.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Luettin, Juergen. "Visual speech and speaker recognition." Thesis, University of Sheffield, 1997. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.264432.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Mwangi, Elijah. "Speaker independent isolated word recognition." Thesis, Loughborough University, 1987. https://dspace.lboro.ac.uk/2134/15425.

Full text
Abstract:
The work presented in this thesis concerns the recognition of isolated words using a pattern matching approach. In such a system, an unknown speech utterance, which is to be identified, is transformed into a pattern of characteristic features. These features are then compared with a set of pre-stored reference patterns that were generated from the vocabulary words. The unknown word is identified as that vocabulary word for which the reference pattern gives the best match. One of the major difficulties in the pattern comparison process is that speech patterns, obtained from the same word, exhibit non-linear temporal fluctuations and thus a high degree of redundancy. The initial part of this thesis considers various dynamic time warping techniques used for normalizing the temporal differences between speech patterns. Redundancy removal methods are also considered, and their effect on the recognition accuracy is assessed. Although the use of dynamic time warping algorithms provides considerable improvement in the accuracy of isolated word recognition schemes, the performance is ultimately limited by their poor ability to discriminate between acoustically similar words. Methods for enhancing the identification rate among acoustically similar words, by using common pattern features for similar sounding regions, are investigated. Pattern-matching-based, speaker-independent systems can only operate with a high recognition rate by using multiple reference patterns for each of the words included in the vocabulary. These patterns are obtained from the utterances of a group of speakers. The use of multiple reference patterns not only leads to a large increase in the memory requirements of the recognizer, but also to an increase in the computational load.
A recognition system is proposed in this thesis, which overcomes these difficulties by (i) employing vector quantization techniques to reduce the storage of reference patterns, and (ii) eliminating the need for dynamic time warping which reduces the computational complexity of the system. Finally, a method of identifying the acoustic structure of an utterance in terms of voiced, unvoiced, and silence segments by using fuzzy set theory is proposed. The acoustic structure is then employed to enhance the recognition accuracy of a conventional isolated word recognizer.
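The template-matching scheme the abstract builds on can be illustrated with a minimal dynamic time warping recogniser. This is a sketch with illustrative names, not the thesis's VQ-based system that removes the DTW step.

```python
import numpy as np

def dtw_distance(a, b):
    """Dynamic time warping distance between two feature sequences of
    shape (T, d): the classic O(Ta*Tb) dynamic programme over
    insertion, deletion and match steps used in template matching."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def recognise(utterance, templates):
    """Identify the unknown word as the vocabulary entry whose reference
    template gives the smallest warped distance."""
    return min(templates, key=lambda w: dtw_distance(utterance, templates[w]))

# Toy one-dimensional "feature" templates for a two-word vocabulary
templates = {'yes': np.array([[0.0], [1.0], [2.0]]),
             'no':  np.array([[2.0], [1.0], [0.0]])}
word = recognise(np.array([[0.0], [0.9], [2.1]]), templates)
print(word)  # yes
```

The per-frame distance computations inside the double loop are exactly the computational load the thesis's VQ-based design aims to eliminate.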
APA, Harvard, Vancouver, ISO, and other styles
6

Alphonso, Issac John. "Network training for continuous speech recognition." Master's thesis, Mississippi State : Mississippi State University, 2003. http://library.msstate.edu/etd/show.asp?etd=etd-10252003-105104.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Allerhand, M. H. "A knowledge-based approach to speech pattern recognition." Thesis, University of Cambridge, 1986. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.377200.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Baothman, Fatmah bint Abdul Rahman. "Phonology-based automatic speech recognition for Arabic." Thesis, University of Huddersfield, 2002. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.273720.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Holmes, Wendy Jane. "Modelling segmental variability for automatic speech recognition." Thesis, University College London (University of London), 1997. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.267859.

Full text
APA, Harvard, Vancouver, ISO, and other styles
10

Prager, Richard William. "Parallel processing networks for automatic speech recognition." Thesis, University of Cambridge, 1987. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.238443.

Full text
APA, Harvard, Vancouver, ISO, and other styles
11

Alexi, Ramsin. "A sub-neural network ensemble for speech recognition." Thesis, University of Sussex, 1999. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.298656.

Full text
APA, Harvard, Vancouver, ISO, and other styles
12

Tran, Dat Tat. "Fuzzy approaches to speech and speaker recognition." University of Canberra. Management & Technology, 2000. http://erl.canberra.edu.au./public/adt-AUC20061109.151916.

Full text
Abstract:
Statistical pattern recognition is the most successful approach to automatic speech and speaker recognition (ASASR). Of all the statistical pattern recognition techniques, the hidden Markov model (HMM) is the most important. The Gaussian mixture model (GMM) and vector quantisation (VQ) are also effective techniques, especially for speaker recognition and, in conjunction with HMMs, for speech recognition. However, the performance of these techniques degrades rapidly in the context of insufficient training data and in the presence of noise or distortion. Fuzzy approaches with their adjustable parameters can reduce such degradation. Fuzzy set theory is one of the most successful approaches in pattern recognition, where, based on the idea of a fuzzy membership function, fuzzy C-means (FCM) clustering and noise clustering (NC) are the most important techniques. To establish fuzzy approaches to ASASR, the following basic problems are solved. First, a time-dependent fuzzy membership function is defined for the HMM. Second, a general distance is proposed to obtain a relationship between modelling and clustering techniques. Third, fuzzy entropy (FE) clustering is proposed to relate fuzzy models to statistical models. Finally, fuzzy membership functions are proposed as discriminant functions in decision making. The following models are proposed: 1) the FE-HMM, NC-FE-HMM, FE-GMM, NC-FE-GMM, FE-VQ and NC-FE-VQ in the FE approach; 2) the FCM-HMM, NC-FCM-HMM, FCM-GMM and NC-FCM-GMM in the FCM approach; and 3) the hard HMM and GMM as the special models of both FE and FCM approaches. Finally, a fuzzy approach to speaker verification and a further extension using possibility theory are also proposed. The evaluation experiments performed on the TI46, ANDOSL and YOHO corpora show better results for all of the proposed techniques in comparison with the non-fuzzy baseline techniques.
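The fuzzy membership functions at the heart of these models come from fuzzy C-means clustering. Below is a bare-bones numpy sketch of the standard FCM updates (illustrative only, not the thesis's FE or HMM variants): centroids as membership-weighted means, then memberships from inverse distance ratios with fuzzifier m.

```python
import numpy as np

def fcm(X, c=2, m=2.0, iters=50, seed=0):
    """Fuzzy C-means: soft memberships u[i, k] of sample i in cluster k."""
    rng = np.random.default_rng(seed)
    n = len(X)
    U = rng.dirichlet(np.ones(c), size=n)          # random row-stochastic init
    for _ in range(iters):
        W = U ** m                                 # fuzzified memberships
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        inv = d ** (-2.0 / (m - 1.0))              # inverse-distance weights
        U = inv / inv.sum(axis=1, keepdims=True)   # renormalise rows
    return U, centers

# Two well-separated toy blobs
X = np.vstack([np.zeros((5, 2)), np.ones((5, 2)) * 4])
U, centers = fcm(X, c=2)
```

As m approaches 1 the memberships harden towards one-hot assignments, which is the sense in which the abstract's "hard" HMM and GMM arise as special cases.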
APA, Harvard, Vancouver, ISO, and other styles
13

Wang, Xuechuan. "Feature Extraction and Dimensionality Reduction in Pattern Recognition and Their Application in Speech Recognition." Griffith University. School of Microelectronic Engineering, 2003. http://www4.gu.edu.au:8080/adt-root/public/adt-QGU20030619.162803.

Full text
Abstract:
Conventional pattern recognition systems have two components: feature analysis and pattern classification. Feature analysis is achieved in two steps: a parameter extraction step and a feature extraction step. In the parameter extraction step, information relevant for pattern classification is extracted from the input data in the form of a parameter vector. In the feature extraction step, the parameter vector is transformed to a feature vector. Feature extraction can be conducted independently or jointly with either parameter extraction or classification. Linear Discriminant Analysis (LDA) and Principal Component Analysis (PCA) are the two popular independent feature extraction algorithms. Both of them extract features by projecting the parameter vectors into a new feature space through a linear transformation matrix. But they optimize the transformation matrix with different intentions. PCA optimizes the transformation matrix by finding the largest variations in the original feature space. LDA pursues the largest ratio of between-class variation and within-class variation when projecting the original feature space to a subspace. The drawback of independent feature extraction algorithms is that their optimization criteria are different from the classifier's minimum classification error criterion, which may cause inconsistency between the feature extraction and classification stages of a pattern recognizer and, consequently, degrade the performance of classifiers. A direct way to overcome this problem is to conduct feature extraction and classification jointly with a consistent criterion. The Minimum Classification Error (MCE) training algorithm provides such an integrated framework. The MCE algorithm was first proposed for optimizing classifiers. It is a type of discriminative learning algorithm that achieves minimum classification error directly. The flexibility of the framework of the MCE algorithm makes it convenient to conduct feature extraction and classification jointly.
Conventional feature extraction and pattern classification algorithms, LDA, PCA, the MCE training algorithm, the minimum distance classifier, the likelihood classifier and the Bayesian classifier, are linear algorithms. The advantage of linear algorithms is their simplicity and ability to reduce feature dimensionalities. However, they have the limitation that the decision boundaries generated are linear and have little computational flexibility. SVM is a recently developed integrated pattern classification algorithm with non-linear formulation. It is based on the idea that a classification that affords dot-products can be computed efficiently in higher dimensional feature spaces. The classes which are not linearly separable in the original parametric space can be linearly separated in the higher dimensional feature space. Because of this, SVM has the advantage that it can handle classes with complex nonlinear decision boundaries. However, SVM is a highly integrated and closed pattern classification system. It is very difficult to adopt feature extraction into SVM's framework. Thus SVM is unable to conduct feature extraction tasks. This thesis investigates LDA and PCA for feature extraction and dimensionality reduction and proposes the application of MCE training algorithms for joint feature extraction and classification tasks. A generalized MCE (GMCE) training algorithm is proposed to mend the shortcomings of the MCE training algorithms in joint feature extraction and classification tasks. SVM, as a non-linear pattern classification system, is also investigated in this thesis. A reduced-dimensional SVM (RDSVM) is proposed to enable SVM to conduct feature extraction and classification jointly. All of the investigated and proposed algorithms are tested and compared firstly on a number of small databases, such as the Deterding Vowels database, Fisher's IRIS database and the German GLASS database. Then they are tested in a large-scale speech recognition experiment based on the TIMIT database.
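The independent feature extraction side of this comparison is easy to sketch. Below is PCA by eigendecomposition of the sample covariance, projecting onto the directions of largest variance; LDA is analogous but maximises the between- to within-class scatter ratio instead. Names and data are illustrative.

```python
import numpy as np

def pca_transform(X, k):
    """Project X onto the k eigenvectors of the covariance matrix with the
    largest eigenvalues (the directions of largest variance)."""
    Xc = X - X.mean(axis=0)
    cov = np.cov(Xc, rowvar=False)
    vals, vecs = np.linalg.eigh(cov)      # eigh returns ascending eigenvalues
    W = vecs[:, ::-1][:, :k]              # top-k principal directions
    return Xc @ W

# Toy data: essentially one latent direction plus a little noise
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 1)) @ np.array([[3.0, 1.0, 0.2]]) \
    + 0.05 * rng.normal(size=(100, 3))
Y = pca_transform(X, 1)
```

Because PCA's criterion (variance) ignores class labels entirely, the projection it picks need not be the one that best separates classes, which is exactly the inconsistency with the classifier's minimum-error criterion that motivates the thesis's joint MCE approach.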
APA, Harvard, Vancouver, ISO, and other styles
14

Harte, Naomi Antonia. "Segmental phonetic features and models for speech recognition." Thesis, Queen's University Belfast, 1999. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.287466.

Full text
APA, Harvard, Vancouver, ISO, and other styles
15

Ma, Chengyuan. "A detection-based pattern recognition framework and its applications." Diss., Georgia Institute of Technology, 2010. http://hdl.handle.net/1853/33889.

Full text
Abstract:
The objective of this dissertation is to present a detection-based pattern recognition framework and demonstrate its applications in automatic speech recognition and broadcast news video story segmentation. Inspired by the studies of modern cognitive psychology and real-world pattern recognition systems, a detection-based pattern recognition framework is proposed to provide an alternative solution for some complicated pattern recognition problems. The primitive features are first detected and the task-specific knowledge hierarchy is constructed level by level; then a variety of heterogeneous information sources are combined together and the high-level context is incorporated as additional information at certain stages. A detection-based framework is a "divide-and-conquer" design paradigm for pattern recognition problems, which will decompose a conceptually difficult problem into many elementary sub-problems that can be handled directly and reliably. Some information fusion strategies will be employed to integrate the evidence from a lower level to form the evidence at a higher level. Such a fusion procedure continues until reaching the top level. Generally, a detection-based framework has many advantages: (1) more flexibility in both detector design and fusion strategies, as these two parts can be optimized separately; (2) parallel and distributed computational components in primitive feature detection. In such a component-based framework, any primitive component can be replaced by a new one while other components remain unchanged; (3) incremental information integration; (4) high level context information as additional information sources, which can be combined with bottom-up processing at any stage. This dissertation presents the basic principles, criteria, and techniques for detector design and hypothesis verification based on statistical detection and decision theory. In addition, evidence fusion strategies were investigated in this dissertation.
Several novel detection algorithms and evidence fusion methods were proposed, and their effectiveness was justified in an automatic speech recognition system and a broadcast news video segmentation system. We believe such a detection-based framework can be employed in more applications in the future.
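The bottom-up evidence integration described above can be caricatured as summing log-likelihood-ratio scores up a detection tree. The independence assumption behind simple summation is ours for illustration, not the dissertation's actual fusion strategies.

```python
def fuse(node):
    """Recursively sum log-likelihood-ratio evidence up a detection tree:
    leaves are primitive detector scores, internal nodes are lists of
    children, and the sign at the root gives the final accept/reject."""
    if isinstance(node, (int, float)):
        return float(node)
    return sum(fuse(child) for child in node)

# A hypothetical word hypothesis built from two phones, two cues each
word = [[0.5, -0.25], [0.25, 0.5]]
print(fuse(word))  # 1.0
```

Each inner list plays the role of one level of the knowledge hierarchy: its children's evidence is fused before being passed upward, and any leaf detector can be swapped out without touching the rest of the tree.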
APA, Harvard, Vancouver, ISO, and other styles
16

Chen, Xin. "Ensemble methods in large vocabulary continuous speech recognition." Diss., Columbia, Mo. : University of Missouri-Columbia, 2008. http://hdl.handle.net/10355/5797.

Full text
Abstract:
Thesis (M.S.)--University of Missouri-Columbia, 2008.
The entire dissertation/thesis text is included in the research.pdf file; the official abstract appears in the short.pdf file (which also appears in the research.pdf); a non-technical general description, or public abstract, appears in the public.pdf file. Title from title screen of research.pdf file (viewed on August 28, 2008) Vita. Includes bibliographical references.
APA, Harvard, Vancouver, ISO, and other styles
17

Josifovski, Ljubomir. "Robust automatic speech recognition with missing and unreliable data." Thesis, University of Sheffield, 2002. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.275021.

Full text
APA, Harvard, Vancouver, ISO, and other styles
18

Chong, Michael Wai Hing. "Subword units and parallel processing for automatic speech recognition." Thesis, University of Cambridge, 1990. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.335663.

Full text
APA, Harvard, Vancouver, ISO, and other styles
19

Hewett, Andrew John. "Training and speaker adaptation in template-based speech recognition." Thesis, University of Cambridge, 1989. https://www.repository.cam.ac.uk/handle/1810/250961.

Full text
APA, Harvard, Vancouver, ISO, and other styles
20

Sidorova, Julia. "Optimization techniques for speech emotion recognition." Doctoral thesis, Universitat Pompeu Fabra, 2009. http://hdl.handle.net/10803/7575.

Full text
Abstract:
There are three innovative aspects. First, a novel algorithm for computing the emotional content of an utterance, with a hybrid design that combines statistical learning and syntactic information. Second, an extension for feature selection that allows the weights to be adapted, increasing the flexibility of the system. Third, a proposal for incorporating high-level features into the system. These features, combined with the low-level features, improve the system's performance.
The first contribution of this thesis is a speech emotion recognition system called ESEDA, capable of recognizing emotions in different languages. The second contribution is the classifier TGI+. First, objects are modeled by means of a syntactic method and then, with a statistical method, the mappings of samples are classified, not their feature vectors. TGI+ outperforms the state-of-the-art top performer on a benchmark data set of acted emotions. The third contribution is high-level features, which are distances from a feature vector to the tree automata accepting class i, for all i in the set of class labels. The set of low-level features and the set of high-level features are concatenated and the resulting set is submitted to the feature selection procedure. Then the classification step is done in the usual way. Testing on a benchmark dataset of authentic emotions showed that this classification strategy outperforms the state-of-the-art top performer.
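The high-level-feature idea can be sketched with class centroids standing in for the thesis's tree automata: each class model contributes one distance feature, appended to the low-level vector before feature selection. Names are illustrative.

```python
import numpy as np

def add_high_level(feats, prototypes):
    """Append one 'high-level' feature per class: the distance from the
    low-level vector to that class's model (here a centroid stands in
    for the tree automaton accepting the class)."""
    dists = [np.linalg.norm(feats - p) for p in prototypes]
    return np.concatenate([feats, dists])

# Two hypothetical class prototypes in a 2-D low-level feature space
protos = [np.array([0.0, 0.0]), np.array([1.0, 1.0])]
x = add_high_level(np.array([1.0, 1.0]), protos)
print(x.shape)  # (4,)
```

A vector lying exactly on a class model gets distance zero for that class, so the appended features directly encode each class's "vote" alongside the raw acoustics.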
APA, Harvard, Vancouver, ISO, and other styles
21

Peng, Yong Kian. "Speech coding based on a pitch synchronous pattern recognition approach." Thesis, University of Ulster, 1998. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.245804.

Full text
APA, Harvard, Vancouver, ISO, and other styles
22

Khurshid, Azar. "Pitch estimation for noisy speech." Thesis, University of Plymouth, 2002. http://hdl.handle.net/10026.1/1692.

Full text
Abstract:
In this dissertation a biologically plausible system of pitch estimation is proposed. The system is designed from the bottom up to be robust to challenging noise conditions. This robustness to the presence of noise in the signal is achieved by developing a new representation of the speech signal, based on the operation of damped harmonic oscillators, and temporal mode analysis of their output. This resulting representation is shown to possess qualities which are not degraded in presence of noise. A harmonic grouping based system is used to estimate the pitch frequency. A detailed statistical analysis is performed on the system, and performance compared with some of the most established and recent pitch estimation and tracking systems. The detailed analysis includes results of experiments with a variety of noises with a large range of signal to noise ratios, under different signal conditions. Situations where the interfering "noise" is speech from another speaker are also considered. The proposed system is able to estimate the pitch of both the main speaker, and the interfering speaker, thus emulating the phenomena of auditory streaming and "cocktail party effect" in terms of pitch perception. The results of the extensive statistical analysis show that the proposed system exhibits some very interesting properties in its ability of handling noise. The results also show that the proposed system’s overall performance is much better than any of the other systems tested, especially in presence of very large amounts of noise. The system is also shown to successfully simulate some very interesting psychoacoustical pitch perception phenomena. Through a detailed and comparative computational requirements analysis, it is also demonstrated that the proposed system is comparatively inexpensive in terms of processing and memory requirements.
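The oscillator-bank representation itself is beyond a short sketch, but the kind of baseline pitch estimator such a system is evaluated against is simple: a standard autocorrelation method, shown here with illustrative parameter choices.

```python
import numpy as np

def autocorr_pitch(x, fs, fmin=50.0, fmax=400.0):
    """Baseline autocorrelation pitch estimator: pick the lag with the
    largest normalised autocorrelation inside the plausible pitch range."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    ac = np.correlate(x, x, mode='full')[len(x) - 1:]   # lags 0..N-1
    ac = ac / ac[0]                                     # normalise by energy
    lo, hi = int(fs / fmax), int(fs / fmin)             # lag search window
    lag = lo + np.argmax(ac[lo:hi])
    return fs / lag

fs = 8000
t = np.arange(0, 0.04, 1.0 / fs)
pitch = autocorr_pitch(np.sin(2 * np.pi * 200 * t), fs)
```

Estimators of this family degrade quickly once noise or a competing speaker flattens the autocorrelation peaks, which is precisely the regime the thesis's damped-oscillator representation is designed to survive.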
APA, Harvard, Vancouver, ISO, and other styles
23

Sundaram, Anand R. K. "Vowel recognition using Kohonen's self-organizing feature maps." Online version of thesis, 1991. http://hdl.handle.net/1850/10710.

Full text
APA, Harvard, Vancouver, ISO, and other styles
24

Delmege, James W. "CLASS: a study of methods for coarse phonetic classification." Online version of thesis, 1988. http://hdl.handle.net/1850/10449.

Full text
APA, Harvard, Vancouver, ISO, and other styles
25

Rao, Hrishikesh. "Paralinguistic event detection in children's speech." Diss., Georgia Institute of Technology, 2015. http://hdl.handle.net/1853/54332.

Full text
Abstract:
Paralinguistic events are useful indicators of the affective state of a speaker. These cues, in children's speech, are used to form social bonds with their caregivers. They have also been found to be useful in the very early detection of developmental disorders such as autism spectrum disorder (ASD) in children's speech. Prior work on children's speech has focused on the use of a limited number of subjects that do not have sufficient diversity in the type of vocalizations produced. Also, the features necessary to understand the production of paralinguistic events are not fully understood. To account for the lack of an off-the-shelf solution to detect instances of laughter and crying in children's speech, the focus of the thesis is to investigate and develop signal processing algorithms to extract acoustic features and use machine learning algorithms on various corpora. Results obtained using baseline spectral and prosodic features indicate that a combination of spectral, prosodic, and dysphonation-related features is needed to detect laughter and whining in toddlers' speech across different age groups and recording environments. The use of long-term features was found to be useful to capture the periodic properties of laughter in adults' and children's speech and detected instances of laughter to a high degree of accuracy. Finally, the thesis focuses on the use of multi-modal information, using acoustic features and computer vision-based smile-related features to detect instances of laughter and to reduce the instances of false positives in adults' and children's speech. The fusion of the features resulted in an improvement of the accuracy and recall rates over using either of the two modalities on their own.
APA, Harvard, Vancouver, ISO, and other styles
26

Lee, Gareth E. "Multi-modal prediction and modelling using artificial neural networks." Thesis, University of East Anglia, 1991. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.293823.

Full text
APA, Harvard, Vancouver, ISO, and other styles
27

Williams, Geoffrey. "The phonological basis of speech recognition." Thesis, SOAS, University of London, 1998. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.287748.

Full text
APA, Harvard, Vancouver, ISO, and other styles
28

Hu, Rusheng. "Statistical optimization of acoustic models for large vocabulary speech recognition." Diss., Columbia, Mo. : University of Missouri-Columbia, 2006. http://hdl.handle.net/10355/4329.

Full text
Abstract:
Thesis (Ph. D.) University of Missouri-Columbia, 2006.
The entire dissertation/thesis text is included in the research.pdf file; the official abstract appears in the short.pdf file (which also appears in the research.pdf); a non-technical general description, or public abstract, appears in the public.pdf file. Title from title screen of research.pdf file (viewed on August 2, 2007) Includes bibliographical references.
APA, Harvard, Vancouver, ISO, and other styles
29

Nag, Ronjan. "Speech and speaker recognition using hidden Markov models and vector quantisation." Thesis, University of Cambridge, 1988. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.383109.

Full text
APA, Harvard, Vancouver, ISO, and other styles
30

Hanna, Philip James. "Improving speech recognition through statistical modelling of context and temporal dependency." Thesis, Queen's University Belfast, 1998. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.287627.

Full text
APA, Harvard, Vancouver, ISO, and other styles
31

Chan, Dominic Sai Fan. "Speech production modelling based on glottal inverse filtering." Thesis, Imperial College London, 1994. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.307161.

Full text
APA, Harvard, Vancouver, ISO, and other styles
32

Kounoudes, Anastasis. "Epoch estimation for closed-phase analysis of speech." Thesis, Imperial College London, 2001. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.248213.

Full text
APA, Harvard, Vancouver, ISO, and other styles
33

Scott, Simon David. "A data-driven approach to visual speech synthesis." Thesis, University of Bath, 1996. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.307116.

Full text
APA, Harvard, Vancouver, ISO, and other styles
34

Devaney, Jason Wayne. "A study of articulatory gestures for speech synthesis." Thesis, University of Liverpool, 1995. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.284254.

Full text
APA, Harvard, Vancouver, ISO, and other styles
35

Savvides, Vasos E. "Perceptual models in speech quality assessment and coding." Thesis, Loughborough University, 1988. https://dspace.lboro.ac.uk/2134/36273.

Full text
Abstract:
The ever-increasing demand for good communications/toll quality speech has created a renewed interest into the perceptual impact of rate compression. Two general areas are investigated in this work, namely speech quality assessment and speech coding. In the field of speech quality assessment, a model is developed which simulates the processing stages of the peripheral auditory system. At the output of the model a "running" auditory spectrum is obtained. This represents the auditory (spectral) equivalent of any acoustic sound such as speech. Auditory spectra from coded speech segments serve as inputs to a second model. This model simulates the information centre in the brain which performs the speech quality assessment.
APA, Harvard, Vancouver, ISO, and other styles
36

Lo, Ka-Yiu. "Pitch synchronous speech coding at very low bit rates." Thesis, University of Liverpool, 1993. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.321128.

Full text
APA, Harvard, Vancouver, ISO, and other styles
37

Farsi, Hassan. "Advanced pre- and post-processing techniques for speech coding." Thesis, University of Surrey, 2003. http://epubs.surrey.ac.uk/844491/.

Full text
Abstract:
Advances in digital technology in the last decade have motivated the development of very efficient and high quality speech compression algorithms. While in the early low bit rate coding systems, the main target was the production of intelligible speech at low bit rates, expansion of new applications such as mobile satellite systems increased the demand for reducing the transmission bandwidth and achieving higher speech quality. This resulted in the development of efficient parametric models for speech production system. These models were the basis of powerful speech compression algorithms such as CELP, MBE, MELP and WI. The performance of a speech coder not only depends on the speech production model employed but also on the accurate estimation of speech parameters. Periodicity, also known as pitch, is one of the speech parameters that greatly affect the synthesised speech quality. Thus, the subject of pitch determination has attracted much research in the area of low bit rate coding. In these studies it is assumed that for a short segment of speech, called frame, the pitch is fixed or smoothly evolving. The pitch estimation algorithms generally fail to determine irregular variations, which can occur at onset and offset speech segments. In order to overcome this problem, a novel preprocessing method, which detects irregular pitch variations and modifies the speech signal such as to improve the accuracy of the pitch estimation, is proposed. This method results in more regular speech while maintaining perceptual speech quality. The perceptual quality of the synthesised speech may also be improved using postfiltering techniques. Conventional postfiltering methods generally consider the enhancement of the whole speech spectrum. This may result in the broadening of the first formant, which leads to the increase of quantisation noise for this formant. A new postfiltering technique, which is based on factorising the linear prediction synthesis filter, is proposed. 
This provides more control over the formant bandwidth and attenuation of spectral speech valleys. Key words: Pitch smoothing, speech pre-processor, postfiltering.
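The factorisation idea can be sketched by working on the poles of the LP synthesis filter directly: widening each resonance by scaling its pole radius. A uniform factor, as below, reduces to the classical A(z/γ) bandwidth expansion; per-formant control follows by giving each conjugate pole pair its own factor. This is an assumed simplification, not the thesis's actual postfilter.

```python
import numpy as np

def per_formant_postfilter(a, widen=0.9):
    """Factor A(z) into its roots (the poles of the synthesis filter
    1/A(z)), scale each pole radius, and rebuild the coefficients.
    `a` holds the A(z) coefficients [1, a1, a2, ...]."""
    poles = np.roots(a)
    poles = poles * widen            # pull every pole toward the origin
    return np.poly(poles).real       # coefficients of the modified A(z)

# A toy 2nd-order A(z) with one resonance near 0.3*pi rad/sample
pole = 0.95 * np.exp(1j * 0.3 * np.pi)
a = np.poly([pole, np.conj(pole)]).real
a_post = per_formant_postfilter(a, widen=0.9)
```

Operating on individual poles rather than on the whole spectrum is what gives the control over first-formant bandwidth that the abstract says conventional postfilters lack.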
APA, Harvard, Vancouver, ISO, and other styles
38

Jeon, Woojay. "Speech Analysis and Cognition Using Category-Dependent Features in a Model of the Central Auditory System." Diss., Georgia Institute of Technology, 2006. http://hdl.handle.net/1853/14061.

Full text
Abstract:
It is well known that machines perform far worse than humans in recognizing speech and audio, especially in noisy environments. One method of addressing this issue of robustness is to study physiological models of the human auditory system and to adopt some of its characteristics in computers. As a first step in studying the potential benefits of an elaborate computational model of the primary auditory cortex (A1) in the central auditory system, we qualitatively and quantitatively validate the model under existing speech processing recognition methodology. Next, we develop new insights and ideas on how to interpret the model, and reveal some of the advantages of its dimension-expansion that may be potentially used to improve existing speech processing and recognition methods. This is done by statistically analyzing the neural responses to various classes of speech signals and forming empirical conjectures on how cognitive information is encoded in a category-dependent manner. We also establish a theoretical framework that shows how noise and signal can be separated in the dimension-expanded cortical space. Finally, we develop new feature selection and pattern recognition methods to exploit the category-dependent encoding of noise-robust cognitive information in the cortical response. Category-dependent features are proposed as features that "specialize" in discriminating specific sets of classes, and as a natural way of incorporating them into a Bayesian decision framework, we propose methods to construct hierarchical classifiers that perform decisions in a two-stage process. Phoneme classification tasks using the TIMIT speech database are performed to quantitatively validate all developments in this work, and the results encourage future work in exploiting high-dimensional data with category(or class)-dependent features for improved classification or detection.
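The two-stage decision process can be sketched with minimum-distance classifiers: a first stage picks a broad category, and a second, category-specific stage discriminates within it. The phone labels, groupings and toy features below are all invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical 2-D features for two vowels and two fricatives
data = {
    'aa': rng.normal([0, 0], 0.1, (20, 2)), 'iy': rng.normal([0, 1], 0.1, (20, 2)),
    's':  rng.normal([5, 0], 0.1, (20, 2)), 'f':  rng.normal([5, 1], 0.1, (20, 2)),
}
group_of = {'aa': 'vowel', 'iy': 'vowel', 's': 'fricative', 'f': 'fricative'}

# Stage 1: one centroid per broad category
stage1 = {g: np.vstack([data[c] for c in data if group_of[c] == g]).mean(axis=0)
          for g in {'vowel', 'fricative'}}
# Stage 2: per-category classifiers, one centroid per class
stage2 = {g: {c: data[c].mean(axis=0) for c in data if group_of[c] == g}
          for g in {'vowel', 'fricative'}}

def two_stage_predict(x):
    """Decide the broad category first, then discriminate within it."""
    group = min(stage1, key=lambda g: np.linalg.norm(x - stage1[g]))
    return min(stage2[group], key=lambda c: np.linalg.norm(x - stage2[group][c]))

print(two_stage_predict(np.array([0.0, 1.05])))  # iy
```

The second-stage classifier only ever has to separate members of one category, which is the sense in which the proposed category-dependent features can "specialize".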
APA, Harvard, Vancouver, ISO, and other styles
39

Wieworka, Adam. "Speech recognition using Hidden Markov Models with exponential interpolation of state parameters." Thesis, Imperial College London, 1997. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.286612.

Full text
APA, Harvard, Vancouver, ISO, and other styles
40

Balakrishnan, Sreeram Viswanath. "Solving combinatorial optimization problems using neural networks with applications in speech recognition." Thesis, University of Cambridge, 1992. https://www.repository.cam.ac.uk/handle/1810/283679.

Full text
APA, Harvard, Vancouver, ISO, and other styles
41

Rochford, Matthew. "Visual Speech Recognition Using a 3D Convolutional Neural Network." DigitalCommons@CalPoly, 2019. https://digitalcommons.calpoly.edu/theses/2109.

Full text
Abstract:
Mainstream automatic speech recognition (ASR) makes use of audio data to identify spoken words; however, visual speech recognition (VSR) has recently been of increased interest to researchers. VSR is used when audio data is corrupted or missing entirely, and also to further enhance the accuracy of audio-based ASR systems. In this research, we present both a framework for building 3D feature cubes of lip data from videos and a 3D convolutional neural network (CNN) architecture for performing classification on a dataset of 100 spoken words, recorded in an uncontrolled environment. Our 3D-CNN architecture achieves a testing accuracy of 64%, comparable with recent works, but using an input data size that is up to 75% smaller. Overall, our research shows that 3D-CNNs can be successful in finding spatial-temporal features using unsupervised feature extraction and are a suitable choice for VSR-based systems.
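The core operation behind the 3D-CNN approach described above is a convolution whose kernel spans time as well as the two image axes of the lip-region feature cube. The following minimal NumPy sketch illustrates a single "valid" 3D convolution layer; the cube and kernel dimensions are illustrative assumptions, not values taken from the thesis.

```python
import numpy as np

def conv3d_valid(cube, kernel):
    """Naive 'valid' 3D convolution (cross-correlation) of a feature cube
    with a single kernel -- the basic building block of a 3D-CNN layer."""
    T, H, W = cube.shape
    t, h, w = kernel.shape
    out = np.empty((T - t + 1, H - h + 1, W - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            for k in range(out.shape[2]):
                out[i, j, k] = np.sum(cube[i:i + t, j:j + h, k:k + w] * kernel)
    return out

# Hypothetical lip-region cube: 20 video frames of 32x32 grayscale crops.
cube = np.random.rand(20, 32, 32)
kernel = np.random.rand(3, 5, 5)  # the kernel spans 3 frames in time
features = conv3d_valid(cube, kernel)
print(features.shape)  # (18, 28, 28)
```

Because the kernel covers several consecutive frames, each output value responds to motion of the lips across time as well as their shape in a single frame, which is what the abstract means by spatial-temporal features.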
APA, Harvard, Vancouver, ISO, and other styles
42

Purdy, Trevor. "A Dynamic Vocabulary Speech Recognizer Using Real-Time, Associative-Based Learning." Thesis, University of Waterloo, 2006. http://hdl.handle.net/10012/942.

Full text
Abstract:
Conventional speech recognizers employ a training phase during which many of their parameters are configured, including vocabulary selection, feature selection, and the tailoring of the decision mechanism to these selections. After this stage, during normal operation, these traditional recognizers do not significantly alter any of these parameters. Conversely, this work draws heavily on high-level human thought patterns and speech perception to outline a set of precepts that eliminate this training phase and instead perform all of its tasks during normal operation. A feature space model is discussed to establish a set of necessary and sufficient conditions to guide real-time feature selection. Detailed implementation and preliminary results are also discussed. These results indicate that the benefits of this approach can be seen in increased speech recognizer adaptability, while competitive recognition rates are still retained in controlled environments. The system can thus accommodate such changes as varying vocabularies, class migration, and new speakers.
APA, Harvard, Vancouver, ISO, and other styles
43

Tuerk, Christine M. "Automatic speech synthesis using auditory transforms and artificial neural networks." Thesis, University of Cambridge, 1992. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.385362.

Full text
APA, Harvard, Vancouver, ISO, and other styles
44

Altun, Halis. "Evaluation of neural learning in a MLP NN for an acoustic-to-articulatory mapping problem using different training pattern vector characteristics." Thesis, University of Nottingham, 1998. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.263405.

Full text
APA, Harvard, Vancouver, ISO, and other styles
45

Moore, John Humphrey. "Digitizing human faces for the analysis and synthesis of visible speech." Thesis, Leeds Beckett University, 1990. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.277886.

Full text
APA, Harvard, Vancouver, ISO, and other styles
46

Arriola, Yosu. "Integration of multi-layer perception and hidden Markov models for automatic speech recognition." Thesis, Staffordshire University, 1991. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.292239.

Full text
APA, Harvard, Vancouver, ISO, and other styles
47

Jantan, Adznan Bin. "A comparative study of various analysis techniques for use in speech recognition systems." Thesis, Swansea University, 1988. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.292473.

Full text
APA, Harvard, Vancouver, ISO, and other styles
48

Ravindran, Sourabh. "Physiologically Motivated Methods For Audio Pattern Classification." Diss., Georgia Institute of Technology, 2006. http://hdl.handle.net/1853/14066.

Full text
Abstract:
Human-like performance by machines in tasks of speech and audio processing has remained an elusive goal. In an attempt to bridge the gap in performance between humans and machines, there has been an increased effort to study and model physiological processes. However, the widespread use of biologically inspired features proposed in the past has been hampered mainly by either the lack of robustness across a range of signal-to-noise ratios or the formidable computational costs. In physiological systems, sensor processing occurs in several stages. It is likely the case that signal features and biological processing techniques evolved together and are complementary or well matched. It is precisely for this reason that modeling the feature extraction processes should go hand in hand with modeling of the processes that use these features. This research presents a front-end feature extraction method for audio signals inspired by the human peripheral auditory system. New developments in the field of machine learning are leveraged to build classifiers to maximize the performance gains afforded by these features. The structure of the classification system is similar to what might be expected in physiological processing. Further, the feature extraction and classification algorithms can be efficiently implemented using the low-power cooperative analog-digital signal processing platform. The usefulness of the features is demonstrated for tasks of audio classification, speech versus non-speech discrimination, and speech recognition. The low-power nature of the classification system makes it ideal for use in applications such as hearing aids, hand-held devices, and surveillance through acoustic scene monitoring.
APA, Harvard, Vancouver, ISO, and other styles
49

Dean, David Brendan. "Synchronous HMMs for audio-visual speech processing." Queensland University of Technology, 2008. http://eprints.qut.edu.au/17689/.

Full text
Abstract:
Both human perceptual studies and automatic machine-based experiments have shown that visual information from a speaker's mouth region can improve the robustness of automatic speech processing tasks, especially in the presence of acoustic noise. By taking advantage of the complementary nature of the acoustic and visual speech information, audio-visual speech processing (AVSP) applications can work reliably in more real-world situations than would be possible with traditional acoustic speech processing applications. The two most prominent applications of AVSP for viable human-computer interfaces involve the recognition of the speech events themselves, and the recognition of speakers' identities based upon their speech. However, while these two fields of speech and speaker recognition are closely related, there has been little systematic comparison of the two tasks under similar conditions in the existing literature. Accordingly, the primary focus of this thesis is to compare the suitability of general AVSP techniques for speech or speaker recognition, with a particular focus on synchronous hidden Markov models (SHMMs). The cascading appearance-based approach to visual speech feature extraction has been shown to work well in removing irrelevant static information from the lip region to greatly improve visual speech recognition performance. This thesis demonstrates that these dynamic visual speech features also provide for an improvement in speaker recognition, showing that speakers can be visually recognised by how they speak, in addition to their appearance alone. This thesis investigates a number of novel techniques for training and decoding of SHMMs that improve the audio-visual speech modelling ability of the SHMM approach over the existing state-of-the-art joint-training technique. Novel experiments are conducted within to demonstrate that the reliability of the two streams during training is of little importance to the final performance of the SHMM.
Additionally, two novel techniques of normalising the acoustic and visual state classifiers within the SHMM structure are demonstrated for AVSP. Fused hidden Markov model (FHMM) adaptation is introduced as a novel method of adapting SHMMs from existing well-performing acoustic hidden Markov models (HMMs). This technique is demonstrated to provide improved audio-visual modelling over the jointly-trained SHMM approach at all levels of acoustic noise for the recognition of audio-visual speech events. However, the close coupling of the SHMM approach will be shown to be less useful for speaker recognition, where a late integration approach is demonstrated to be superior.
APA, Harvard, Vancouver, ISO, and other styles
50

Combrinck, Hendrik Petrus. "A cost, complexity and performance comparison of two automatic language identification architectures." Pretoria : [s.n.], 2006. http://upetd.up.ac.za/thesis/available/etd-12212006-141335/.

Full text
APA, Harvard, Vancouver, ISO, and other styles