Dissertations / Theses on the topic 'Speech processing systems. Pattern recognition systems'

Author: Grafiati

Published: 4 June 2021

Last updated: 7 February 2022

Consult the top 50 dissertations / theses for your research on the topic 'Speech processing systems. Pattern recognition systems.'

1

Alphonso, Issac John. "Network training for continuous speech recognition." Master's thesis, Mississippi State : Mississippi State University, 2003. http://library.msstate.edu/etd/show.asp?etd=etd-10252003-105104.

2

Combrinck, Hendrik Petrus. "A cost, complexity and performance comparison of two automatic language identification architectures." Pretoria : [s.n.], 2006. http://upetd.up.ac.za/thesis/available/etd-12212006-141335/.

3

Sundaram, Anand R. K. "Vowel recognition using Kohonen's self-organizing feature maps." Online version of thesis, 1991. http://hdl.handle.net/1850/10710.

4

Sukittanon, Somsak. "Modulation scale analysis: theory and application for nonstationary signal classification." Thesis, Connect to this title online; UW restricted, 2004. http://hdl.handle.net/1773/5875.

5

Chen, Xin. "Ensemble methods in large vocabulary continuous speech recognition." Diss., Columbia, Mo. : University of Missouri-Columbia, 2008. http://hdl.handle.net/10355/5797.

Abstract:

Thesis (M.S.)--University of Missouri-Columbia, 2008.
The entire dissertation/thesis text is included in the research.pdf file; the official abstract appears in the short.pdf file (which also appears in the research.pdf); a non-technical general description, or public abstract, appears in the public.pdf file. Title from title screen of research.pdf file (viewed on August 28, 2008) Vita. Includes bibliographical references.

6

Jantan, Adznan Bin. "A comparative study of various analysis techniques for use in speech recognition systems." Thesis, Swansea University, 1988. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.292473.

7

Xue, Jian. "Improvement of decoding engine & phonetic decision tree in acoustic modeling for online large vocabulary conversational speech recognition." Diss., Columbia, Mo. : University of Missouri-Columbia, 2007. http://hdl.handle.net/10355/4821.

Abstract:

Thesis (Ph. D.)--University of Missouri-Columbia, 2007.
The entire dissertation/thesis text is included in the research.pdf file; the official abstract appears in the short.pdf file (which also appears in the research.pdf); a non-technical general description, or public abstract, appears in the public.pdf file. Title from title screen of research.pdf file (viewed on March 4, 2008) Vita. Includes bibliographical references.

8

Chiou, Greg I. "Active contour models for distinct feature tracking and lipreading." Thesis, Connect to this title online; UW restricted, 1995. http://hdl.handle.net/1773/6023.

9

Ravindran, Sourabh. "Physiologically Motivated Methods For Audio Pattern Classification." Diss., Georgia Institute of Technology, 2006. http://hdl.handle.net/1853/14066.

Abstract:

Human-like performance by machines in tasks of speech and audio processing has remained an elusive goal. In an attempt to bridge the gap in performance between humans and machines there has been an increased effort to study and model physiological processes. However, the widespread use of biologically inspired features proposed in the past has been hampered mainly by either the lack of robustness across a range of signal-to-noise ratios or the formidable computational costs. In physiological systems, sensor processing occurs in several stages. It is likely the case that signal features and biological processing techniques evolved together and are complementary or well matched. It is precisely for this reason that modeling the feature extraction processes should go hand in hand with modeling of the processes that use these features. This research presents a front-end feature extraction method for audio signals inspired by the human peripheral auditory system. New developments in the field of machine learning are leveraged to build classifiers to maximize the performance gains afforded by these features. The structure of the classification system is similar to what might be expected in physiological processing. Further, the feature extraction and classification algorithms can be efficiently implemented using the low-power cooperative analog-digital signal processing platform. The usefulness of the features is demonstrated for tasks of audio classification, speech versus non-speech discrimination, and speech recognition. The low-power nature of the classification system makes it ideal for use in applications such as hearing aids, hand-held devices, and surveillance through acoustic scene monitoring.

10

Du, Toit A. (Andre). "Automatic classification of spoken South African English variants using a transcription-less speech recognition approach." Thesis, Stellenbosch : Stellenbosch University, 2004. http://hdl.handle.net/10019.1/49866.

Abstract:

Thesis (MEng)--University of Stellenbosch, 2004.
ENGLISH ABSTRACT: We present the development of a pattern recognition system which is capable of classifying different Spoken Variants (SVs) of South African English (SAE) using a transcriptionless speech recognition approach. Spoken Variants (SVs) allow us to unify the linguistic concepts of accent and dialect from a pattern recognition viewpoint. The need for the SAE SV classification system arose from the multi-linguality requirement for South African speech recognition applications and the costs involved in developing such applications.

11

Little, M. A. "Biomechanically informed nonlinear speech signal processing." Thesis, University of Oxford, 2007. http://ora.ox.ac.uk/objects/uuid:6f5b84fb-ab0b-42e1-9ac2-5f6acc9c5b80.

Abstract:

Linear digital signal processing based around linear, time-invariant systems theory finds substantial application in speech processing. The linear acoustic source-filter theory of speech production provides ready biomechanical justification for using linear techniques. Nonetheless, biomechanical studies surveyed in this thesis display significant nonlinearity and non-Gaussianity, casting doubt on the linear model of speech production. In order therefore to test the appropriateness of linear systems assumptions for speech production, surrogate data techniques can be used. This study uncovers systematic flaws in the design and use of existing surrogate data techniques, and, by making novel improvements, develops a more reliable technique. Collating the largest set of speech signals to date compatible with this new technique, this study next demonstrates that the linear assumptions are not appropriate for all speech signals. Detailed analysis shows that while vowel production from healthy subjects cannot be explained within the linear assumptions, consonants can. Linear assumptions also fail for most vowel production by pathological subjects with voice disorders. Combining this new empirical evidence with information from biomechanical studies concludes that the most parsimonious model for speech production, explaining all these findings in one unified set of mathematical assumptions, is a stochastic nonlinear, non-Gaussian model, which subsumes both Gaussian linear and deterministic nonlinear models. As a case study, to demonstrate the engineering value of nonlinear signal processing techniques based upon the proposed biomechanically-informed, unified model, the study investigates the biomedical engineering application of disordered voice measurement. A new state space recurrence measure is devised and combined with an existing measure of the fractal scaling properties of stochastic signals. Using a simple pattern classifier, these two measures outperform all combinations of linear methods for the detection of voice disorders on a large database of pathological and healthy vowels, making explicit the effectiveness of such biomechanically-informed, nonlinear signal processing techniques.

12

Yaman, Sibel. "A multi-objective programming perspective to statistical learning problems." Diss., Atlanta, Ga. : Georgia Institute of Technology, 2008. http://hdl.handle.net/1853/26470.

Abstract:

Thesis (Ph.D)--Electrical and Computer Engineering, Georgia Institute of Technology, 2009.
Committee Chair: Chin-Hui Lee; Committee Member: Anthony Yezzi; Committee Member: Evans Harrell; Committee Member: Fred Juang; Committee Member: James H. McClellan. Part of the SMARTech Electronic Thesis and Dissertation Collection.

13

Hawkins, Mikhel E. "High speed target tracking using Kalman filter and partial window imaging." Thesis, Georgia Institute of Technology, 2002. http://hdl.handle.net/1853/16709.

14

Wong, Ing Hoo. "Design of a realtime high speed recognizer for unconstrained handprinted alphanumeric characters." Thesis, University of British Columbia, 1985. http://hdl.handle.net/2429/25135.

Abstract:

This thesis presents the design of a recognizer for unconstrained handprinted alphanumeric characters. The design is based on a thinning process that is capable of producing thinned images with well defined features that are considered essential for character image description and recognition. By choosing the topological points of the thinned ('line') character image as these desired features, the thinning process achieves not only a high degree of data reduction but also transforms a binary image into a discrete form of line drawing that can be represented by graphs. As a result powerful graphical analysis techniques can be applied to analyze and classify the image. The image classification is performed in two stages. Firstly, a technique for identifying the topological points in the thinned image is developed. These topological points represent the global features of the image and because of their invariance to elastic deformations, they are used for image preclassification. Preclassification results in a substantial reduction in the entropy of the input image. The subsequent process can concentrate only on the differentiation of images that are topologically equivalent. In the preclassifier simple logic operations localized to the immediate neighbourhood of each pixel are used. These operations are also highly independent and easy to implement using VLSI. A graphical technique for image extraction and representation called the chain coded digraph representation is introduced. The technique uses global features such as nodes and the Freeman's chain codes for digital curves as branches. The chain coded digraph contains all the information that is present in the thinned image. This avoids using the image feature extraction approach for image description and data reduction (a difficult process to optimize) without sacrificing speed or complexity. 
After preclassification, a second stage of the recognition process analyses the chain coded digraph using the concept of attributed relational graph (ARG). ARG representation of the image can be obtained readily through simple transformations or rewriting rules from the chain coded digraph. The ARG representation of an image describes the shape primitives in the image and their relationships. Final classification of the input image can be made by comparing its ARG with the ARGs of known characters. The final classification involves only the comparison of ARGs of a predetermined topology. This information is crucial to the design of a matching algorithm called the reference guided inexact matching procedure, designed for high speed matching of character image ARGs. This graph matching procedure is shown to be much faster than other conventional graph matching procedures. The designed recognizer is implemented in Pascal on the PDP11/23 and VAX 11/750 computers. Tests using Munson's data show a high recognition rate of 91.46%. However, the recognizer is designed with the aim of an eventual implementation using VLSI and also as a basic recognizer for further research in reading machines. Therefore its full potential is yet to be realized. Nevertheless, the experiments with Munson's data illustrate the effectiveness of the design approach and the advantages it offers as a basic system for future research.
Faculty of Applied Science, Department of Electrical and Computer Engineering (Graduate)

15

Theunissen, M. W. (Marthinus Wilhelmus). "Phoneme-based topic spotting on the switchboard corpus." Thesis, Stellenbosch : Stellenbosch University, 2002. http://hdl.handle.net/10019.1/52998.

Abstract:

Thesis (MScEng)--Stellenbosch University, 2002.
ENGLISH ABSTRACT: The field of topic spotting in conversational speech deals with the problem of identifying "interesting" conversations or speech extracts contained within large volumes of speech data. Typical applications where the technology can be found include the surveillance and screening of messages before referring to human operators. Closely related methods can also be used for data-mining of multimedia databases, literature searches, language identification, call routing and message prioritisation. The first topic spotting systems used words as the most basic units. However, because of the poor performance of speech recognisers, a large amount of topic-specific hand-transcribed training data is needed. It is for this reason that researchers started concentrating on methods using phonemes instead, because the errors then occur on smaller, and therefore less important, units. Phoneme-based methods consequently make it feasible to use computer generated transcriptions as training data. Building on word-based methods, a number of phoneme-based systems have emerged. The two most promising ones are the Euclidean Nearest Wrong Neighbours (ENWN) algorithm and the newly developed Stochastic Method for the Automatic Recognition of Topics (SMART). Previous experiments on the Oregon Graduate Institute of Science and Technology's Multi-Language Telephone Speech Corpus suggested that SMART yields a large improvement over ENWN which outperformed competing phoneme-based systems in evaluations. However, the small amount of data available for these experiments meant that more rigorous testing was required. In this research, the algorithms were therefore re-implemented to run on the much larger Switchboard Corpus. Subsequently, a substantial improvement of SMART over ENWN was observed, confirming the result that was previously obtained. In addition to this, an investigation was conducted into the improvement of SMART. 
This resulted in a new counting strategy with a corresponding improvement in performance.

16

Morris, Robert W. "Enhancement and recognition of whispered speech." Diss., Available online, Georgia Institute of Technology, 2004:, 2003. http://etd.gatech.edu/theses/available/etd-04082004-180338/unrestricted/morris%5frobert%5fw%5f200312%5fphd.pdf.

17

YOUSSIF, ROSHDY S. "HYBRID INTELLIGENT SYSTEMS FOR PATTERN RECOGNITION AND SIGNAL PROCESSING." University of Cincinnati / OhioLINK, 2004. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1085714219.

18

Silvestre, Cerdà Joan Albert. "Different Contributions to Cost-Effective Transcription and Translation of Video Lectures." Doctoral thesis, Universitat Politècnica de València, 2016. http://hdl.handle.net/10251/62194.

Abstract:

[EN] In recent years, on-line multimedia repositories have experienced strong growth that has consolidated them as essential knowledge assets, especially in the area of education, where large repositories of video lectures have been built in order to complement or even replace traditional teaching methods. However, most of these video lectures are neither transcribed nor translated due to a lack of cost-effective solutions to do so in a way that gives accurate enough results. Solutions of this kind are clearly necessary in order to make these lectures accessible to speakers of different languages and to people with hearing disabilities. They would also facilitate lecture searchability and analysis functions, such as classification, recommendation or plagiarism detection, as well as the development of advanced educational functionalities like content summarisation to assist student note-taking. For this reason, the main aim of this thesis is to develop a cost-effective solution capable of transcribing and translating video lectures to a reasonable degree of accuracy. More specifically, we address the integration of state-of-the-art techniques in Automatic Speech Recognition and Machine Translation into large video lecture repositories to generate high-quality multilingual video subtitles without human intervention and at a reduced computational cost. Also, we explore the potential benefits of the exploitation of the information that we know a priori about these repositories, that is, lecture-specific knowledge such as speaker, topic or slides, to create specialised, in-domain transcription and translation systems by means of massive adaptation techniques. The proposed solutions have been tested in real-life scenarios by carrying out several objective and subjective evaluations, obtaining very positive results. The main outcome derived from this thesis, The transLectures-UPV Platform, has been publicly released as open-source software, and, at the time of writing, it is serving automatic transcriptions and translations for several thousands of video lectures in many Spanish and European universities and institutions.
Silvestre Cerdà, JA. (2016). Different Contributions to Cost-Effective Transcription and Translation of Video Lectures [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/62194
19

Kani, Bijan. "Enhanced logical adaptive systems for image processing and pattern recognition." Thesis, Brunel University, 1993. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.358406.

20

Wilson, Shawn C. "Voice recognition systems: assessment of implementation aboard U.S. naval ships." Thesis, Monterey, Calif. : Springfield, Va. : Naval Postgraduate School ; Available from National Technical Information Service, 2003. http://library.nps.navy.mil/uhtbin/hyperion-image/03Mar%5FWilson.pdf.

Abstract:

Thesis (M.S. in Information Systems and Operations)--Naval Postgraduate School, March 2003.
Thesis advisor(s): Michael T. McMaster, Kenneth J. Hagan. Includes bibliographical references (p. 47-49). Also available online.

21

Müller, J. J. "USB telephony interface device for speech recognition applications." Link to the online version, 2005. http://hdl.handle.net/10019/1127.

22

Jeon, Woojay. "Speech Analysis and Cognition Using Category-Dependent Features in a Model of the Central Auditory System." Diss., Georgia Institute of Technology, 2006. http://hdl.handle.net/1853/14061.

Abstract:

It is well known that machines perform far worse than humans in recognizing speech and audio, especially in noisy environments. One method of addressing this issue of robustness is to study physiological models of the human auditory system and to adopt some of its characteristics in computers. As a first step in studying the potential benefits of an elaborate computational model of the primary auditory cortex (A1) in the central auditory system, we qualitatively and quantitatively validate the model under existing speech processing and recognition methodology. Next, we develop new insights and ideas on how to interpret the model, and reveal some of the advantages of its dimension-expansion that may be potentially used to improve existing speech processing and recognition methods. This is done by statistically analyzing the neural responses to various classes of speech signals and forming empirical conjectures on how cognitive information is encoded in a category-dependent manner. We also establish a theoretical framework that shows how noise and signal can be separated in the dimension-expanded cortical space. Finally, we develop new feature selection and pattern recognition methods to exploit the category-dependent encoding of noise-robust cognitive information in the cortical response. Category-dependent features are proposed as features that "specialize" in discriminating specific sets of classes, and as a natural way of incorporating them into a Bayesian decision framework, we propose methods to construct hierarchical classifiers that perform decisions in a two-stage process. Phoneme classification tasks using the TIMIT speech database are performed to quantitatively validate all developments in this work, and the results encourage future work in exploiting high-dimensional data with category (or class)-dependent features for improved classification or detection.

23

Dobie, Mark Ralph. "Motion analysis in multimedia systems." Thesis, University of Southampton, 1992. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.359240.

24

Styne, Bruce Alan. "Management systems for computer graphics." Thesis, University of Cambridge, 1988. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.303247.

25

Neville, Katrina Lee. "Channel Compensation for Speaker Recognition Systems." RMIT University. Electrical and Computer Engineering, 2007. http://adt.lib.rmit.edu.au/adt/public/adt-VIT20080514.093453.

Abstract:

This thesis attempts to address the problem of how best to remedy different types of channel distortions on speech when that speech is to be used in automatic speaker recognition and verification systems. Automatic speaker recognition is when a person's voice is analysed by a machine and the person's identity is worked out by the comparison of speech features to a known set of speech features. Automatic speaker verification is when a person claims an identity and the machine determines if that claimed identity is correct or whether that person is an impostor. Channel distortion occurs whenever information is sent electronically through any type of channel, whether that channel is a basic wired telephone channel or a wireless channel. The types of distortion that can corrupt the information include time-variant or time-invariant filtering of the information or the addition of 'thermal noise' to the information; both of these types of distortion can cause varying degrees of error in information being received and analysed. The experiments presented in this thesis investigate the effects of channel distortion on the average speaker recognition rates and test the effectiveness of various channel compensation algorithms designed to mitigate the effects of channel distortion. The speaker recognition system was represented by a basic recognition algorithm consisting of: speech analysis, extraction of feature vectors in the form of the Mel-Cepstral Coefficients, and a classification part based on the minimum distance rule. Two types of channel distortion were investigated: convolutional (or lowpass filtering) effects and the addition of white Gaussian noise. Three different methods of channel compensation were tested: Cepstral Mean Subtraction (CMS), RelAtive SpecTrAl (RASTA) processing, and the Constant Modulus Algorithm (CMA). The results from the experiments showed that for both CMS and RASTA processing, filtering at low cutoff frequencies (3 or 4 kHz) produced improvements in the average speaker recognition rates compared to speech with no compensation. The levels of improvement due to RASTA processing were higher than the levels achieved due to the CMS method. Neither the CMS nor the RASTA method was able to improve accuracy of the speaker recognition system for cutoff frequencies of 5 kHz, 6 kHz or 7 kHz. In the case of noisy speech, all methods analysed were able to compensate for high SNR of 40 dB and 30 dB, and only RASTA processing was able to compensate and improve the average recognition rate for speech corrupted with a high level of noise (SNR of 20 dB and 10 dB).

26

Lai, Yiu Pong. "Maximum likelihood normalization for robust speech recognition /." View Abstract or Full-Text, 2003. http://library.ust.hk/cgi/db/thesis.pl?ELEC%202003%20LAI.

Abstract:

Thesis (M. Phil.)--Hong Kong University of Science and Technology, 2003.
Includes bibliographical references (leaves 98-103). Also available in electronic version. Access restricted to campus users.

27

Li, Chak Fai. "Improved polynomial segment model for speech recognition /." View abstract or full-text, 2004. http://library.ust.hk/cgi/db/thesis.pl?ELEC%202004%20LI.

Abstract:

Thesis (M. Phil.)--Hong Kong University of Science and Technology, 2004.
Includes bibliographical references (leaves 80-84). Also available in electronic version. Access restricted to campus users.

28

Wanderley, Juliana Fernandes Camapum. "Colour-based recognition for remote sensing in environmental systems." Thesis, Coventry University, 1999. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.266844.

29

Giles, Paul A. "Iterated function systems and shape representation." Thesis, Durham University, 1990. http://etheses.dur.ac.uk/6188/.

Abstract:

We propose the use of iterated function systems as an isomorphic shape representation scheme for use in a machine vision environment. A concise description of the basic theory and salient characteristics of iterated function systems is presented and from this we develop a formal framework within which to embed a representation scheme. Concentrating on the problem of obtaining automatically generated two-dimensional encodings we describe implementations of two solutions. The first is based on a deterministic algorithm and makes simplifying assumptions which limit its range of applicability. The second employs a novel formulation of a genetic algorithm and is intended to function with general data input. Keywords: Machine Vision, Shape Representation, Iterated Function Systems, Genetic Algorithms.

30

Reynolds, Graham J. "Configurable graphics systems : modelling and specification." Thesis, University of East Anglia, 1991. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.293731.

31

Zahedi, Fariborz. "A systems approach to image segmentation." Thesis, University of Brighton, 1994. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.260978.

32

Adami, André Gustavo. "Modeling prosodic differences for speaker and language recognition /." Full text open access at:, 2004. http://content.ohsu.edu/u?/etd,19.

33

Lee, Spencer Jaehoon Gilbert Juan E. "Post-speech-recognition processing in domain-specific text-corpus-based distributed listening system analysis, interpretation and selection of speech recognition results /." Auburn, Ala., 2006. http://repo.lib.auburn.edu/2006%20Summer/Theses/LEE_SPENCER_7.pdf.

34

Rideout, Robert Martin. "Coded imaging systems for X-ray astronomy." Thesis, University of Birmingham, 1995. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.364854.

35

Tai, Anthony. "Perceptual grouping and knowledge-based vision systems." Thesis, University of Surrey, 1997. http://epubs.surrey.ac.uk/844407/.

Abstract:

One of the goals in computer vision is to interpret scene objects and establish relationships between them. One of the problems associated with this task is that the image to be interpreted and the objects to be recognised correspond to different levels of information. The image is, on the one hand, represented as a collection of pixels in which three-dimensional information is transformed into two-dimensional information under perspective projection dictated by the camera position as well as photometric parameters such as focal length etc. On the other hand, the object is represented as a collection of three-dimensional structures and relations between them. These rather different representations highlighted the need to construct an intermediate-level representation which can facilitate the accomplishment of the goal of establishing correspondence between image features and scene objects. The complexity of the interpretation task is further compounded by image imperfections caused by lighting, total reflectance, surface markings, accidental viewpoints and so on. The problems highlighted earlier motivated the development of a novel feature grouping framework which takes into account feature stability and the underlying noise. This work advanced the state of the art in perceptual group extraction as the existing techniques tend to be ad hoc. Built upon the framework that we have established we developed the computational representation of higher level features such as junctions, collinear line and parallel line groupings. The low level feature representation and extraction phases of the work were the necessary prerequisites for the extraction of intermediate representations using AI techniques. These representations serve as visual cues in our rule-based system (RBS) to classify runways/taxiways in most of the DRA supplied imagery captured from unknown viewpoints.
Complexity problems reported in previous work on RBS for low and intermediate level vision tasks are apparently overcome by identifying a set of prioritised feature cues, uncertainties are handled by hypothesis generation and hypothesis verification, and the method can be regarded as a constrained search through the space of candidate hypotheses.

36

Au, Wing Hei. "Improved acoustic model training for speech recognition and verification /." View abstract or full-text, 2004. http://library.ust.hk/cgi/db/thesis.pl?ELEC%202004%20AU.

Abstract:

Thesis (M. Phil.)--Hong Kong University of Science and Technology, 2004.
Includes bibliographical references (leaves 81-86). Also available in electronic version. Access restricted to campus users.

37

Cummings, Kathleen E. "Analysis, synthesis, and recognition of stressed speech." Diss., Georgia Institute of Technology, 1992. http://hdl.handle.net/1853/15673.

38

Ng, Kwong Tim. "Exploring Chinese linguistic characteristics for speech recognition /." View abstract or full-text, 2005. http://library.ust.hk/cgi/db/thesis.pl?ELEC%202005%20NGK.

39

Liu, Yi. "Pronunciation modeling for spontaneous Mandarin speech recognition /." View Abstract or Full-Text, 2002. http://library.ust.hk/cgi/db/thesis.pl?ELEC%202002%20LIU.

Abstract:

Thesis (Ph. D.)--Hong Kong University of Science and Technology, 2002.
Includes bibliographical references (leaves 169-177). Also available in electronic version. Access restricted to campus users.

40

Al-Darkazali, Mohammed. "Image processing methods to segment speech spectrograms for word level recognition." Thesis, University of Sussex, 2017. http://sro.sussex.ac.uk/id/eprint/71675/.

Abstract:

The ultimate goal of automatic speech recognition (ASR) research is to allow a computer to recognize speech in real-time, with full accuracy, independent of vocabulary size, noise, speaker characteristics or accent. Today, systems are trained to learn an individual speaker's voice and larger vocabularies statistically, but accuracy is not ideal. A small gap between actual speech and acoustic speech representation in the statistical mapping causes a failure to produce a match of the acoustic speech signals by Hidden Markov Model (HMM) methods and consequently leads to classification errors. Certainly, these errors in the low level recognition stage of ASR produce unavoidable errors at the higher levels. Therefore, it seems that ASR needs additional research ideas to be incorporated within current speech recognition systems. This study seeks a new perspective on speech recognition. It incorporates a new approach for speech recognition, supporting it with wider previous research, validating it with a lexicon of 533 words and integrating it with a current speech recognition method to overcome the existing limitations. The study focusses on applying image processing to speech spectrogram images (SSI). We thus develop a new writing system, which we call the Speech-Image Recogniser Code (SIR-CODE). The SIR-CODE refers to the transposition of the speech signal to an artificial domain (the SSI) that allows the classification of the speech signal into segments. The SIR-CODE allows the matching of all speech features (formants, power spectrum, duration, cues of articulation places, etc.) in one process. This was made possible by adding a Realization Layer (RL) on top of the traditional speech recognition layer (based on HMM) to check all sequential phones of a word in a single-step matching process. The study shows that the method gives better recognition results than HMMs alone, leading to accurate and reliable ASR in noisy environments.
Therefore, the addition of the RL for SSI matching is a highly promising solution to compensate for the failure of HMMs in low level recognition. In addition, the same concept of employing SSIs can be used for whole sentences to reduce classification errors in HMM based high level recognition. The SIR-CODE bridges the gap between theory and practice of phoneme recognition by matching the SSI patterns at the word level. Thus, it can be adapted for dynamic time warping on the SIR-CODE segments, which can help to achieve ASR, based on SSI matching alone.

41

Van der Walt, Craig. "An investigation into the practical implementation of speech recognition for data capturing." Thesis, Cape Technikon, 1993. http://hdl.handle.net/20.500.11838/1156.

Abstract:

Thesis (Master Diploma (Technology))--Cape Technikon, Cape Town, 1993
A study into the practical implementation of speech recognition for the purposes of data capturing within Telkom SA is described. As data capturing is increasing in demand, a more efficient method of capturing is sought. The technology relating to speech recognition is herein examined and practical guidelines for selecting a speech recognition system are described. These guidelines are used to show how commercially available systems can be evaluated. Specific tests on a selected speech recognition system are described, relating to the accuracy and adaptability of the system. The results obtained illustrate why at present speech recognition systems are not advisable for the purpose of data capturing. The results also demonstrate how the selection of keywords can affect system performance. Areas of further research are highlighted relating to recognition performance and vocabulary selection.

42

Nanka-Bruce, Oona. "Some computer aided design methods for nonlinear control systems." Thesis, University of Sussex, 1989. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.252934.

43

Chu, Kam Keung. "Feature extraction based on perceptual non-uniform spectral compression for noisy speech recognition /." access full-text access abstract and table of contents, 2005. http://libweb.cityu.edu.hk/cgi-bin/ezdb/thesis.pl?mphil-ee-b19887516a.pdf.

Abstract:

Thesis (M.Phil.)--City University of Hong Kong, 2005.
"Submitted to Department of Electronic Engineering in partial fulfillment of the requirements for the degree of Master of Philosophy" Includes bibliographical references (leaves 143-147)

44

Hu, Rusheng. "Statistical optimization of acoustic models for large vocabulary speech recognition." Diss., Columbia, Mo. : University of Missouri-Columbia, 2006. http://hdl.handle.net/10355/4329.

Abstract:

Thesis (Ph. D.) University of Missouri-Columbia, 2006.
The entire dissertation/thesis text is included in the research.pdf file; the official abstract appears in the short.pdf file (which also appears in the research.pdf); a non-technical general description, or public abstract, appears in the public.pdf file. Title from title screen of research.pdf file (viewed on August 2, 2007) Includes bibliographical references.

45

Johnson, Joanna. "The effectiveness of voice recognition technology as used by persons with disabilities." Online version, 1998. http://www.uwstout.edu/lib/thesis/1998/1998johnsonj.pdf.

46

Ma, Chengyuan. "A detection-based pattern recognition framework and its applications." Diss., Georgia Institute of Technology, 2010. http://hdl.handle.net/1853/33889.

Abstract:

The objective of this dissertation is to present a detection-based pattern recognition framework and demonstrate its applications in automatic speech recognition and broadcast news video story segmentation. Inspired by the studies of modern cognitive psychology and real-world pattern recognition systems, a detection-based pattern recognition framework is proposed to provide an alternative solution for some complicated pattern recognition problems. The primitive features are first detected and the task-specific knowledge hierarchy is constructed level by level; then a variety of heterogeneous information sources are combined together and the high-level context is incorporated as additional information at certain stages. A detection-based framework is a "divide-and-conquer" design paradigm for pattern recognition problems, which will decompose a conceptually difficult problem into many elementary sub-problems that can be handled directly and reliably. Some information fusion strategies will be employed to integrate the evidence from a lower level to form the evidence at a higher level. Such a fusion procedure continues until reaching the top level. Generally, a detection-based framework has many advantages: (1) more flexibility in both detector design and fusion strategies, as these two parts can be optimized separately; (2) parallel and distributed computational components in primitive feature detection. In such a component-based framework, any primitive component can be replaced by a new one while other components remain unchanged; (3) incremental information integration; (4) high level context information as additional information sources, which can be combined with bottom-up processing at any stage. This dissertation presents the basic principles, criteria, and techniques for detector design and hypothesis verification based on the statistical detection and decision theory. In addition, evidence fusion strategies were investigated in this dissertation.
Several novel detection algorithms and evidence fusion methods were proposed and their effectiveness was justified in automatic speech recognition and broadcast news video segmentation systems. We believe such a detection-based framework can be employed in more applications in the future.

47

Hobbs, Mike. "Genetic algorithms for spatial data analysis in geographical information systems." Thesis, University of Kent, 1995. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.262636.

48

Wasmeier, Hans. "Development of tests and preprocessing algorithms for evaluation and improvement of speech recognition units." Thesis, University of British Columbia, 1986. http://hdl.handle.net/2429/26750.

Abstract:

This study considered the evaluation of commercially available isolated word, speaker dependent, speech recognition units, and preprocessing techniques that may be used for improving their performance. The problem was considered in three separate stages. A series of tests were designed to exercise an isolated word, speaker dependent, speech recognition unit. These tests provided a sound basis for determining a given unit's strengths and weaknesses. This knowledge permits a more informed decision on the best recognition device for a given price range. As well, this knowledge may be used in the design of a robust vocabulary, and creation of guidelines for best performance. The test vocabularies were based on the forty English phonemes identified by Rabiner and Schafer [28] and the test variations were representative of common variations which may be expected in normal use. A digital archive system was implemented for storing the voice input of test subjects. This facility provided a data base for an investigation of preprocessing techniques. As well, it permits the testing of different speech recognition units with the same voice input, providing a platform for device comparison. Several speech preprocessing and performance improvement techniques were then investigated. Specifically, two types of time normalization, the enhancement of low energy phonemes and a change in training technique were investigated. These techniques permit a more accurate analysis of the failure mechanism of the speech recognition unit. They may also provide the basis for a speech preprocessor design which could be placed in front of a commercial speech recognition unit. A commercially available speech recognition unit, the NEC SR100, was used as a measure of the effectiveness of the tests and of the improvements. 
Results of the study indicated that the designed tests and the preprocessing & performance improvement techniques investigated were useful in identifying the speech recognition unit's weaknesses. Also, depending on the economics of implementation, it was found that preprocessing may provide a cost effective solution to some of the recognition unit's shortcomings.
Applied Science, Faculty of
Electrical and Computer Engineering, Department of
Graduate

49

Ross, Philip. "Network accessible parallel computing systems, based upon transputers, for image processing strategies." Thesis, University of Aberdeen, 1993. http://digitool.abdn.ac.uk/R?func=search-advanced-go&find_code1=WSN&request1=AAIU059635.

Abstract:

Over the last decade there has been a steady increase in the size of primary data sets collected from medical imaging devices, and a correspondingly increased requirement for the computational power needed for associated image processing techniques. Although conventional processors have shown considerable advances throughout this period, they have failed to keep pace with the demands placed upon them by clinicians keen to utilise techniques such as pseudo three dimensional volume image presentation and high speed dynamic display of multiple frames of data. One solution that has the capability to meet these needs is to use multiple processors, co-operating to solve specified tasks using parallel processing. This thesis, which reports work undertaken during the period 1988-1991, shows how a network accessible parallel computing resource can provide an effective solution to these classes of problems. Starting from the premise that any generally accessible array of processors has to be connected to the inter-computer communication network, an Ethernet node was designed and constructed using the Inmos transputer. With this it was possible to demonstrate the benefits of parallel processing. Particular emphasis had to be given to those elements of the software which must make a guaranteed real-time response to external stimuli and it is shown that by isolating high priority processes, relatively simple OCCAM code can satisfy this need. Parallel processing principles have been utilised by the communications software that implemented the upper layers of the OSI seven layer network reference model using the Internet suite of protocols. By developing an abstract high level language, software was developed which allowed users to specify the inter-processor connection topology of a point to point connected multi-transputer array, built in association with this work. 
After constructing a flexible, memory efficient, graphics library, a technique to allow the high speed zooming of byte sized pixel data using a lookup table technique was developed. By using a multi-transputer design this allowed a 128x128 pixel image to be displayed at 256x256 pixel resolution at up to 25 frames per second, a requirement imposed by a contemporary cardiac imaging project.

50

Rao, Ram Raghavendra. "Audio-visual interaction in multimedia." Diss., Georgia Institute of Technology, 1998. http://hdl.handle.net/1853/13349.
