Dissertations / Theses on the topic 'Perceptual features for speech recognition'

Consult the top 50 dissertations / theses for your research on the topic 'Perceptual features for speech recognition.'

1

Haque, Serajul. "Perceptual features for speech recognition." University of Western Australia. School of Electrical, Electronic and Computer Engineering, 2008. http://theses.library.uwa.edu.au/adt-WU2008.0187.

Abstract:
Automatic speech recognition (ASR) is one of the most important research areas in the field of speech technology and research. It is also known as the recognition of speech by a machine or by some artificial intelligence. However, in spite of focused research in this field for the past several decades, robust speech recognition with high reliability has not been achieved, as it degrades in the presence of speaker variabilities, channel mismatch conditions, and noisy environments. The superb ability of the human auditory system has motivated researchers to include features of human perception
2

Gu, Y. "Perceptually-based features in automatic speech recognition." Thesis, Swansea University, 1991. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.637182.

Abstract:
Interspeaker variability of speech features is one of the most important problems in automatic speech recognition (ASR), and makes speaker-independent systems much more difficult to achieve than speaker-dependent ones. The work described in the thesis examines two ideas to overcome this problem. The first attempts to extract more reliable speech features by perceptually-based modelling; the second investigates the speaker variability in this speech feature and reduces its effects by a speaker normalisation scheme. The application of human speech perception in automatic speech recognition is discus
3

Chu, Kam Keung. "Feature extraction based on perceptual non-uniform spectral compression for noisy speech recognition." City University of Hong Kong, 2005. http://libweb.cityu.edu.hk/cgi-bin/ezdb/thesis.pl?mphil-ee-b19887516a.pdf.

Abstract:
Thesis (M.Phil.)--City University of Hong Kong, 2005. "Submitted to Department of Electronic Engineering in partial fulfillment of the requirements for the degree of Master of Philosophy." Includes bibliographical references (leaves 143-147).
4

Koniaris, Christos. "Perceptually motivated speech recognition and mispronunciation detection." Doctoral thesis, KTH, Tal-kommunikation, 2012. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-102321.

Abstract:
This doctoral thesis is the result of a research effort performed in two fields of speech technology, i.e., speech recognition and mispronunciation detection. Although the two areas are clearly distinguishable, the proposed approaches share a common hypothesis based on psychoacoustic processing of speech signals. The conjecture implies that the human auditory periphery provides a relatively good separation of different sound classes. Hence, it is possible to use recent findings from psychoacoustic perception together with mathematical and computational tools to model the auditory sensitivities
5

Koniaris, Christos. "A study on selecting and optimizing perceptually relevant features for automatic speech recognition." Licentiate thesis, Stockholm : Kungliga Tekniska högskolan, 2009. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-11470.

6

Sklar, Alexander Gabriel. "Channel Modeling Applied to Robust Automatic Speech Recognition." Scholarly Repository, 2007. http://scholarlyrepository.miami.edu/oa_theses/87.

Abstract:
In automatic speech recognition systems (ASRs), training is a critical phase to the system's success. Communication media, either analog (such as analog landline phones) or digital (VoIP), distort the speaker's speech signal, often in very complex ways: linear distortion occurs in all channels, either in the magnitude or phase spectrum. Non-linear but time-invariant distortion will always appear in all real systems. In digital systems we also have network effects which will produce packet losses and delays and repeated packets. Finally, one cannot really assert what path a signal will take, and
7

Atassi, Hicham. "Rozpoznání emočního stavu z hrané a spontánní řeči." Doctoral thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2014. http://www.nusl.cz/ntk/nusl-233665.

Abstract:
The dissertation deals with recognizing the emotional state of speakers from the speech signal. The work is divided into two main parts; the first part describes the proposed methods for recognizing emotional state from acted databases. Within this part, recognition results obtained using two different databases in different languages are presented. The main contributions of this part are a detailed analysis of a wide range of features extracted from the speech signal, the design of new classification architectures such as "emotion pairing", and the design of a new method for mapping discrete emotional states into a two-dimensional space.
8

Temko, Andriy. "Acoustic event detection and classification." Doctoral thesis, Universitat Politècnica de Catalunya, 2007. http://hdl.handle.net/10803/6880.

Abstract:
The human activity that takes place in meeting rooms or classrooms is reflected in a rich variety of acoustic events, produced either by the human body or by objects that people handle. Consequently, determining the identity of sounds and their position in time can help to detect and describe the human activity taking place in the room. In addition, detecting sounds other than speech can help improve the robustness of speech technologies such as automatic speech recognition under adverse working conditions. The objective of this thesis is the detection and classificat
9

Lileikytė, Rasa. "Quality estimation of speech recognition features." Doctoral thesis, Lithuanian Academic Libraries Network (LABT), 2012. http://vddb.laba.lt/obj/LT-eLABa-0001:E.02~2012~D_20120302_090132-92071.

Abstract:
The accuracy of a speech recognition system depends on the characteristics of the speech recognition features and classifier employed. To evaluate the accuracy of a speech recognition system in the ordinary way, the error of the system has to be calculated for each type of feature system explored and each type of classifier. The amount of such calculation can be reduced if the quality of the explored feature system is estimated. Accordingly, research was carried out on quality estimation of speech recognition features. The proposed method for quality estimation of speech recognition features is ba
10

Matthews, Iain. "Features for audio-visual speech recognition." Thesis, University of East Anglia, 1998. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.266736.

11

Droppo, J. G. "Time-frequency features for speech recognition." Thesis, University of Washington, 2000. http://hdl.handle.net/1773/5965.

12

Ore, Brian M. "Multilingual Articulatory Features for Speech Recognition." Wright State University / OhioLINK, 2007. http://rave.ohiolink.edu/etdc/view?acc_num=wright1176169264.

13

Leung, Ka Yee. "Combining acoustic features and articulatory features for speech recognition." Hong Kong University of Science and Technology, 2002. http://library.ust.hk/cgi/db/thesis.pl?ELEC%202002%20LEUNGK.

Abstract:
Thesis (M. Phil.)--Hong Kong University of Science and Technology, 2002. Includes bibliographical references (leaves 92-96). Also available in electronic version. Access restricted to campus users.
14

Iliev, Alexander Iliev. "Emotion Recognition Using Glottal and Prosodic Features." Scholarly Repository, 2009. http://scholarlyrepository.miami.edu/oa_dissertations/515.

Abstract:
Emotion conveys the psychological state of a person. It is expressed by a variety of physiological changes, such as changes in blood pressure, heart beat rate, degree of sweating, and can be manifested in shaking, changes in skin coloration, facial expression, and the acoustics of speech. This research focuses on the recognition of emotion conveyed in speech. There were three main objectives of this study. One was to examine the role played by the glottal source signal in the expression of emotional speech. The second was to investigate whether it can provide improved robustness in real-world
15

Mossmyr, Simon. "Noisy recognition of perceptual mid-level features in music." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-294229.

Abstract:
Self-training with noisy student is a consistency-based semi-supervised self-training method that achieved state-of-the-art accuracy on ImageNet image classification upon its release. It makes use of data noise and model noise when fitting a model to both labelled data and a large amount of artificially labelled data. In this work, we use self-training with noisy student to fit a VGG-style deep CNN model to a dataset of music piece excerpts labelled with perceptual mid-level features and compare its performance with the benchmark. To achieve this, we experiment with some common data warping
16

Saenko, Ekaterina 1976. "Articulatory features for robust visual speech recognition." Thesis, Massachusetts Institute of Technology, 2004. http://hdl.handle.net/1721.1/28736.

Abstract:
Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2004. Includes bibliographical references (p. 99-105). This thesis explores a novel approach to visual speech modeling. Visual speech, or a sequence of images of the speaker's face, is traditionally viewed as a single stream of contiguous units, each corresponding to a phonetic segment. These units are defined heuristically by mapping several visually similar phonemes to one visual phoneme, sometimes referred to as a viseme. However, experimental evidence shows that phonetic models
17

Väyrynen, E. (Eero). "Emotion recognition from speech using prosodic features." Doctoral thesis, Oulun yliopisto, 2014. http://urn.fi/urn:isbn:9789526204048.

Abstract:
Emotion recognition, a key step of affective computing, is the process of decoding an embedded emotional message from human communication signals, e.g. visual, audio, and/or other physiological cues. It is well-known that speech is the main channel for human communication and thus vital in the signalling of emotion and semantic cues for the correct interpretation of contexts. In the verbal channel, the emotional content is largely conveyed as constant paralinguistic information signals, of which prosody is the most important component. The lack of evaluation of affect and emotional
18

Rankin, D. "Extraction of features from speech spectra." Thesis, Queen's University Belfast, 1985. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.373541.

19

Domont, Xavier. "Hierarchical spectro-temporal features for robust speech recognition." Münster Verl.-Haus Monsenstein und Vannerdat, 2009. http://d-nb.info/1001282655/04.

20

Lal, Partha. "Cross-lingual automatic speech recognition using tandem features." Thesis, University of Edinburgh, 2011. http://hdl.handle.net/1842/5773.

Abstract:
Automatic speech recognition requires many hours of transcribed speech recordings in order for an acoustic model to be effectively trained. However, recording speech corpora is time-consuming and expensive, so such quantities of data exist only for a handful of languages — there are many languages for which little or no data exist. Given that there are acoustic similarities between different languages, it may be fruitful to use data from a well-supported source language for the task of training a recogniser in a target language with little training data. Since most languages do not share a com
21

Harte, Naomi Antonia. "Segmental phonetic features and models for speech recognition." Thesis, Queen's University Belfast, 1999. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.287466.

22

Schuy, Lars. "Speech features and their significance in speaker recognition." Thesis, University of Sussex, 2002. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.288845.

Abstract:
This thesis addresses the significance of speech features within the task of speaker recognition. Motivated by the perception of simple attributes like 'loud', 'smooth', 'fast', more than 70 new speech features are developed. A set of basic speech features like pitch, loudness and speech speed are combined together with these new features in a feature set, one set per utterance. A neural network classifier is used to evaluate the significance of these features by creating a speaker recognition system and analysing the behaviour of successfully trained single-speaker networks. An in-depth analy
23

Necioğlu, Burhan F. "Objectively measured descriptors for perceptual characterization of speakers." Diss., Georgia Institute of Technology, 1999. http://hdl.handle.net/1853/15035.

24

Savvides, Vasos E. "Perceptual models in speech quality assessment and coding." Thesis, Loughborough University, 1988. https://dspace.lboro.ac.uk/2134/36273.

Abstract:
The ever-increasing demand for good communications/toll quality speech has created a renewed interest in the perceptual impact of rate compression. Two general areas are investigated in this work, namely speech quality assessment and speech coding. In the field of speech quality assessment, a model is developed which simulates the processing stages of the peripheral auditory system. At the output of the model a "running" auditory spectrum is obtained. This represents the auditory (spectral) equivalent of any acoustic sound such as speech. Auditory spectra from coded speech segments serve as
25

Juneja, Amit. "Speech recognition based on phonetic features and acoustic landmarks." College Park, Md. : University of Maryland, 2004. http://hdl.handle.net/1903/2148.

Abstract:
Thesis (Ph. D.) -- University of Maryland, College Park, 2004. Thesis research directed by: Electrical Engineering. Title from t.p. of PDF. Includes bibliographical references. Published by UMI Dissertation Services, Ann Arbor, Mich. Also available in paper.
26

ALENCAR, VLADIMIR FABREGAS SURIGUE DE. "EFFICIENT FEATURES AND INTERPOLATION DOMAINS IN DISTRIBUTED SPEECH RECOGNITION." PONTIFÍCIA UNIVERSIDADE CATÓLICA DO RIO DE JANEIRO, 2005. http://www.maxwell.vrac.puc-rio.br/Busca_etds.php?strSecao=resultado&nrSeq=6201@1.

Abstract:
COORDENAÇÃO DE APERFEIÇOAMENTO DO PESSOAL DE ENSINO SUPERIOR. With the enormous growth of the Internet and of cellular mobile communication systems, speech processing applications on these networks have attracted great interest. A particularly important problem in this area is speech recognition in a server system, based on acoustic parameters computed and quantized at the user's terminal (Distributed Speech Recognition). Since these parameters are generally not the most suitable speech features for the remote recognition system, it is import
27

Meng, Helen M. "The use of distinctive features for automatic speech recognition." Thesis, Massachusetts Institute of Technology, 1991. http://hdl.handle.net/1721.1/13279.

28

Schutte, Kenneth Thomas 1979. "Parts-based models and local features for automatic speech recognition." Thesis, Massachusetts Institute of Technology, 2009. http://hdl.handle.net/1721.1/53301.

Abstract:
Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2009. Cataloged from PDF version of thesis. Includes bibliographical references (p. 101-108). While automatic speech recognition (ASR) systems have steadily improved and are now in widespread use, their accuracy continues to lag behind human performance, particularly in adverse conditions. This thesis revisits the basic acoustic modeling assumptions common to most ASR systems and argues that improvements to the underlying model of speech are required to address these shortcomi
29

Tang, Min Ph D. Massachusetts Institute of Technology. "Large vocabulary continuous speech recognition using linguistic features and constraints." Thesis, Massachusetts Institute of Technology, 2005. http://hdl.handle.net/1721.1/33203.

Abstract:
Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2005. This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. Includes bibliographical references (leaves 111-123). Automatic speech recognition (ASR) is a process of applying constraints, as encoded in the computer system (the recognizer), to the speech signal until ambiguity is satisfactorily resolved to the extent that only one sequence of words is hypothesized. Such constraints fall
30

Civile, Ciro. "The face inversion effect and perceptual learning : features and configurations." Thesis, University of Exeter, 2013. http://hdl.handle.net/10871/13564.

Abstract:
This thesis explores the causes of the face inversion effect, which is a substantial decrement in performance in recognising facial stimuli when they are presented upside down (Yin, 1969). I will provide results from both behavioural and electrophysiological (EEG) experiments to aid in the analysis of this effect. Over the course of six chapters I summarise my work during the four years of my PhD, and propose an explanation of the face inversion effect that is based on the general mechanisms for learning that we also share with other animals. In Chapter 1 I describe and discuss some of the main
31

Weatherholtz, Kodi. "Perceptual learning of systemic cross-category vowel variation." The Ohio State University, 2015. http://rave.ohiolink.edu/etdc/view?acc_num=osu1429782580.

32

Jeon, Woojay. "Speech Analysis and Cognition Using Category-Dependent Features in a Model of the Central Auditory System." Diss., Georgia Institute of Technology, 2006. http://hdl.handle.net/1853/14061.

Abstract:
It is well known that machines perform far worse than humans in recognizing speech and audio, especially in noisy environments. One method of addressing this issue of robustness is to study physiological models of the human auditory system and to adopt some of its characteristics in computers. As a first step in studying the potential benefits of an elaborate computational model of the primary auditory cortex (A1) in the central auditory system, we qualitatively and quantitatively validate the model under existing speech processing recognition methodology. Next, we develop new insights and ide
33

Bruijn, Christina Geertruida de. "Voice quality after dictation to speech recognition software : a perceptual and acoustic study." Thesis, University of Sheffield, 2007. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.440907.

34

Javadi, Ailar. "Bio-inspired noise robust auditory features." Thesis, Georgia Institute of Technology, 2012. http://hdl.handle.net/1853/44801.

Abstract:
The purpose of this work is to investigate a series of biologically inspired modifications to state-of-the-art Mel-frequency cepstral coefficients (MFCCs) that may improve automatic speech recognition results. We have provided recommendations to improve speech recognition results depending on signal-to-noise ratio levels of input signals. This work has been motivated by noise-robust auditory features (NRAF). In the feature extraction technique, after a signal is filtered using bandpass filters, a spatial derivative step is used to sharpen the results, followed by an envelope detector (recti
35

Berg, Brian LaRoy. "Investigating Speaker Features From Very Short Speech Records." Diss., Virginia Tech, 2001. http://hdl.handle.net/10919/28691.

Abstract:
A procedure is presented that is capable of extracting various speaker features, and is of particular value for analyzing records containing single words and shorter segments of speech. By taking advantage of the fast convergence properties of adaptive filtering, the approach is capable of modeling the nonstationarities due to both the vocal tract and vocal cord dynamics. Specifically, the procedure extracts the vocal tract estimate from within the closed glottis interval and uses it to obtain a time-domain glottal signal. This procedure is quite simple, requires minimal manual intervention
36

Clark, Tracy M. "A Study of Features and Processes Towards Real-time Speech Word Recognition." Thesis, University of Canterbury. Electrical and Electronic Engineering, 1993. http://hdl.handle.net/10092/7561.

Abstract:
Word recognition techniques are reviewed. An exhaustive comparative study of many of the factors that affect recognition accuracy is presented. Experiments centred on four major areas of word recognition are described: pre-processing techniques, recognition features, recognition algorithms and distance measures. Recognition accuracy, in the context of each of these four areas, is investigated using the digit vocabulary spoken by 10 New Zealand (6 male and 4 female) and 38 American (20 male and 18 female) speakers. Pre-processing techniques examined are the type of window, the length of the dat
37

Peso, Pablo. "Spatial features of reverberant speech : estimation and application to recognition and diarization." Thesis, Imperial College London, 2016. http://hdl.handle.net/10044/1/45664.

Abstract:
Distant talking scenarios, such as hands-free calling or teleconference meetings, are essential for natural and comfortable human-machine interaction and they are being increasingly used in multiple contexts. The acquired speech signal in such scenarios is reverberant and affected by additive noise. This signal distortion degrades the performance of speech recognition and diarization systems creating troublesome human-machine interactions. This thesis proposes a method to non-intrusively estimate room acoustic parameters, paying special attention to a room acoustic parameter highly correlated
38

Sidorova, Julia. "Optimization techniques for speech emotion recognition." Doctoral thesis, Universitat Pompeu Fabra, 2009. http://hdl.handle.net/10803/7575.

Abstract:
There are three innovative aspects. First, a novel algorithm for computing the emotional content of an utterance, with a mixed design that uses statistical learning and syntactic information. Second, an extension to feature selection that allows the weights to be adapted and thereby increases the flexibility of the system. Third, a proposal for incorporating high-level features into the system. These features, combined with the low-level features, improve the system's performance. The first contribution of this thesis is a speech emotion recognition system called the ESEDA capable of
39

Lareau, Jonathan. "Application of shifted delta cepstral features for GMM language identification /." Electronic version of thesis, 2006. https://ritdml.rit.edu/dspace/handle/1850/2686.

40

Gangireddy, Siva Reddy. "Recurrent neural network language models for automatic speech recognition." Thesis, University of Edinburgh, 2017. http://hdl.handle.net/1842/28990.

Abstract:
The goal of this thesis is to advance the use of recurrent neural network language models (RNNLMs) for large vocabulary continuous speech recognition (LVCSR). RNNLMs are currently state-of-the-art and shown to consistently reduce the word error rates (WERs) of LVCSR tasks when compared to other language models. In this thesis we propose various advances to RNNLMs. The advances are: improved learning procedures for RNNLMs, enhancing the context, and adaptation of RNNLMs. We learned better parameters by a novel pre-training approach and enhanced the context using prosody and syntactic features.
41

Wong, Jimmy Pui Fung. "The use of prosodic features in Chinese speech recognition and spoken language processing." Hong Kong University of Science and Technology, 2003. http://library.ust.hk/cgi/db/thesis.pl?ELEC%202003%20WONG.

Abstract:
Thesis (M.Phil.)--Hong Kong University of Science and Technology, 2003. Includes bibliographical references (leaves 97-101). Also available in electronic version. Access restricted to campus users.
42

Sun, Rui. "The evaluation of the stability of acoustic features in affective conveyance across multiple emotional databases." Diss., Georgia Institute of Technology, 2013. http://hdl.handle.net/1853/49041.

Abstract:
The objective of the research presented in this thesis was to systematically investigate the computational structure for cross-database emotion recognition. The research consisted of evaluating the stability of acoustic features, particularly the glottal and Teager Energy based features, and investigating three normalization methods and two data fusion techniques. One of the challenges of cross-database training and testing is accounting for the potential variation in the types of emotions expressed as well as the recording conditions. In an attempt to alleviate the impact of these types of va
43

LUDWICZAK, LEIGH ANN. "CHILDRENS' FIRST FIVE WORDS: AN ANALYSIS OF PERCEPTUAL FEATURES, GRAMMATICAL CATEGORIES, AND COMMUNICATIVE INTENTIONS." University of Cincinnati / OhioLINK, 2001. http://rave.ohiolink.edu/etdc/view?acc_num=ucin990647609.

44

SIQUEIRA, JAN KRUEGER. "CONTINUOUS SPEECH RECOGNITION WITH MFCC, SSCH AND PNCC FEATURES, WAVELET DENOISING AND NEURAL NETWORKS." PONTIFÍCIA UNIVERSIDADE CATÓLICA DO RIO DE JANEIRO, 2011. http://www.maxwell.vrac.puc-rio.br/Busca_etds.php?strSecao=resultado&nrSeq=19143@1.

Abstract:
CONSELHO NACIONAL DE DESENVOLVIMENTO CIENTÍFICO E TECNOLÓGICO. One of the greatest challenges in continuous speech recognition is developing systems that are robust to additive noise. To that end, this work analyzes and tests three techniques. The first is the extraction of features from the speech signal using the MFCC, SSCH and PNCC methods. The second is the removal of noise from the speech signal via wavelet denoising. The third and last is an original proposal named feature denoising, which seeks to improve the extracted features using a set of neural networks. Although some of these techniques are already
45

Ishizuka, Kentaro. "Studies on Acoustic Features for Automatic Speech Recognition and Speaker Diarization in Real Environments." 京都大学 (Kyoto University), 2009. http://hdl.handle.net/2433/123834.

46

Chan, Oscar. "Prosodic features for a maximum entropy language model." University of Western Australia. School of Electrical, Electronic and Computer Engineering, 2008. http://theses.library.uwa.edu.au/adt-WU2008.0244.

Abstract:
A statistical language model attempts to characterise the patterns present in a natural language as a probability distribution defined over word sequences. Typically, they are trained using word co-occurrence statistics from a large sample of text. In some language modelling applications, such as automatic speech recognition (ASR), the availability of acoustic data provides an additional source of knowledge. This contains, amongst other things, the melodic and rhythmic aspects of speech referred to as prosody. Although prosody has been found to be an important factor in human speech recognitio
47

Juzwin, Kathryn Rossetto. "The effects of perceptual interference and noninterference on facial recognition based on outer and inner facial features." Virtual Press, 1986. http://liblink.bsu.edu/uhtbin/catkey/447843.

Abstract:
This study investigated the effects of interference from a center stimulus on the recognition of faces presented in each visual half-field using tachistoscopic presentation. Based on prior studies, it was hypothesized that faces would be recognized more accurately based on outline features when presented to the Left visual field - Right hemisphere and on inner features for the Right visual field - Left hemisphere. It was also hypothesized that digits presented at center fixation would interfere most with the recognition of the inner details of faces presented to the right hemisphere, sinc
48

Christensen, Carl V. "Fluency Features and Elicited Imitation as Oral Proficiency Measurement." BYU ScholarsArchive, 2012. https://scholarsarchive.byu.edu/etd/3114.

Abstract:
The objective and automatic grading of oral language tests has been the subject of significant research in recent years. Several obstacles lie in the way of achieving this goal. Recent work has suggested a testing technique called elicited imitation (EI) can be used to accurately approximate global oral proficiency. This testing methodology, however, does not incorporate some fundamental aspects of language such as fluency. Other work has suggested another testing technique, simulated speech (SS), as a supplement to EI that can provide automated fluency metrics. In this work, I investigate a c
49

Wang, Yihan. "Automatic Speech Recognition Model for Swedish using Kaldi." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-285538.

Abstract:
With the development of the intelligent era, speech recognition has been a hot topic. Although many automatic speech recognition (ASR) tools have been put into the market, a considerable number of them do not support Swedish because of its small number. In this project, a Swedish ASR model based on Hidden Markov Model and Gaussian Mixture Models is established using Kaldi, which aims to help ICA Banken complete the classification of after-sales voice calls. A variety of model patterns have been explored, which have different phoneme combination methods and eigenvalue extraction and processing methods. Word E
50

Pietrzyk, Mariusz W. "Spatial frequency analysis of the perceptual features involved in pulmonary nodule detection and recognition from posterior-anterior chest radiographs." Thesis, Lancaster University, 2009. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.556697.

Abstract:
RATIONALE AND OBJECTIVES: Radiological error due to the incorrect interpretation of medical images still occurs in current practice, and continues to be reported both in laboratory and clinical experimental conditions. In general radiological practice error rates range from 3 - 5%. However, that scale reaches up to 30% for detection of some early pulmonary cancers. Computer-Aided Detection (CAD) algorithms have been proposed to support human observers in verifying their choices. Although CAD systems might help in certain situations, its general implementation in clinical practice is still cont