Dissertations / Theses: 'Speech processing systems; Speech synthesis'

1

Liu, Zhu Lin. "Speech synthesis via adaptive Fourier decomposition." Thesis, University of Macau, 2011. http://umaclib3.umac.mo/record=b2493215.

2

Mazel, David S. "Sinusoidal modeling of speech." Thesis, Georgia Institute of Technology, 1986. http://hdl.handle.net/1853/13873.

3

Macon, Michael W. "Speech synthesis based on sinusoidal modeling." Diss., Georgia Institute of Technology, 1996. http://hdl.handle.net/1853/13904.

4

Chung, Jae H. "A new homomorphic vocoder framework using analysis-by-synthesis excitation analysis." Diss., Georgia Institute of Technology, 1991. http://hdl.handle.net/1853/15471.

5

Crosmer, Joel R. "Very low bit rate speech coding using the line spectrum pair transformation of the LPC coefficients." Diss., Georgia Institute of Technology, 1985. http://hdl.handle.net/1853/15739.

6

Cummings, Kathleen E. "Analysis, synthesis, and recognition of stressed speech." Diss., Georgia Institute of Technology, 1992. http://hdl.handle.net/1853/15673.

7

Farges, Eric P. "An analysis-synthesis hidden Markov model of speech." Diss., Georgia Institute of Technology, 1987. http://hdl.handle.net/1853/14775.

8

Rose, Richard C. "The design and performance of an analysis-by-synthesis class of predictive speech coders." Diss., Georgia Institute of Technology, 1988. http://hdl.handle.net/1853/16693.

9

Peters, Richard Alan II. "A LINEAR PREDICTION CODING MODEL OF SPEECH (SYNTHESIS, LPC, COMPUTER, ELECTRONIC)." Thesis, The University of Arizona, 1985. http://hdl.handle.net/10150/291240.

10

Lam, Victor T. M. "The stability of pitch synthesis filters in speech coding /." Thesis, McGill University, 1985. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=63361.

11

McCree, Alan V. "A new LPC vocoder model for low bit rate speech coding." Diss., Georgia Institute of Technology, 1992. http://hdl.handle.net/1853/15053.

12

George, E. Bryan. "An analysis-by-synthesis approach to sinusoidal modeling applied to speech and music signal processing." Diss., Georgia Institute of Technology, 1991. http://hdl.handle.net/1853/15747.

13

Richards, Elizabeth A. "Automatic formant labeling in continuous speech /." Online version of thesis, 1989. http://hdl.handle.net/1850/10543.

14

CARLSON, GERRARD MERRILL. "THE QUALITY OF SYNTHESIZED SPEECH USING LINEAR PREDICTIVE CODING ON FINITE WORDLENGTH INTEGRATED CIRCUITS." Diss., The University of Arizona, 1985. http://hdl.handle.net/10150/188024.

Abstract:

This paper studies the quality of synthetic speech produced by integrated circuit (IC) hardware using fixed-point arithmetic and Linear Predictive Coding (LPC). A theoretical model explaining the combined effects of finite wordlength and parametric model order is developed and used to predict the results obtained in the experimental phase of the study. In the experimental phase, selected model utterances are synthesized under finite wordlength constraints using LPC parameters. The synthetic speech is evaluated in terms of the log area ratios, which define objective speech quality as a parametric distance. Simulations of the theoretical model yield the same information as that obtained from actually running the fixed-point synthesizer simulator. Since the predictions of the theoretical model agree quite well with the experimental measurements, it is concluded that fixed-point synthesizer performance can be predicted without actually running a complicated and expensive fixed-point synthesizer. Secondly, results obtained from either method clearly indicate that for 15 or 16 bits, ten is the best number of poles to use. Eight usable poles are indicated for 14 bits, and seven for 13 bits. Based on the results of this study, the use of fewer than 13 bits for fixed-point calculations is not recommended.
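As a rough illustration of the log-area-ratio distance mentioned in this abstract, the sketch below quantizes a set of reflection coefficients to a given wordlength and measures how far the resulting log area ratios drift from the unquantized ones. It is not the thesis's model: the function names, the rounding scheme, the RMS form of the distance, and the sample coefficients are all assumptions chosen for illustration, and the sign convention for log area ratios varies between texts.

```python
import math


def log_area_ratios(reflection_coeffs):
    """Map reflection coefficients k (|k| < 1) to log area ratios.

    Uses log((1 + k) / (1 - k)); some texts use the reciprocal ratio,
    which only flips the sign.
    """
    return [math.log((1.0 + k) / (1.0 - k)) for k in reflection_coeffs]


def quantize(value, bits):
    """Round a value in (-1, 1) to a signed fixed-point grid with `bits` bits.

    Hypothetical helper: the result is clamped so that it stays strictly
    inside (-1, 1) and the log area ratio remains finite.
    """
    scale = 2 ** (bits - 1)
    level = round(value * scale)
    level = max(-scale + 1, min(scale - 1, level))
    return level / scale


def lar_distance(coeffs_a, coeffs_b):
    """Root-mean-square difference between two sets of log area ratios."""
    lar_a = log_area_ratios(coeffs_a)
    lar_b = log_area_ratios(coeffs_b)
    squared = [(a - b) ** 2 for a, b in zip(lar_a, lar_b)]
    return math.sqrt(sum(squared) / len(squared))


# Made-up reflection coefficients for a 4-pole model (illustration only).
reference = [0.9, -0.5, 0.3, -0.1]
distance_8_bits = lar_distance(reference, [quantize(k, 8) for k in reference])
distance_13_bits = lar_distance(reference, [quantize(k, 13) for k in reference])
```

For these sample values the 13-bit distance is smaller than the 8-bit one, which matches the general expectation that a longer wordlength distorts the parameters less. The thesis itself concerns arithmetic inside the synthesizer, which this coefficient-only sketch does not model.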

15

Baloyi, Ntsako. "A text-to-speech synthesis system for Xitsonga using hidden Markov models." Thesis, University of Limpopo (Turfloop Campus), 2012. http://hdl.handle.net/10386/1021.

Abstract:

Thesis (M.Sc. (Computer Science)) --University of Limpopo, 2013. This research study focuses on building a general-purpose working Xitsonga speech synthesis system that is, as far as possible, reasonably intelligible, natural sounding, and flexible. The system has to be able to model some of the desirable speaker characteristics and speaking styles. This research project forms part of the broader national speech technology project that aims at developing spoken language systems for human-machine interaction using the eleven official languages of South Africa (SA). Speech synthesis is the reverse of automatic speech recognition (which receives speech as input and converts it to text) in that it receives text as input and produces synthesized speech as output. It is generally accepted that most people find listening to spoken utterances better than reading the equivalent of such utterances. The Xitsonga speech synthesis system has been developed using a hidden Markov model (HMM) speech synthesis method. The HMM-based speech synthesis (HTS) system synthesizes speech that is intelligible and natural sounding. This method can synthesize speech on a footprint of only a few megabytes of training speech data. The HTS toolkit is applied as a patch to the HTK toolkit, a hidden Markov model toolkit primarily designed for building and manipulating hidden Markov models in speech recognition.

16

Visagie, Albertus Sybrand. "Speech generation in a spoken dialogue system." Thesis, Stellenbosch : University of Stellenbosch, 2004. http://hdl.handle.net/10019.1/16460.

Abstract:

Thesis (MScIng)--University of Stellenbosch, 2004. ENGLISH ABSTRACT: Spoken dialogue systems accessed over the telephone network are rapidly becoming more popular as a means to reduce call-centre costs and improve customer experience. It is now technologically feasible to delegate repetitive and relatively simple tasks conducted in most telephone calls to automatic systems. Such a system uses speech recognition to take input from users. This work focuses on the speech generation component that a specific prototype system uses to convey audible speech output back to the user. Many commercial systems contain general text-to-speech synthesisers. Text-to-speech synthesis is a very active branch of speech processing. It aims to build machines that read text aloud. In some languages this has been a reality for almost two decades. While these synthesisers are often very understandable, they almost never sound natural. The output quality of synthetic speech is considered to be a very important factor in the user’s perception of the quality and usability of spoken dialogue systems. The static nature of the spoken dialogue system is exploited to produce a custom speech synthesis component that provides very high quality output speech for the particular application. To this end the current state of the art in speech synthesis is surveyed and summarised. A unit-selection synthesiser is produced that functions in Afrikaans, English and Xhosa. The unit-selection synthesiser selects short waveforms from a recorded speech corpus, and concatenates them to produce the required utterances. Techniques are developed for designing a compact corpus and processing it to produce a unit-selection database. Speech modification methods were researched to build a framework for natural-sounding speech concatenation. This framework also provides pitch and duration modification capabilities that will enable research in languages such as Afrikaans and Xhosa where text-to-speech capabilities are relatively immature.
AFRIKAANS SUMMARY (translated): Telephone-based spoken dialogue systems are becoming increasingly common and are an effective way to reduce call-centre costs. It is now technologically possible to handle a large number of simple transactions with automatic systems. Such systems use speech recognition to receive input from the user. This work focuses on the speech generation component that a specific prototype system uses to play output back to the user. Many commercial systems use generic text-to-speech synthesisers. Text-to-speech synthesis remains a very active field in speech research. In general, research aims to read text and convert it into intelligible speech. Such systems have existed for at least two decades. Although entirely intelligible, these systems sound unnatural. In telephone-based spoken dialogue systems, the quality of the synthetic speech is important for the user's perception of the system's quality and usability. The dialogue is mostly static in nature, and this property is exploited to synthesise high-quality speech for a particular application. To achieve this, the current state of the art in the field was studied and summarised. A cut-and-paste synthesiser was built that works in Afrikaans, English and Xhosa. The synthesiser selects short pieces of speech waveform from a speech corpus and joins them to produce the required speech. Automatic techniques were developed to design a compact corpus that still contains everything the synthesiser needs to perform its task. Further techniques process the corpus into a form usable for synthesis. Methods of speech modification were investigated in order to make the concatenated pieces of speech sound more natural and to correct their intonation and tempo. This provides infrastructure for research in languages such as Afrikaans and Xhosa, where text-to-speech capabilities are still immature.

17

Harmse, Wynand. "Wavelet-based speech enhancement : a statistical approach." Thesis, Stellenbosch : University of Stellenbosch, 2004. http://hdl.handle.net/10019.1/16336.

Abstract:

Thesis (MScIng)--University of Stellenbosch, 2004. ENGLISH ABSTRACT: Speech enhancement is the process of removing background noise from speech signals. The equivalent process for images is known as image denoising. While the Fourier transform is widely used for speech enhancement, image denoising typically uses the wavelet transform. Research on wavelet-based speech enhancement has only recently emerged, yet it shows promising results compared to Fourier-based methods. This research is aided by the availability of new wavelet denoising algorithms based on the statistical modelling of wavelet coefficients, such as the hidden Markov tree. The aim of this research project is to investigate wavelet-based speech enhancement from a statistical perspective. Current Fourier-based speech enhancement and its evaluation process are described, and a framework is created for wavelet-based speech enhancement. Several wavelet denoising algorithms are investigated, and it is found that the algorithms based on the statistical properties of speech in the wavelet domain outperform the classical and more heuristic denoising techniques. The choice of wavelet influences the quality of the enhanced speech and the effect of this choice is therefore examined. The introduction of a noise floor parameter also improves the perceptual quality of the wavelet-based enhanced speech, by masking annoying residual artifacts. The performance of wavelet-based speech enhancement is similar to that of the more widely used Fourier methods at low noise levels, with a slight difference in the residual artifact. At high noise levels, however, the Fourier methods are superior.
AFRIKAANS SUMMARY (translated): Speech enhancement is the process by which background noise is removed from speech signals. The equivalent process for images is called image denoising. While speech enhancement is generally done in the Fourier domain, image denoising typically uses the wavelet transform. Research on wavelet-based speech enhancement has appeared only recently, and it already shows promising results compared with Fourier-based methods. This field of research is helped along by the availability of new wavelet-based denoising techniques that model the wavelet coefficients statistically, such as the hidden Markov tree. The aim of this research project is to study wavelet-based speech enhancement from a statistical point of view. Current Fourier-based speech enhancement methods, as well as the evaluation process for such algorithms, are discussed, and a framework is created for wavelet-based speech enhancement. Several wavelet-based algorithms are investigated, and it is found that the methods using the statistical properties of speech in the wavelet domain fare better than the classical and more heuristic methods. The choice of wavelet influences the quality of the enhanced speech, and the effect of this choice is therefore investigated. The use of a noise floor parameter also raises the quality of the wavelet-enhanced speech by hiding disturbing residual artifacts. The wavelet methods perform about the same as the classical Fourier methods at low noise levels, with a small difference in residual artifacts. At high noise levels, however, the Fourier methods still perform better.
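To make the idea of wavelet-domain denoising with a noise floor concrete, here is a minimal sketch using a one-level Haar transform and soft thresholding. It is an illustration under stated assumptions, not the statistical algorithms evaluated in the thesis: the Haar wavelet, the single decomposition level, and the particular noise-floor rule (keep at least a fixed fraction of each detail coefficient instead of zeroing it) are all choices made here for brevity, and every function name is hypothetical.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_forward(signal):
    """One-level orthonormal Haar transform of an even-length sequence."""
    approx = [(signal[i] + signal[i + 1]) / SQRT2 for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / SQRT2 for i in range(0, len(signal), 2)]
    return approx, detail


def haar_inverse(approx, detail):
    """Invert haar_forward, interleaving the reconstructed sample pairs."""
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) / SQRT2)
        out.append((a - d) / SQRT2)
    return out


def shrink(coeff, threshold, floor=0.0):
    """Soft-threshold a coefficient, never going below floor * |coeff|.

    With floor = 0 this is plain soft thresholding. A small positive floor
    leaves a little of the original coefficient in place, which is one
    simple way to mask isolated residual artifacts (an assumption here).
    """
    magnitude = max(abs(coeff) - threshold, floor * abs(coeff))
    return math.copysign(magnitude, coeff)


def denoise(signal, threshold, floor=0.0):
    """Shrink the detail coefficients and reconstruct the signal."""
    approx, detail = haar_forward(signal)
    detail = [shrink(d, threshold, floor) for d in detail]
    return haar_inverse(approx, detail)
```

A call such as `denoise(samples, threshold=0.5, floor=0.1)` would shrink small sample-to-sample fluctuations while retaining a tenth of each detail coefficient. Practical systems typically use several decomposition levels and a threshold derived from a noise estimate.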

18

Wallace, John Glenn. "Speech synthesis using a digital modulation scheme on the IBM personal computer." Diss., Rolla, Mo. : School of Mines and Metallurgy of the University of Missouri, 1989. http://scholarsmine.mst.edu/thesis/pdf/Wallace_09007dcc805dc178.pdf.

Abstract:

Thesis (M.S.)--University of Missouri--Rolla, 1989. Vita. The entire thesis text is included in file. Title from title screen of thesis/dissertation PDF file (viewed January 9, 2009). Includes bibliographical references (p. 39-40).

19

Haque, Serajul. "Perceptual features for speech recognition." University of Western Australia. School of Electrical, Electronic and Computer Engineering, 2008. http://theses.library.uwa.edu.au/adt-WU2008.0187.

Abstract:

Automatic speech recognition (ASR) is one of the most important research areas in speech technology. It is also known as the recognition of speech by a machine or by some form of artificial intelligence. However, despite focused research over the past several decades, robust and highly reliable speech recognition has not been achieved, because performance degrades in the presence of speaker variability, channel mismatch conditions, and noisy environments. The superb ability of the human auditory system has motivated researchers to include features of human perception in the speech recognition process. This dissertation investigates the roles of perceptual features of human hearing in automatic speech recognition in clean and noisy environments. Methods of simplified synaptic adaptation and two-tone suppression by companding are introduced by temporal processing of speech using a zero-crossing algorithm. It is observed that a high frequency enhancement technique such as synaptic adaptation performs better in stationary Gaussian white noise, whereas a low frequency enhancement technique such as two-tone suppression performs better in non-Gaussian, non-stationary noise types. The effects of static compression on ASR parametrization are investigated as observed in the psychoacoustic input/output (I/O) perception curves. A frequency-dependent asymmetric compression technique, that is, higher compression in the higher frequency regions than in the lower frequency regions, is proposed. Asymmetric compression avoids the degradation of the spectral contrast of the low frequency formants that the added compression would otherwise cause. A novel feature extraction method for ASR based on the auditory processing in the cochlear nucleus is presented.
The processing steps for synchrony detection, average discharge (mean rate) processing, and two-tone suppression are segregated and processed separately at the feature extraction level according to the differential processing scheme observed in the AVCN, PVCN and DCN, respectively, of the cochlear nucleus. It is further observed that improved ASR performance can be achieved by separating the synchrony detection from the synaptic processing. A time-frequency perceptual spectral subtraction method based on several psychoacoustic properties of human audition is developed and evaluated by an ASR front-end. An auditory masking threshold is determined based on these psychoacoustic effects. It is observed that in speech recognition applications, spectral subtraction utilizing psychoacoustics may be used for improved performance in noisy conditions. The performance may be further improved if masking of noise by the tonal components is augmented by spectral subtraction in the masked region.

20

Herlong, David W. "Effects of voice coding and speech rate on a synthetic speech display in a telephone information system." Thesis, Virginia Polytechnic Institute and State University, 1988. http://hdl.handle.net/10919/80037.

Abstract:

Despite the lack of formal guidelines, synthetic speech displays are used in a growing variety of applications. Telephone information systems permitting human-computer interaction from remote locations are an especially popular implementation of computer-generated speech. Currently, human factors research is needed to specify design characteristics providing usable telephone information systems as defined by task performance and user ratings. Previous research used nonintegrated tasks such as transcription of phonetic syllables, words, or sentences to assess task performance or user preference differences. This study used a computer-driven telephone information system as a real-time, human-computer interface to simulate applications where synthetic speech is used to access data. Subjects used a telephone keypad to navigate through an automated, department store database to locate and transcribe specific information messages. Because speech provides a sequential and transient information display, users may have difficulty navigating through auditory databases. One issue investigated in this study was whether use of alternating male and female voices to code different levels in the database hierarchy would improve user search performance. Other issues investigated were basic intelligibility of these male and female voices as influenced by different levels of speech rate. All factors were assessed as functions of search or transcription task performance and user preference. Analysis of transcription accuracy, search efficiency and time, and subjective ratings revealed an overall significant effect of speech rate on all groups of measures but no significant effects for voice type or coding scheme. Results were used to recommend design guidelines for developing speech displays for telephone information systems. Master of Science

21

Yoon, Kyuchul. "Building a prosodically sensitive diphone database for a Korean text-to-speech synthesis system." Connect to this title online, 2005. http://rave.ohiolink.edu/etdc/view?acc%5Fnum=osu1119010941.

Abstract:

Thesis (Ph. D.)--Ohio State University, 2005. Title from first page of PDF file. Document formatted into pages; contains xxii, 291 p.; also includes graphics (some col.) Includes bibliographical references (p. 210-216). Available online via OhioLINK's ETD Center

22

Holmes, William Paul. "Voice input for the disabled /." Title page, contents and summary only, 1987. http://web4.library.adelaide.edu.au/theses/09ENS/09ensh749.pdf.

Abstract:

Thesis (M. Eng. Sc.)--University of Adelaide, 1987. Typescript. Includes a copy of a paper presented at TADSEM '85 --Australian Seminar on Devices for Expressive Communication and Environmental Control, co-authored by the author. Includes bibliographical references (leaves [115-121]).

23

Leonavičius, Romas. "Melizmų sintezė dirbtinių neuronų tinklais." Vilniaus Gedimino technikos universitetas, 2006. http://vddb.laba.lt/obj/LT-eLABa-0001:E.02~2006~D_20061221.SEQN_3887.

Abstract:

Modern methods of speech synthesis are not suitable for the restoration of song signals because the resulting sounds lack vitality and intonation. The aim of the presented work is to synthesize the melismas found in Lithuanian folk songs by applying Artificial Neural Networks. An analytical survey of the fairly extensive literature is presented. First, a classification and comprehensive discussion of melismas are given. The theory of dynamic systems, which forms the basis for studying melismas, is presented, and finally the relationship for modeling a melisma with nonlinear and dynamic systems is outlined. The most widely used Linear Prediction Coding (LPC) method and the possibilities for improving it are investigated. A modification of the original Linear Prediction method based on dynamic LPC frame positioning is proposed. On this basis, a new melisma synthesis technique is presented. A flexible generalized melisma model is developed, based on two Artificial Neural Networks (a Multilayer Perceptron and an Adaline) and on two network training algorithms (Levenberg-Marquardt and Least Squares error minimization). Moreover, original mathematical models of the Fortis, Gruppett, Mordent and Trill are created, fit for synthesizing melismas, and their minimal sizes are proposed. The last chapter concerns experimental investigation, using over 500 melisma records, and corroborates application of the new mathematical models to melisma synthesis of one [ ...].

24

Carvalho, Sarah Negreiros de 1985. "Estudo de um sistema de conversão texto-fala baseado em HMM." [s.n.], 2013. http://repositorio.unicamp.br/jspui/handle/REPOSIP/259046.

Abstract:

Advisor: Fábio Violaro. Dissertation (master's) - Universidade Estadual de Campinas, Faculdade de Engenharia Elétrica e de Computação, 2013.
Summary (translated from the Portuguese Resumo): With the continuous development of technology, there is a growing demand for speech synthesis systems that can speak like humans, so that they can be integrated into a wide range of applications, whether in robotic automation, accessibility for people with disabilities, or applications for culture and leisure. Speech synthesis based on hidden Markov models (HMM) shows promise in meeting this technological need. Its statistical and parametric nature makes it a flexible system, able to adapt artificial voices, insert emotion into speech, and obtain good-quality synthetic speech from a limited training set. This dissertation presents a study of the HMM-based speech synthesis system (HTS), describing the steps involved in training the HMMs and generating the speech signal. It presents the spectral, pitch and duration models that make up the HMMs of the context-dependent phonemes, considering the various techniques for structuring them. Some of the problems found in HTS, such as the muffled and monotonous character of the artificial speech, are analysed together with some techniques proposed to improve the final quality of the synthesised speech signal.
Abstract: With the continuous development of technology, there is a growing demand for text-to-speech systems that are able to speak like humans, in order to integrate them into the most diverse applications, whether in the field of automation and robotics, for accessibility of people with disabilities, or for culture and leisure activities. Speech synthesis based on hidden Markov models (HMM) shows promise in addressing this need. Its statistical and parametric nature makes it a flexible system capable of adapting artificial voices, inserting emotions in speech and producing artificial speech of good quality using a limited amount of speech data for HMM training. This thesis presents a study of the HMM-based speech synthesis system (HTS), describing the steps involved in training the HMMs and generating artificial speech. Spectral, pitch and duration models, which form the context-dependent HMMs, are presented, and the various techniques for structuring them are considered. Some of the problems encountered in HTS, such as the muffled and monotonous character of artificial speech, are analyzed along with some of the techniques proposed to improve the final quality of the synthesized speech signal.
Master's program: Telecommunications and Telematics. Degree: Master of Electrical Engineering.

25

Leite, Harlei Miguel de Arruda 1989. "Proposta de metodologia de avaliação de voz sintética com ênfase no ambiente educacional." [s.n.], 2014. http://repositorio.unicamp.br/jspui/handle/REPOSIP/258883.

Abstract:

Advisor: Dalton Soares Arantes. Dissertation (master's) - Universidade Estadual de Campinas, Faculdade de Engenharia Elétrica e de Computação, 2014.
Summary (translated from the Portuguese Resumo): The main contribution of this dissertation is a proposed methodology for evaluating synthesized voice. The method consists of a set of steps intended to assist the evaluator in planning, conducting the evaluation, and analysing the data collected. The method was originally developed to evaluate a set of synthesized voices in order to find the voice best suited to distance-education environments that use avatars. The relationships between intelligibility, comprehensibility and naturalness were also studied in order to identify the factors to consider when improving speech synthesizers. This dissertation also presents the main evaluation methods found in the literature and the operating principle of TTS systems.
Abstract: As its main contribution, this thesis proposes a new methodology for evaluating synthesized voice. The method consists of a set of steps that assist the assessor in the stages of planning, implementation and analysis of the data collected. The method was originally developed to evaluate a set of synthesized voices to find the voice that best fits distance education environments using avatars. Relations between intelligibility, comprehensibility and naturalness were studied in order to identify the factors to be considered when enhancing speech synthesizers. This thesis also presents the main evaluation methods in the literature and how TTS (Text-to-Speech) systems work.
Master's program: Telecommunications and Telematics. Degree: Master of Electrical Engineering.

26

Dours, Daniel. "Conception d'un systeme multiprocesseur traitant un flot continu de donnees en temps reel pour la realisation d'une interface vocale intelligente." Toulouse 3, 1986. http://www.theses.fr/1986TOU30107.

Abstract:

A series of syntactic and semantic transformations that allow an application to be parallelized is defined in the second chapter. This yields a representation of the application in terms of networks of nested modules. A reconfigurable modular architecture suited to this type of representation is described in the third chapter. To map the application onto this architecture, an appropriate language is defined, and a set of means and methods is described for building interactive software that searches for the optimal configuration of the multiprocessor system executing the given application. The last part aims to show the close match between the multiprocessor system thus designed and the modular organization of a voice terminal, and to look ahead to the use of such a system in other application domains, in particular vision systems and intelligent robots.

27

Coetzee, H. J. "The development of a new objective speech quality measure for speech coding applications." Diss., Georgia Institute of Technology, 1990. http://hdl.handle.net/1853/15474.

28

Charpentier, Francis. "Traitement de la parole par analyse-synthese de fourier : application a la synthese par diphones." Paris, ENST, 1988. http://www.theses.fr/1988ENST0009.

Abstract:

These techniques are used with the aim of obtaining better sound quality than that obtained with the usual parametric methods. The emphasis is on the following two-fold approach: 1) interpretation of the short-time Fourier transform as a filter bank, with synthesis by summing the outputs of this filter bank; 2) synthesis by overlap-add of short-time signals.
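The second approach named in this abstract, synthesis by overlapping and adding short-time signals, can be sketched in a few lines. The example below is only a generic overlap-add illustration, not the thesis's diphone synthesizer: the periodic Hann window, the 50% overlap, and all function names are assumptions. With that window and hop, overlapping windows sum to one, so unmodified frames add back to the original signal everywhere except the first and last half-frame.

```python
import math


def hann(length):
    """Periodic Hann window; copies shifted by length/2 sum to exactly 1."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / length) for i in range(length)]


def split_frames(signal, frame_len, hop):
    """Cut the signal into windowed short-time frames spaced `hop` samples apart."""
    window = hann(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frames.append([signal[start + i] * window[i] for i in range(frame_len)])
    return frames


def overlap_add(frames, hop):
    """Rebuild a signal by adding each frame at its position on the time axis."""
    frame_len = len(frames[0])
    out = [0.0] * (hop * (len(frames) - 1) + frame_len)
    for index, frame in enumerate(frames):
        start = index * hop
        for i, value in enumerate(frame):
            out[start + i] += value
    return out


# Illustration: analyse and resynthesise a short sine wave without modification.
signal = [math.sin(0.1 * n) for n in range(64)]
frames = split_frames(signal, frame_len=16, hop=8)
rebuilt = overlap_add(frames, hop=8)
```

In a synthesizer, the frames would typically be modified or repositioned before being added (for example, spaced differently to change duration), which is where the technique becomes useful; this sketch leaves them untouched to show the reconstruction property.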

29

Morris, Robert W. "Enhancement and recognition of whispered speech." Diss., Available online, Georgia Institute of Technology, 2004:, 2003. http://etd.gatech.edu/theses/available/etd-04082004-180338/unrestricted/morris%5frobert%5fw%5f200312%5fphd.pdf.

30

Quackenbush, Schuyler Reynier. "Objective measures of speech quality." Diss., Georgia Institute of Technology, 1995. http://hdl.handle.net/1853/13376.

31

陳我智 and Ngor-chi Chan. "Text-to-speech conversion for Putonghua." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 1990. http://hub.hku.hk/bib/B31209580.

32

Chiou, Fred Y. "User-interactive speech enhancement using fuzzy logic." Diss., Georgia Institute of Technology, 1998. http://hdl.handle.net/1853/14916.

33

Scott, Simon David. "A data-driven approach to visual speech synthesis." Thesis, University of Bath, 1996. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.307116.

34

Devaney, Jason Wayne. "A study of articulatory gestures for speech synthesis." Thesis, University of Liverpool, 1995. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.284254.

35

Larreategui, Mikel. "High-quality text-to-speech synthesis using sinusoidal techniques." Thesis, Staffordshire University, 1996. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.309790.

36

Bulyko, Ivan. "Flexible speech synthesis using weighted finite-state transducers /." Thesis, Connect to this title online; UW restricted, 2002. http://hdl.handle.net/1773/6081.

37

Chan, Ngor-chi. "Text-to-speech conversion for Putonghua /." [Hong Kong : University of Hong Kong], 1990. http://sunzi.lib.hku.hk/hkuto/record.jsp?B12929475.

38

Bamini, Praveen Kumar. "FPGA-based Implementation of Concatenative Speech Synthesis Algorithm." [Tampa, Fla.] : University of South Florida, 2003. http://purl.fcla.edu/fcla/etd/SFE0000187.

39

Little, M. A. "Biomechanically informed nonlinear speech signal processing." Thesis, University of Oxford, 2007. http://ora.ox.ac.uk/objects/uuid:6f5b84fb-ab0b-42e1-9ac2-5f6acc9c5b80.

Abstract:

Linear digital signal processing based around linear, time-invariant systems theory finds substantial application in speech processing. The linear acoustic source-filter theory of speech production provides ready biomechanical justification for using linear techniques. Nonetheless, biomechanical studies surveyed in this thesis display significant nonlinearity and non-Gaussianity, casting doubt on the linear model of speech production. To test the appropriateness of linear systems assumptions for speech production, surrogate data techniques can be used. This study uncovers systematic flaws in the design and use of existing surrogate data techniques and, by making novel improvements, develops a more reliable technique. Collating the largest set of speech signals to date compatible with this new technique, this study next demonstrates that the linear assumptions are not appropriate for all speech signals. Detailed analysis shows that while vowel production from healthy subjects cannot be explained within the linear assumptions, consonants can. Linear assumptions also fail for most vowel production by pathological subjects with voice disorders. Combining this new empirical evidence with information from biomechanical studies, the study concludes that the most parsimonious model for speech production, explaining all these findings in one unified set of mathematical assumptions, is a stochastic nonlinear, non-Gaussian model, which subsumes both Gaussian linear and deterministic nonlinear models. As a case study, to demonstrate the engineering value of nonlinear signal processing techniques based upon the proposed biomechanically informed, unified model, the study investigates the biomedical engineering application of disordered voice measurement. A new state space recurrence measure is devised and combined with an existing measure of the fractal scaling properties of stochastic signals. Using a simple pattern classifier, these two measures outperform all combinations of linear methods for the detection of voice disorders on a large database of pathological and healthy vowels, making explicit the effectiveness of such biomechanically informed, nonlinear signal processing techniques.

40

Alphonso, Issac John. "Network training for continuous speech recognition." Master's thesis, Mississippi State : Mississippi State University, 2003. http://library.msstate.edu/etd/show.asp?etd=etd-10252003-105104.

41

Yatrou, Paul M. "Analysis of predictor mistracking in ADPCM speech coders." Thesis, McGill University, 1987. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=66242.

42

Wilson, Shawn C. "Voice recognition systems : assessment of implementation aboard U.S. naval ships." Thesis, Monterey, Calif. : Springfield, Va. : Naval Postgraduate School ; Available from National Technical Information Service, 2003. http://library.nps.navy.mil/uhtbin/hyperion-image/03Mar%5FWilson.pdf.

Abstract:

Thesis (M.S. in Information Systems and Operations)--Naval Postgraduate School, March 2003. Thesis advisor(s): Michael T. McMaster, Kenneth J. Hagan. Includes bibliographical references (p. 47-49). Also available online.

43

Ertan, Ali Erdem. "Pitch-synchronous processing of speech signal for improving the quality of low bit rate speech coders." Diss., Georgia Institute of Technology, 2004. http://hdl.handle.net/1853/36534.

44

Ertan, Ali Erdem. "Pitch-synchronous processing of speech signal for improving the quality of low bit rate speech coders." Available online, Georgia Institute of Technology, 2004:, 2003. http://etd.gatech.edu/theses/available/etd-06072004-131138/unrestricted/ertan%5Fali%5Fe%5F200405%5Fphd.pdf.

Abstract:

Thesis (Ph. D.)--School of Electrical and Computer Engineering, Georgia Institute of Technology, 2004. Directed by Thomas P. Barnwell, III. Vita. Includes bibliographical references (leaves 221-226).

45

Nel, Pieter Willem. "Automatic syllabification of untranscribed speech." Thesis, Stellenbosch : Stellenbosch University, 2005. http://hdl.handle.net/10019.1/50285.

Abstract:

Thesis (MScEng)--Stellenbosch University, 2005. ENGLISH ABSTRACT: The syllable has been proposed as a unit of automatic speech recognition due to its strong links with human speech production and perception. Recently, it has been shown that incorporating information from syllable-length time-scales into automatic speech recognition improves results in large vocabulary recognition tasks. It was also shown to aid in various language recognition tasks and in foreign accent identification. Therefore, the ability to automatically segment speech into syllables is an important research tool. Where most previous studies employed knowledge-based methods, this study presents a purely statistical method for the automatic syllabification of speech. We introduce the concept of hierarchical hidden Markov model structures and show how these can be used to implement a purely acoustical syllable segmenter based on general sonority theory, combined with some of the phonotactic constraints found in the English language. The accurate reporting of syllabification results is a problem in the existing literature. We present a well-defined dynamic time warping (DTW) distance measure used for reporting syllabification results. We achieve a token error rate of 20.3% with a 42 ms average boundary error on a relatively large set of data. This compares well with previous knowledge-based and statistically based methods.

AFRIKAANS SUMMARY (translated): The syllable has previously been proposed as a basic unit for automatic speech recognition because of its strong relationship with speech production and perception. It has recently been shown that using information from syllable-length time scales improves results in large-vocabulary recognition tasks. It has also been shown that using syllables facilitates automatic language recognition and foreign-accent recognition. It is therefore important, for research purposes, to be able to segment syllables automatically. Previous studies used knowledge-based methods to achieve this segmentation. This study uses a purely statistical method for the automatic syllabification of speech. We use the concept of hierarchical hidden Markov model structures and show how it can be used to implement a purely acoustic syllable segmenter. The model is built on sonority theory and on the phonotactic constraints present in the English language. The accurate reporting of syllabification results is problematic in the existing literature. We fully define a DTW (dynamic time warping) distance function with which we report our syllabification results. We achieve a TER (token error rate) of 20.3% with a 42 ms average boundary error on a relatively large data set. This compares well with previous knowledge-based and statistically based methods.
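The abstract does not spell out its DTW distance measure, so the following is only a generic sketch of dynamic time warping applied to two lists of syllable boundary times. The function name `dtw_distance`, the absolute-difference local cost, and the three-way step pattern are assumptions, not the thesis's definition.

```python
def dtw_distance(ref, hyp):
    """Classic DTW cost between two sequences of boundary times.

    ref, hyp: sequences of numbers (e.g. boundary times in seconds).
    The local cost is the absolute time difference; each step may
    advance in ref, in hyp, or in both (assumed step pattern).
    """
    inf = float("inf")
    n, m = len(ref), len(hyp)
    # d[i][j] = cheapest alignment of ref[:i] with hyp[:j]
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(ref[i - 1] - hyp[j - 1])
            d[i][j] = cost + min(d[i - 1][j],      # extra reference boundary
                                 d[i][j - 1],      # extra hypothesis boundary
                                 d[i - 1][j - 1])  # one-to-one match
    return d[n][m]
```

Identical boundary lists give a distance of zero, and a missed or inserted boundary adds the time offset to its nearest aligned neighbour.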

46

Boulis, Constantinos. "Topic learning in text and conversational speech /." Thesis, Connect to this title online; UW restricted, 2005. http://hdl.handle.net/1773/5914.

47

Mészáros, Tomáš. "Speech Analysis for Processing of Musical Signals." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2015. http://www.nusl.cz/ntk/nusl-234974.

Abstract:

The main goal of this thesis is to enrich musical signals with the characteristics of human speech. The work includes the creation of an audio effect inspired by the talk-box effect: analysis of the vocal tract with a suitable algorithm such as linear prediction, and application of the estimated filter to a musical audio signal. Emphasis is placed on excellent output quality, low latency, and low computational cost for real-time use. The outcome of the thesis is a software plugin usable in professional audio-editing applications and, with a suitable hardware platform, also for live performance. The plugin emulates a real talk-box device and provides similar output quality with a unique sound.
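The two steps named in this abstract, estimating a vocal-tract filter by linear prediction and applying it to a musical signal, can be sketched as follows. This is an illustrative outline, not the plugin's implementation: the names `lpc` and `all_pole_filter`, the autocorrelation method with the Levinson-Durbin recursion, and the per-frame direct-form filter are assumptions, and the frame is assumed to have non-zero energy.

```python
import numpy as np

def lpc(frame, order):
    """Autocorrelation-method LPC via the Levinson-Durbin recursion.

    Returns (a, err): a[0] == 1 and A(z) = sum_k a[k] z^-k is the
    prediction-error filter; err is the residual energy.
    Assumes order < len(frame) and a frame with non-zero energy.
    """
    n = len(frame)
    r = np.correlate(frame, frame, mode="full")[n - 1:n + order]  # lags 0..order
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err                      # reflection coefficient
        prev = a.copy()
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err

def all_pole_filter(a, x):
    """Apply 1/A(z) to x: y[n] = x[n] - sum_{k>=1} a[k] * y[n-k]."""
    p = len(a) - 1
    y = np.zeros(len(x))
    for n in range(len(x)):
        acc = x[n]
        for k in range(1, min(p, n) + 1):
            acc -= a[k] * y[n - k]
        y[n] = acc
    return y
```

In a talk-box style effect, `lpc` would run on successive speech frames and `all_pole_filter` would shape the corresponding frames of the instrument signal; a real-time version would also need to carry filter state across frame boundaries, which this sketch omits.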

48

Ng, H. N. Elaine. "Effects of noise type on speech understanding." Click to view the E-thesis via HKUTO, 2006. http://sunzi.lib.hku.hk/hkuto/record/B37990159.

49

Ng, H. N. Elaine, and 吳凱寧. "Effects of noise type on speech understanding." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2006. http://hub.hku.hk/bib/B37990159.

50

Lai, Yiu Pong. "Maximum likelihood normalization for robust speech recognition /." View Abstract or Full-Text, 2003. http://library.ust.hk/cgi/db/thesis.pl?ELEC%202003%20LAI.

Abstract:

Thesis (M. Phil.)--Hong Kong University of Science and Technology, 2003. Includes bibliographical references (leaves 98-103). Also available in electronic version. Access restricted to campus users.
