Dissertations / Theses on the topic 'Text-to-speech synthesis system'

The top 20 dissertations / theses on the topic 'Text-to-speech synthesis system' are listed below.


1

Micallef, Paul. "A text to speech synthesis system for Maltese." Thesis, University of Surrey, 1997. http://epubs.surrey.ac.uk/842702/.

Abstract:
The subject of this thesis covers a considerably varied multidisciplinary area which needs to be addressed to achieve a high-quality text-to-speech synthesis system in any language. This is the first time that such a system has been built for Maltese, and therefore there was the additional problem of no computerised sources or corpora. However, many problems and much of the system design are common to all languages. This thesis focuses on two general problems. The first is the automatic labelling of phonemic data, since this is crucial for setting up Maltese speech corpora, which in turn can be used to improve the system. A novel way of achieving such automatic segmentation was investigated. It uses a mixed parameter model with maximum likelihood training of the first derivative of the features across a set of phonetic class boundaries. This was found to give good results even for continuous speech, provided that a phonemic labelling of the text is available. The second general problem is segment concatenation, since the end and beginning of subsequent diphones can have mismatches in amplitude, frequency, phase and spectral envelope. The use of intermediate frames, built up from the last and first frames of two concatenated diphones, to achieve smoother continuity was analysed, both in time and in frequency. The use of wavelet theory for separating the spectral envelope from the excitation was also investigated. The linguistic system modules were built for this thesis. In particular, a rule-based grapheme-to-phoneme conversion system that is serial rather than hierarchical was developed. The morphological analysis required the design of a system which allows two dissimilar lexical structures (Semitic and Romance) to be integrated into one overall morphological analyser. Appendices with detailed rules of the developed linguistic modules are included.
The present system, while giving satisfactory intelligibility and the capability of modifying duration, does not yet include a prosodic module.
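The intermediate-frame smoothing described in this abstract can be illustrated with a minimal sketch: linearly interpolated frames bridge the boundary between two concatenated diphones. The feature representation and frame count here are assumptions for illustration, not the thesis's actual parameters.

```python
import numpy as np

def intermediate_frames(last_frame, first_frame, n=3):
    """Build n interpolated frames bridging two concatenated diphones.

    last_frame / first_frame: 1-D feature vectors (e.g. spectral envelope
    coefficients) from the end of one diphone and the start of the next.
    """
    last = np.asarray(last_frame, dtype=float)
    first = np.asarray(first_frame, dtype=float)
    # Weights move gradually from the first unit toward the second.
    weights = [(i + 1) / (n + 1) for i in range(n)]
    return [(1 - w) * last + w * first for w in weights]
```

Inserting such frames at the join point reduces abrupt jumps in amplitude and spectral envelope between the two units.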
2

Baloyi, Ntsako. "A text-to-speech synthesis system for Xitsonga using hidden Markov models." Thesis, University of Limpopo (Turfloop Campus), 2012. http://hdl.handle.net/10386/1021.

Abstract:
Thesis (M.Sc. (Computer Science))--University of Limpopo, 2013
This research study focuses on building a general-purpose working Xitsonga speech synthesis system that is, as far as reasonably possible, intelligible, natural sounding, and flexible. The system has to be able to model some desirable speaker characteristics and speaking styles. This research project forms part of the broader national speech technology project that aims at developing spoken language systems for human-machine interaction using the eleven official languages of South Africa (SA). Speech synthesis is the reverse of automatic speech recognition (which receives speech as input and converts it to text) in that it receives text as input and produces synthesized speech as output. It is generally accepted that most people find listening to spoken utterances better than reading the equivalent of such utterances. The Xitsonga speech synthesis system has been developed using a hidden Markov model (HMM) speech synthesis method. The HMM-based speech synthesis (HTS) system synthesizes speech that is intelligible and natural sounding, and can do so from a footprint of only a few megabytes of training speech data. The HTS toolkit is applied as a patch to the HTK toolkit, a hidden Markov model toolkit primarily designed for building and manipulating hidden Markov models in speech recognition.
3

Yoon, Kyuchul. "Building a prosodically sensitive diphone database for a Korean text-to-speech synthesis system." Connect to this title online, 2005. http://rave.ohiolink.edu/etdc/view?acc%5Fnum=osu1119010941.

Abstract:
Thesis (Ph. D.)--Ohio State University, 2005.
Title from first page of PDF file. Document formatted into pages; contains xxii, 291 p.; also includes graphics (some col.). Includes bibliographical references (p. 210-216). Available online via OhioLINK's ETD Center.
4

Beněk, Tomáš. "Implementing and Improving a Speech Synthesis System." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2014. http://www.nusl.cz/ntk/nusl-236079.

Abstract:
This thesis deals with text-to-speech synthesis. It gives a basic theoretical introduction to text-to-speech synthesis. The work is built on the MARY TTS system, which allows existing modules to be used to create a custom text-to-speech system, and on speech synthesis using hidden Markov models trained on a purpose-built speech database. Several simple programs were created to ease database creation, and the addition of a new language and voice to the MARY TTS system was demonstrated. A module and a voice for the Czech language were created and published. An algorithm for grapheme-to-phoneme transcription was described and implemented.
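As a rough illustration of the serial (ordered, non-hierarchical) rewrite-rule grapheme-to-phoneme approach mentioned in this abstract, a first-matching-rule-wins scan can be sketched as follows; the rules shown are invented placeholders, not the thesis's actual rules.

```python
# Minimal serial rewrite-rule G2P sketch: rules are tried in order at each
# position, and the first match wins. Rule contents are illustrative only.
RULES = [
    ("ch", "x"),   # digraphs must be listed before single letters
    ("c", "ts"),
    ("š", "sh"),
]

def g2p(word):
    """Convert a word to a phoneme list using ordered rewrite rules."""
    phonemes = []
    i = 0
    while i < len(word):
        for graph, phone in RULES:          # first matching rule wins
            if word.startswith(graph, i):
                phonemes.append(phone)
                i += len(graph)
                break
        else:                               # no rule: letter maps to itself
            phonemes.append(word[i])
            i += 1
    return phonemes
```

Ordering the rules (digraphs before their component letters) is what makes the serial scheme work without any hierarchy of rule sets.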
5

Malatji, Promise Tshepiso. "The development of accented English synthetic voices." Thesis, University of Limpopo, 2019. http://hdl.handle.net/10386/2917.

Abstract:
Thesis (M. Sc. (Computer Science)) --University of Limpopo, 2019
A text-to-speech (TTS) synthesis system is a software system that receives text as input and produces speech as output. A TTS synthesis system can be used for, amongst others, language learning and reading out text for people living with different disabilities (e.g., physically challenged or visually impaired), by native and non-native speakers of the target language. Most people relate easily to a second language spoken by a non-native speaker with whom they share a native language, yet most online English TTS synthesis systems are developed using native speakers of English. This research study focuses on developing accented English synthetic voices as spoken by non-native speakers in the Limpopo province of South Africa. The Modular Architecture for Research on speech sYnthesis (MARY) TTS engine is used in developing the synthetic voices, and the hidden Markov model (HMM) method was used to train them. A secondary training text corpus is used to develop the training speech corpus by recording six speakers reading the text corpus. The quality of the developed synthetic voices is measured in terms of their intelligibility, similarity and naturalness using a listening test. The results are reported by evaluators' occupation and gender, as well as overall. The subjective listening test indicates that the developed synthetic voices have a high level of acceptance in terms of similarity and intelligibility. Speech analysis software is used to compare the synthesised speech and the human recordings; there is no significant difference in voice pitch between the speakers and the synthetic voices, except for one synthetic voice.
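The listening-test evaluation described above amounts to averaging 1-5 ratings per criterion into mean opinion scores; a minimal sketch, with the ratings invented for illustration:

```python
from statistics import mean

def mos(ratings):
    """Mean opinion score from 1-5 listening-test ratings."""
    return round(mean(ratings), 2)

# Hypothetical ratings for one synthetic voice, grouped by criterion.
scores = {
    "intelligibility": [4, 5, 4, 4, 3, 5],
    "similarity":      [4, 4, 5, 3, 4, 4],
    "naturalness":     [3, 4, 3, 4, 3, 4],
}
per_criterion = {k: mos(v) for k, v in scores.items()}
```

Real studies would additionally break the ratings down by listener group (occupation, gender) as the abstract describes.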
6

Cohen, Andrew Dight. "The use of learnable phonetic representations in connectionist text-to-speech system." Thesis, University of Reading, 1997. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.360787.

7

Breitenbücher, Mark. "Textvorverarbeitung zur deutschen Version des Festival Text-to-Speech Synthese Systems." [S.l.] : Universität Stuttgart , Fakultät Philosophie, 1997. http://www.bsz-bw.de/cgi-bin/xvms.cgi?SWB6783514.

8

Lambert, Tanya. "Databases for concatenative text-to-speech synthesis systems : unit selection and knowledge-based approach." Thesis, University of East Anglia, 2005. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.421192.

9

Xiao, He. "An affective personality for an embodied conversational agent." Curtin University of Technology, Department of Computer Engineering, 2006. http://espace.library.curtin.edu.au:80/R/?func=dbin-jump-full&object_id=16139.

Abstract:
Curtin University's Embodied Conversational Agents (ECA) combine an MPEG-4 compliant Facial Animation Engine (FAE), a Text To Emotional Speech Synthesiser (TTES), and a multi-modal Dialogue Manager (DM) that accesses a Knowledge Base (KB) and outputs Virtual Human Markup Language (VHML) text which drives the TTES and FAE. A user enters a question and an animated ECA responds with a believable and affective voice and actions. However, this response to the user is normally marked up in VHML by the KB developer to produce the required facial gestures and emotional display. A real person does not react by fixed rules but on personality, beliefs, previous experiences, and training. This thesis details the design, implementation and pilot study evaluation of an Affective Personality Model for an ECA. The thesis discusses the Email Agent system that informs a user when they have email. The system, built in Curtin's ECA environment, has personality traits of Friendliness, Extraversion and Neuroticism. A small group of participants evaluated the Email Agent system to determine the effectiveness of the implemented personality system. An analysis of the qualitative and quantitative results from questionnaires is presented.
10

XIE, GING-JIANG, and 謝清江. "A Chinese text-to-speech system based on formant synthesis." Thesis, 1987. http://ndltd.ncl.edu.tw/handle/68840754016731337307.

11

HUANG, SHAO-HUA, and 黃紹華. "A synthesis of prosodic information in mandarin text-to-speech system." Thesis, 1991. http://ndltd.ncl.edu.tw/handle/08240353472600497334.

12

Fu, Zhen-Hong, and 傅振宏. "Automatic Generation of Synthesis Units for Taiwanese Text-to-Speech System." Thesis, 2000. http://ndltd.ncl.edu.tw/handle/46706238089789082381.

Abstract:
Master's thesis
Chang Gung University
Graduate Institute of Electrical Engineering
88
In this thesis, we demonstrate a Taiwanese (Min-nan) text-to-speech (TTS) system based on automatically generated synthesis units. It can read out modern Taiwanese articles rather naturally. The TTS system is composed of three functional modules: a text analysis module, a prosody module, and a waveform synthesis module. Modern Taiwanese texts contain Chinese characters and English letters simultaneously, so the text analysis module must first be able to deal with Chinese-English mixed texts. In this module, text normalization, word segmentation, letter-to-phoneme conversion and word frequency are used to handle multiple pronunciations. The prosody module processes tone sandhi and phonetic variation in Taiwanese. The synthesis units in the waveform synthesis module come from two sources: (1) isolated-uttered tonal syllables covering all possible tonal variations in Taiwanese, about 4,521 in total; (2) synthesis units automatically generated from a designated speech corpus. We employ an HMM-based large-vocabulary Taiwanese speech recognition system to perform forced alignment on the speech corpus, and short-pause recognition is incorporated in the recognition system. After the synthesis unit string has been extracted, inter-syllable coarticulation information is applied to decide how to concatenate these units. After energy normalization, the output speech is generated. We evaluate our system on automatically segmented speech: compared with human segmentation, a correct rate of about 85% is achieved. The system has been implemented on a PC running MS Windows 9x/NT/2000.
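The energy-normalization step mentioned at the end of this abstract can be sketched as matching each unit's RMS level to a common target before concatenation; the target level and unit representation below are assumptions for illustration.

```python
import numpy as np

def normalize_energy(unit, target_rms):
    """Scale a speech unit so its RMS energy matches the target."""
    unit = np.asarray(unit, dtype=float)
    rms = np.sqrt(np.mean(unit ** 2))
    if rms == 0.0:
        return unit            # silence stays silent
    return unit * (target_rms / rms)

def concatenate(units, target_rms=0.1):
    """Energy-normalize each unit, then join them into one waveform."""
    return np.concatenate([normalize_energy(u, target_rms) for u in units])
```

Normalizing per unit prevents audible loudness jumps when units recorded at different levels are joined.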
13

Konakanchi, Parthasarathy. "A Research Bed For Unit Selection Based Text To Speech Synthesis System." Thesis, 2009. http://etd.iisc.ernet.in/handle/2005/1348.

Abstract:
After trying Festival Speech Synthesis System, we decided to develop our own TTS framework, conducive to perform the necessary research experiments for developing good quality TTS for Indian languages. In most of the attempts on Indian language TTS, there is no prosody model, provision for handling foreign language words and no phrase break prediction leading to the possibility of introducing appropriate pauses in the synthesized speech. Further, in the Indian context, there is a real felt need for a bilingual TTS, involving English, along with the Indian language. In fact, it may be desirable to also have a trilingual TTS, which can also take care of the language of the neighboring state or Hindi, in addition. Thus, there is a felt need for a full-fledged TTS development framework, which lends itself for experimentation involving all the above issues and more. This thesis work is therefore such a serious attempt to develop a modular, unit selection based TTS framework. The developed system has been tested for its effectiveness to create intelligible speech in Tamil and Kannada. The created system has also been used to carry out two research experiments on TTS. The first part of the work is the design and development of corpus-based concatenative Tamil speech synthesizer in Matlab and C. A synthesis database has been created with 1027 phonetically rich, pre-recorded sentences, segmented at the phone level. From the sentence to be synthesized, specifications of the required target units are predicted. During synthesis, database units are selected that best match the target specification according to a distance metric and a concatenation quality metric. To accelerate matching, the features of the end frames of the database units have been precomputed and stored. The selected units are concatenated to produce synthetic speech. 
The high values of the obtained mean opinion scores for the TTS output reveal that speech synthesized using our TTS is intelligible and acceptably natural, and can possibly be put to commercial use with some additional features. Experiments carried out by others using my TTS framework have shown that, whenever the required phonetic context is not available in the synthesis database, similar phones that are perceptually indistinguishable may be substituted. The second part of the work deals with the design and modification of the developed TTS framework to be embedded in mobile phones. Commercial GSM FR, EFR and AMR speech codecs are used for compressing our synthesis database. Perception experiments reveal that speech synthesized using a highly compressed database is reasonably natural. This holds promise for reading SMSs and emails on mobile phones in Indian languages. Finally, we observe that incorporating prosody and pause models for Indian language TTS would further enhance the quality of the synthetic speech. These are some of the potential, unexplored areas ahead for research in speech synthesis in Indian languages.
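The target-specification matching plus concatenation-quality metric described above is the classic dynamic-programming unit-selection search; a toy sketch, with the cost functions and unit names invented for illustration:

```python
# Dynamic-programming unit selection: pick one candidate unit per target
# position, minimizing cumulative target cost plus concatenation (join) cost.

def select_units(candidates, target_cost, concat_cost):
    """candidates: list (one entry per target) of candidate unit ids."""
    # best[i][c] = (cumulative cost, backpointer) for candidate c at position i
    best = [{c: (target_cost(0, c), None) for c in candidates[0]}]
    for i in range(1, len(candidates)):
        layer = {}
        for c in candidates[i]:
            prev, cost = min(
                ((p, best[i - 1][p][0] + concat_cost(p, c))
                 for p in candidates[i - 1]),
                key=lambda x: x[1],
            )
            layer[c] = (cost + target_cost(i, c), prev)
        best.append(layer)
    # Trace back the cheapest path.
    c = min(best[-1], key=lambda k: best[-1][k][0])
    path = [c]
    for i in range(len(candidates) - 1, 0, -1):
        c = best[i][c][1]
        path.append(c)
    return path[::-1]
```

Precomputing boundary-frame features, as the abstract describes, makes the `concat_cost` evaluations cheap enough for this search to run at synthesis time.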
14

Chang, Tang-Yu, and 張唐瑜. "A Mandarin Text-to-speech System Using A Large Number Of Words As Synthesis Units." Thesis, 2005. http://ndltd.ncl.edu.tw/handle/86362630071664979157.

Abstract:
Master's thesis
National Chung Hsing University
Institute of Computer Science
93
In recent years, many TTS (Text-to-Speech) systems have been implemented with a corpus-based structure, in which synthesis units are directly concatenated during synthesis. This method achieves good speech quality; compared with traditional TTS using syllables as synthesis units, it is more acceptable to listeners. Such systems have the following features: (1) a large speech corpus is recorded; (2) signal processing is not used; (3) proper synthesis units are selected. Microsoft has developed a corpus-based bilingual (Mandarin/English) system, Mulan. Its corpus is constructed by recording sentences and organised with a hierarchical prosodic structure; a decision tree is used in the module that selects non-uniform synthesis units, and no signal processing is applied in the procedure. However, rare words may not appear in the recorded sentences, and the co-articulation between words is weak. We therefore use a word set (word-based) as synthesis units and rule-based methods to select them, with only a little signal processing: fading in and fading out. In total, we recorded about 12,224 two-character words and 2,690 three-character words. Breaks in speaking are very important for understanding a sentence, so we employ CART (Classification And Regression Tree) to predict boundary types, and the result is used to insert the corresponding breaks. In this thesis we implemented a Mandarin TTS system and performed naturalness, preference and intelligibility testing in our evaluation.
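The "fading in and fading out" join mentioned above is a short linear cross-fade at the word boundary; a minimal numeric sketch, where the overlap length is an assumption:

```python
import numpy as np

def crossfade_join(a, b, overlap=4):
    """Join two word units with a linear fade-out/fade-in over `overlap` samples."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    fade = np.linspace(1.0, 0.0, overlap)        # fade-out weights for unit a
    mixed = a[-overlap:] * fade + b[:overlap] * (1.0 - fade)
    return np.concatenate([a[:-overlap], mixed, b[overlap:]])
```

In a real system the overlap would span a few milliseconds of samples rather than four values; the principle is unchanged.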
15

Chen, Jau-Hung, and 陳昭宏. "A Study on Synthesis Unit Selection and Prosodic Information Generation in a Chinese Text-to-Speech System." Thesis, 1998. http://ndltd.ncl.edu.tw/handle/74871225642803891214.

Abstract:
Doctoral dissertation
National Cheng Kung University
Department of Computer Science and Information Engineering
86
In this dissertation, some approaches to synthesis unit selection and prosodic information generation are proposed for Chinese text-to-speech conversion. Monosyllables are adopted as the basic synthesis units. A set of synthesis units is selected from a large continuous speech database based on two cost functions which minimize the inter- and intra-syllable distortion. The speech database is also employed to establish a word-prosody-based template tree according to the linguistic features: tone combination, word length, part-of-speech (POS) of the word, and word position in a sentence. This template tree stores the prosodic features, including pitch contour, average energy, and syllable duration of a word, for possible combinations of linguistic features. Two modules for sentence intonation and template selection are proposed to generate the target prosodic templates. In addition, a Bayesian network is used to model the relationship between linguistic features and prosodic information. Finally, a Speech Activated Telephony Email Reader (SATER) is proposed. SATER is an integrated system combining speaker verification, networking, and text-to-speech conversion. A registered user can activate and listen to his email through a wired/wireless telephone. In the speaker verification subsystem, a time-varying verification phrase is adopted: the speaker's password is used to generate the verification phrases for that speaker, and a hidden Markov model with a variable number of states is used to model each verification phrase. The experimental results for the TTS conversion system showed that the synthesized prosodic features closely resembled their original counterparts for most syllables in the inside test. Evaluation by subjective experiments also confirmed the satisfactory performance of these approaches.
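The word-prosody template tree keyed by tone combination, word length, part-of-speech and word position can be sketched as a keyed lookup with back-off when the full linguistic context is missing; the feature values and prosodic numbers below are invented for illustration.

```python
# Word-prosody template lookup keyed by linguistic features, with back-off
# from the most specific context to progressively coarser ones.
templates = {
    ("T2-T4", 2, "N", "final"): {"pitch": [180, 150], "duration": [0.22, 0.30]},
    ("T2-T4", 2, "N"):          {"pitch": [175, 155], "duration": [0.20, 0.28]},
    ("T2-T4", 2):               {"pitch": [170, 160], "duration": [0.20, 0.25]},
}

def lookup(tones, length, pos, position):
    """Try the most specific key first, then progressively drop features."""
    for key in [(tones, length, pos, position),
                (tones, length, pos),
                (tones, length)]:
        if key in templates:
            return templates[key]
    return None
```

A tree built from a real corpus would store one such template per observed feature combination, with the back-off order reflecting how strongly each feature shapes the word's prosody.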
16

Lin, Tso-Chih, and 林佐治. "A Study of Synthesis Unit Selection and Phrase Prosody Adjustment in a Chinese Text-to-Speech System." Thesis, 1996. http://ndltd.ncl.edu.tw/handle/16675087551502426246.

Abstract:
Master's thesis
National Cheng Kung University
Department of Computer Science and Information Engineering
85
In this thesis, we present a Chinese text-to-speech conversion system which focuses on coarticulation processing and phrase-prosody adjustment. A time-domain waveform synthesis algorithm, pitch synchronous overlap and add (TD-PSOLA), is applied in the speech synthesizer to produce high-quality synthetic speech. In order to process the coarticulation between connected syllables, synthesis units are extracted from a large continuous speech database. In cooperation with a rule-based unit selection module, we choose a synthesis unit from several candidates for each syllable. To address the spectral discontinuity caused by directly connecting two individual speech segments, a time-domain spectral smoothing method is adopted. In order to improve the naturalness of the synthetic speech, a method for prosodic modification is proposed to replace the traditional rule-based approach to pronunciation. Observation suggests that the prosodic properties of a Chinese word are closely affected by the tone combination of its monosyllables and its position in the sentence. Consequently, we create a word-prosody database from the continuous speech database. Each word-prosody pattern contains the length, energy variation, and pitch period contour (coded by cubic spline) of every syllable in the word. Using the result of lexical analysis, the prosody generation module selects a proper prosodic pattern for each word.
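The cubic-spline coding of pitch period contours mentioned above can be sketched as fitting a compact cubic through a few sampled knots and re-evaluating it at synthesis time. This is a simplified stand-in using one cubic polynomial rather than a piecewise spline, and the knot count and contour values are assumptions.

```python
import numpy as np

def code_contour(f0, n_knots=4):
    """Compress a pitch contour to the coefficients of a cubic fit
    through n_knots evenly spaced samples of the contour."""
    x = np.linspace(0.0, 1.0, len(f0))
    kx = np.linspace(0.0, 1.0, n_knots)
    ky = np.interp(kx, x, f0)             # sample the contour at the knots
    return np.polyfit(kx, ky, 3)          # four coefficients store the contour

def decode_contour(coeffs, n_frames):
    """Reconstruct a smooth contour from the stored coefficients."""
    return np.polyval(coeffs, np.linspace(0.0, 1.0, n_frames))
```

Storing four coefficients per syllable instead of a full per-frame contour is what makes the word-prosody database compact.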
17

"Cantonese text-to-speech synthesis using sub-syllable units." 2001. http://library.cuhk.edu.hk/record=b5890790.

Abstract:
Law Ka Man = 利用子音節的粤語文語轉換系統 / 羅家文.
Thesis (M.Phil.)--Chinese University of Hong Kong, 2001.
Includes bibliographical references.
Text in English; abstracts in English and Chinese.
Law Ka Man = Li yong zi yin jie de Yue yu wen yu zhuan huan xi tong / Luo Jiawen.
Chapter 1. --- INTRODUCTION --- p.1
Chapter 1.1 --- Text analysis --- p.2
Chapter 1.2 --- Prosody prediction --- p.3
Chapter 1.3 --- Speech generation --- p.3
Chapter 1.4 --- The trend of TTS technology --- p.5
Chapter 1.5 --- TTS systems for different languages --- p.6
Chapter 1.6 --- Objectives of the thesis --- p.8
Chapter 1.7 --- Thesis outline --- p.8
References --- p.10
Chapter 2. --- BACKGROUND --- p.11
Chapter 2.1 --- Cantonese phonology --- p.11
Chapter 2.2 --- Cantonese TTS - a baseline system --- p.16
Chapter 2.3 --- Time-Domain Pitch-Synchronous-OverLap-Add --- p.17
Chapter 2.3.1 --- From speech signal to short-time analysis signals --- p.18
Chapter 2.3.2 --- From short-time analysis signals to short-time synthesis signals --- p.19
Chapter 2.3.3 --- From short-time synthesis signals to synthetic speech --- p.20
Chapter 2.4 --- Time-scale and Pitch-scale modifications --- p.20
Chapter 2.4.1 --- Voiced speech --- p.20
Chapter 2.4.2 --- Unvoiced speech --- p.21
Chapter 2.5 --- Summary --- p.22
References --- p.23
Chapter 3. --- SUB-SYLLABLE BASED TTS SYSTEM --- p.24
Chapter 3.1 --- Motivations --- p.24
Chapter 3.2 --- Choices of synthesis units --- p.27
Chapter 3.2.1 --- Sub-syllable unit --- p.29
Chapter 3.2.2 --- Diphones, demi-syllables and sub-syllable units --- p.31
Chapter 3.3 --- Proposed TTS system --- p.32
Chapter 3.3.1 --- Text analysis module --- p.33
Chapter 3.3.2 --- Synthesis module --- p.36
Chapter 3.3.3 --- Prosody module --- p.37
Chapter 3.4 --- Summary --- p.38
References --- p.39
Chapter 4. --- ACOUSTIC INVENTORY --- p.40
Chapter 4.1 --- The full set of Cantonese sub-syllable units --- p.40
Chapter 4.2 --- A reduced set of sub-syllable units --- p.42
Chapter 4.3 --- Corpus design --- p.44
Chapter 4.4 --- Recording --- p.46
Chapter 4.5 --- Post-processing of speech data --- p.47
Chapter 4.6 --- Summary --- p.51
References --- p.51
Chapter 5. --- CONCATENATION TECHNIQUES --- p.52
Chapter 5.1 --- Concatenation of sub-syllable units --- p.52
Chapter 5.1.1 --- Concatenation of plosives and affricates --- p.54
Chapter 5.1.2 --- Concatenation of fricatives --- p.55
Chapter 5.1.3 --- Concatenation of vowels, semi-vowels and nasals --- p.55
Chapter 5.1.4 --- Spectral distance measure --- p.57
Chapter 5.2 --- Waveform concatenation method --- p.58
Chapter 5.3 --- Selected examples of waveform concatenation --- p.59
Chapter 5.3.1 --- I-I concatenation --- p.60
Chapter 5.3.2 --- F-F concatenation --- p.66
Chapter 5.4 --- Summary --- p.71
References --- p.72
Chapter 6. --- PERFORMANCE EVALUATION --- p.73
Chapter 6.1 --- Listening test --- p.73
Chapter 6.2 --- Test results: --- p.74
Chapter 6.3 --- Discussions --- p.75
References --- p.78
Chapter 7. --- CONCLUSIONS & FUTURE WORKS --- p.79
Chapter 7.1 --- Conclusions --- p.79
Chapter 7.2 --- Suggested future work --- p.81
APPENDIX 1 SYLLABLE DURATION --- p.82
APPENDIX 2 PERCEPTUAL TEST PARAGRAPHS --- p.86
18

"Prosody analysis and modeling for Cantonese text-to-speech." 2003. http://library.cuhk.edu.hk/record=b5891678.

Abstract:
Li Yu Jia.
Thesis (M.Phil.)--Chinese University of Hong Kong, 2003.
Includes bibliographical references.
Abstracts in English and Chinese.
Chapter Chapter 1 --- Introduction --- p.1
Chapter 1.1. --- TTS Technology --- p.1
Chapter 1.2. --- Prosody --- p.2
Chapter 1.2.1. --- What is Prosody --- p.2
Chapter 1.2.2. --- Prosody from Different Perspectives --- p.3
Chapter 1.2.3. --- Acoustical Parameters of Prosody --- p.3
Chapter 1.2.4. --- Prosody in TTS --- p.5
Chapter 1.2.4.1 --- Analysis --- p.5
Chapter 1.2.4.2 --- Modeling --- p.6
Chapter 1.2.4.3 --- Evaluation --- p.6
Chapter 1.3. --- Thesis Objectives --- p.7
Chapter 1.4. --- Thesis Outline --- p.7
Reference --- p.8
Chapter Chapter 2 --- Cantonese --- p.9
Chapter 2.1. --- The Cantonese Dialect --- p.9
Chapter 2.1.1. --- Phonology --- p.10
Chapter 2.1.1.1 --- Initial --- p.11
Chapter 2.1.1.2 --- Final --- p.12
Chapter 2.1.1.3 --- Tone --- p.13
Chapter 2.1.2. --- Phonological Constraints --- p.14
Chapter 2.2. --- Tones in Cantonese --- p.15
Chapter 2.2.1. --- Tone System --- p.15
Chapter 2.2.2. --- Linguistic Significance --- p.18
Chapter 2.2.3. --- Acoustical Realization --- p.18
Chapter 2.3. --- Prosodic Variation in Continuous Cantonese Speech --- p.20
Chapter 2.4. --- Cantonese Speech Corpus - CUProsody --- p.21
Reference --- p.23
Chapter Chapter 3 --- F0 Normalization --- p.25
Chapter 3.1. --- F0 in Speech Production --- p.25
Chapter 3.2. --- F0 Extraction --- p.27
Chapter 3.3. --- Duration-normalized Tone Contour --- p.29
Chapter 3.4. --- F0 Normalization --- p.30
Chapter 3.4.1. --- Necessity and Motivation --- p.30
Chapter 3.4.2. --- F0 Normalization --- p.33
Chapter 3.4.2.1 --- Methodology --- p.33
Chapter 3.4.2.2 --- Assumptions --- p.34
Chapter 3.4.2.3 --- Estimation of Relative Tone Ratios --- p.35
Chapter 3.4.2.4 --- Derivation of Phrase Curve --- p.37
Chapter 3.4.2.5 --- Normalization of Absolute F0 Values --- p.39
Chapter 3.4.3. --- Experiments and Discussion --- p.39
Chapter 3.5. --- Conclusions --- p.44
Reference --- p.45
Chapter Chapter 4 --- Acoustical F0 Analysis --- p.48
Chapter 4.1. --- Methodology of F0 Analysis --- p.48
Chapter 4.1.1. --- Analysis-by-Synthesis --- p.48
Chapter 4.1.2. --- Acoustical Analysis --- p.51
Chapter 4.2. --- Acoustical F0 Analysis for Cantonese --- p.52
Chapter 4.2.1. --- Analysis of Phrase Curves --- p.52
Chapter 4.2.2. --- Analysis of Tone Contours --- p.55
Chapter 4.2.2.1 --- Context-independent Single-tone Contours --- p.56
Chapter 4.2.2.2 --- Contextual Variation --- p.58
Chapter 4.2.2.3 --- Co-articulated Tone Contours of Disyllabic Word --- p.59
Chapter 4.2.2.4 --- Cross-word Contours --- p.62
Chapter 4.2.2.5 --- Phrase-initial Tone Contours --- p.65
Chapter 4.3. --- Summary --- p.66
Reference --- p.67
Chapter Chapter 5 --- Prosody Modeling for Cantonese Text-to-Speech --- p.70
Chapter 5.1. --- Parametric Model and Non-parametric Model --- p.70
Chapter 5.2. --- Cantonese Text-to-Speech: Baseline System --- p.72
Chapter 5.2.1. --- Sub-syllable Unit --- p.72
Chapter 5.2.2. --- Text Analysis Module --- p.73
Chapter 5.2.3. --- Acoustical Synthesis --- p.74
Chapter 5.2.4. --- Prosody Module --- p.74
Chapter 5.3. --- Enhanced Prosody Model --- p.74
Chapter 5.3.1. --- Modeling Tone Contours --- p.75
Chapter 5.3.1.1 --- Word-level F0 Contours --- p.76
Chapter 5.3.1.2 --- Phrase-initial Tone Contours --- p.77
Chapter 5.3.1.3 --- Tone Contours at Word Boundary --- p.78
Chapter 5.3.2. --- Modeling Phrase Curves --- p.79
Chapter 5.3.3. --- Generation of Continuous F0 Contours --- p.81
Chapter 5.4. --- Summary --- p.81
Reference --- p.82
Chapter Chapter 6 --- Performance Evaluation --- p.83
Chapter 6.1. --- Introduction to Perceptual Test --- p.83
Chapter 6.1.1. --- Aspects of Evaluation --- p.84
Chapter 6.1.2. --- Methods of Judgment Test --- p.84
Chapter 6.1.3. --- Problems in Perceptual Test --- p.85
Chapter 6.2. --- Perceptual Tests for Cantonese TTS --- p.86
Chapter 6.2.1. --- Intelligibility Tests --- p.86
Chapter 6.2.1.1 --- Method --- p.86
Chapter 6.2.1.2 --- Results --- p.88
Chapter 6.2.1.3 --- Analysis --- p.89
Chapter 6.2.2. --- Naturalness Tests --- p.90
Chapter 6.2.2.1 --- Word-level --- p.90
Chapter 6.2.2.1.1 --- Method --- p.90
Chapter 6.2.2.1.2 --- Results --- p.91
Chapter 6.2.2.1.3 --- Analysis --- p.91
Chapter 6.2.2.2 --- Sentence-level --- p.92
Chapter 6.2.2.2.1 --- Method --- p.92
Chapter 6.2.2.2.2 --- Results --- p.93
Chapter 6.2.2.2.3 --- Analysis --- p.94
Chapter 6.3. --- Conclusions --- p.95
Chapter 6.4. --- Summary --- p.95
Reference --- p.96
Chapter Chapter 7 --- Conclusions and Future Work --- p.97
Chapter 7.1. --- Conclusions --- p.97
Chapter 7.2. --- Suggested Future Work --- p.99
Appendix --- p.100
Appendix 1 Linear Regression --- p.100
Appendix 2 36 Templates of Cross-word Contours --- p.101
Appendix 3 Word List for Word-level Tests --- p.102
Appendix 4 Syllable Occurrence in Word List of Intelligibility Test --- p.108
Appendix 5 Wrongly Identified Word List --- p.112
Appendix 6 Confusion Matrix --- p.115
Appendix 7 Unintelligible Word List --- p.117
Appendix 8 Noisy Word List --- p.119
Appendix 9 Sentence List for Naturalness Test --- p.120
19

"Unit selection and waveform concatenation strategies in Cantonese text-to-speech." 2005. http://library.cuhk.edu.hk/record=b5892349.

Abstract:
Oey Sai Lok.
Thesis (M.Phil.)--Chinese University of Hong Kong, 2005.
Includes bibliographical references.
Abstracts in English and Chinese.
Chapter 1. --- Introduction --- p.1
Chapter 1.1 --- An overview of Text-to-Speech technology --- p.2
Chapter 1.1.1 --- Text processing --- p.2
Chapter 1.1.2 --- Acoustic synthesis --- p.3
Chapter 1.1.3 --- Prosody modification --- p.4
Chapter 1.2 --- Trends in Text-to-Speech technologies --- p.5
Chapter 1.3 --- Objectives of this thesis --- p.7
Chapter 1.4 --- Outline of the thesis --- p.9
References --- p.11
Chapter 2. --- Cantonese Speech --- p.13
Chapter 2.1 --- The Cantonese dialect --- p.13
Chapter 2.2 --- Phonology of Cantonese --- p.14
Chapter 2.2.1 --- Initials --- p.15
Chapter 2.2.2 --- Finals --- p.16
Chapter 2.2.3 --- Tones --- p.18
Chapter 2.3 --- Acoustic-phonetic properties of Cantonese syllables --- p.19
References --- p.24
Chapter 3. --- Cantonese Text-to-Speech --- p.25
Chapter 3.1 --- General overview --- p.25
Chapter 3.1.1 --- Text processing --- p.25
Chapter 3.1.2 --- Corpus based acoustic synthesis --- p.26
Chapter 3.1.3 --- Prosodic control --- p.27
Chapter 3.2 --- Syllable based Cantonese Text-to-Speech system --- p.28
Chapter 3.3 --- Sub-syllable based Cantonese Text-to-Speech system --- p.29
Chapter 3.3.1 --- Definition of sub-syllable units --- p.29
Chapter 3.3.2 --- Acoustic inventory --- p.31
Chapter 3.3.3 --- Determination of the concatenation points --- p.33
Chapter 3.4 --- Problems --- p.34
References --- p.36
Chapter 4. --- Waveform Concatenation for Sub-syllable Units --- p.37
Chapter 4.1 --- Previous work in concatenation methods --- p.37
Chapter 4.1.1 --- Determination of concatenation point --- p.38
Chapter 4.1.2 --- Waveform concatenation --- p.38
Chapter 4.2 --- Problems and difficulties in concatenating sub-syllable units --- p.39
Chapter 4.2.1 --- Mismatch of acoustic properties --- p.40
Chapter 4.2.2 --- Allophone problem of Initials /z/, /c/ and /s/ --- p.42
Chapter 4.3 --- General procedures in concatenation strategies --- p.44
Chapter 4.3.1 --- Concatenation of unvoiced segments --- p.45
Chapter 4.3.2 --- Concatenation of voiced segments --- p.45
Chapter 4.3.3 --- Measurement of spectral distance --- p.48
Chapter 4.4 --- Detailed procedures in concatenation points determination --- p.50
Chapter 4.4.1 --- Unvoiced segments --- p.50
Chapter 4.4.2 --- Voiced segments --- p.53
Chapter 4.5 --- Selected examples in concatenation strategies --- p.58
Chapter 4.5.1 --- Concatenation at Initial segments --- p.58
Chapter 4.5.1.1 --- Plosives --- p.58
Chapter 4.5.1.2 --- Fricatives --- p.59
Chapter 4.5.2 --- Concatenation at Final segments --- p.60
Chapter 4.5.2.1 --- V group (long vowel) --- p.60
Chapter 4.5.2.2 --- D group (diphthong) --- p.61
References --- p.63
Chapter 5. --- Unit Selection for Sub-syllable Units --- p.65
Chapter 5.1 --- Basic requirements in unit selection process --- p.65
Chapter 5.1.1 --- Availability of multiple copies of sub-syllable units --- p.65
Chapter 5.1.1.1 --- Levels of "identical" --- p.66
Chapter 5.1.1.2 --- Statistics on the availability --- p.67
Chapter 5.1.2 --- Variations in acoustic parameters --- p.70
Chapter 5.1.2.1 --- Pitch level --- p.71
Chapter 5.1.2.2 --- Duration --- p.74
Chapter 5.1.2.3 --- Intensity level --- p.75
Chapter 5.2 --- Selection process: availability check on sub-syllable units --- p.77
Chapter 5.2.1 --- Multiple copies found --- p.79
Chapter 5.2.2 --- Unique copy found --- p.79
Chapter 5.2.3 --- No matched copy found --- p.80
Chapter 5.2.4 --- Illustrative examples --- p.80
Chapter 5.3 --- Selection process: acoustic analysis on candidate units --- p.81
References --- p.88
Chapter 6. --- Performance Evaluation --- p.89
Chapter 6.1 --- General information --- p.90
Chapter 6.1.1 --- Objective test --- p.90
Chapter 6.1.2 --- Subjective test --- p.90
Chapter 6.1.3 --- Test materials --- p.91
Chapter 6.2 --- Details of the objective test --- p.92
Chapter 6.2.1 --- Testing method --- p.92
Chapter 6.2.2 --- Results --- p.93
Chapter 6.2.3 --- Analysis --- p.96
Chapter 6.3 --- Details of the subjective test --- p.98
Chapter 6.3.1 --- Testing method --- p.98
Chapter 6.3.2 --- Results --- p.99
Chapter 6.3.3 --- Analysis --- p.101
Chapter 6.4 --- Summary --- p.107
References --- p.108
Chapter 7. --- Conclusions and Future Works --- p.109
Chapter 7.1 --- Conclusions --- p.109
Chapter 7.2 --- Suggested future works --- p.111
References --- p.113
Appendix 1 Mean pitch level of Initials and Finals stored in the inventory --- p.114
Appendix 2 Mean durations of Initials and Finals stored in the inventory --- p.121
Appendix 3 Mean intensity level of Initials and Finals stored in the inventory --- p.124
Appendix 4 Test word used in performance evaluation --- p.127
Appendix 5 Test paragraph used in performance evaluation --- p.128
Appendix 6 Pitch profile used in the Text-to-Speech system --- p.131
Appendix 7 Duration model used in Text-to-Speech system --- p.132
20

Rato, João Pedro Cordeiro. "Conversação homem-máquina. Caracterização e avaliação do estado actual das soluções de speech recognition, speech synthesis e sistemas de conversação homem-máquina." Master's thesis, 2016. http://hdl.handle.net/10400.8/2375.

Full text
Abstract:
Human verbal communication is a two-way process in which both parties understand each other and draw conclusions from the exchange. This type of communication, also called dialogue, can take place not only between human agents but also between humans and machines. Interaction between humans and machines through natural language plays an important role in improving communication between the two. In order to better understand human-machine communication, this document presents a broad body of knowledge on human-machine conversation systems, including their modules and operation, dialogue strategies, and challenges to take into account in their implementation. In addition, several Speech Recognition and Speech Synthesis systems are surveyed, along with systems that make use of human-machine conversation. Finally, performance tests are carried out on some Speech Recognition systems and, to put into practice some of the concepts presented in this work, the implementation of a human-machine conversation system is described. Several conclusions were drawn from this work, among them the high complexity of human-machine conversation systems, the poor performance of speech recognition in noisy environments, and the barriers that can be encountered when implementing such systems.