Dissertations / Theses on the topic 'Text-to-speech synthesis system'
Listed below are the top 20 dissertations and theses for research on the topic 'Text-to-speech synthesis system', with abstracts and links to the full texts where these are available.
Micallef, Paul. "A text to speech synthesis system for Maltese." Thesis, University of Surrey, 1997. http://epubs.surrey.ac.uk/842702/.
Baloyi, Ntsako. "A text-to-speech synthesis system for Xitsonga using hidden Markov models." Thesis, University of Limpopo (Turfloop Campus), 2012. http://hdl.handle.net/10386/1021.
This research study focuses on building a general-purpose working Xitsonga speech synthesis system that is as intelligible, natural sounding and flexible as reasonably possible. The system must be able to model some desirable speaker characteristics and speaking styles. The project forms part of a broader national speech technology effort that aims to develop spoken language systems for human-machine interaction in the eleven official languages of South Africa (SA). Speech synthesis is the reverse of automatic speech recognition (which receives speech as input and converts it to text): it receives text as input and produces synthesised speech as output. It is generally accepted that most people find listening to spoken utterances easier than reading the equivalent text. The Xitsonga system has been developed using the hidden Markov model (HMM) speech synthesis method. The HMM-based speech synthesis (HTS) system produces speech that is intelligible and natural sounding, and can do so from a footprint of only a few megabytes of training speech data. The HTS toolkit is applied as a patch to the HTK toolkit, a hidden Markov model toolkit primarily designed for building and manipulating HMMs in speech recognition.
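For readers unfamiliar with HMM-based synthesis, the following is a minimal Python sketch of the maximum-likelihood parameter generation (MLPG) step that HTS-style systems use to turn state-level Gaussian statistics into a smooth feature trajectory. It assumes a one-dimensional static feature with a single delta window and invented toy values; it illustrates the general technique only and is not the thesis' or the HTS toolkit's actual implementation.

```python
# Minimal MLPG sketch: a 1-D static feature (e.g. one cepstral coefficient or
# log-F0) plus a delta feature; real systems do this per dimension with
# static + delta + delta-delta windows.
import numpy as np

def mlpg(static_mean, static_var, delta_mean, delta_var):
    """Solve c = argmax N([c; Wc]; mu, Sigma) for a diagonal Gaussian sequence."""
    T = len(static_mean)
    # Window matrix W stacks an identity (static) block and a delta block,
    # where delta_t = 0.5 * (c[t+1] - c[t-1]).
    W_static = np.eye(T)
    W_delta = np.zeros((T, T))
    for t in range(T):
        if 0 < t < T - 1:
            W_delta[t, t - 1], W_delta[t, t + 1] = -0.5, 0.5
    W = np.vstack([W_static, W_delta])                      # (2T, T)
    mu = np.concatenate([static_mean, delta_mean])          # (2T,)
    prec = np.concatenate([1.0 / static_var, 1.0 / delta_var])
    # Weighted least squares: (W' P W) c = W' P mu
    WtP = W.T * prec
    return np.linalg.solve(WtP @ W, WtP @ mu)

# Toy example: two HMM states, each repeated for a few frames.
static_mean = np.array([5.0] * 4 + [5.6] * 4)   # static-stream state means
static_var  = np.array([0.02] * 8)
delta_mean  = np.zeros(8)                        # deltas pulled towards zero ...
delta_var   = np.array([0.001] * 8)              # ... with high confidence -> smooth track
print(np.round(mlpg(static_mean, static_var, delta_mean, delta_var), 3))
```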
Yoon, Kyuchul. "Building a prosodically sensitive diphone database for a Korean text-to-speech synthesis system." Connect to this title online, 2005. http://rave.ohiolink.edu/etdc/view?acc%5Fnum=osu1119010941.
Title from first page of PDF file. Document formatted into pages; contains xxii, 291 p.; also includes graphics (some col.). Includes bibliographical references (p. 210-216). Available online via OhioLINK's ETD Center.
Beněk, Tomáš. "Implementing and Improving a Speech Synthesis System." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2014. http://www.nusl.cz/ntk/nusl-236079.
Malatji, Promise Tshepiso. "The development of accented English synthetic voices." Thesis, University of Limpopo, 2019. http://hdl.handle.net/10386/2917.
A text-to-speech (TTS) synthesis system is a software system that receives text as input and produces speech as output. TTS synthesis can be used, among other things, for language learning and for reading text aloud to people living with disabilities (e.g. the physically challenged or visually impaired), whether they are native or non-native speakers of the target language. Most people relate easily to a second language spoken by a non-native speaker with whom they share a native language, yet most online English TTS synthesis systems are developed using native speakers of English. This research study therefore focuses on developing accented English synthetic voices as spoken by non-native speakers in the Limpopo province of South Africa. The Modular Architecture for Research on speech sYnthesis (MARY) TTS engine is used to develop the synthetic voices, and the hidden Markov model (HMM) method is used to train them. A secondary text corpus is turned into a training speech corpus by recording six speakers reading it. The quality of the developed synthetic voices is measured in terms of intelligibility, similarity and naturalness using a listening test, with results broken down by evaluators' occupation and gender as well as overall. The subjective listening test indicates that the developed synthetic voices have a high level of acceptance in terms of similarity and intelligibility. Speech analysis software is used to compare the synthesised speech with the human recordings; there is no significant difference in voice pitch between the speakers and the synthetic voices, except for one synthetic voice.
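As an illustration of how such listening-test results might be tabulated, here is a small sketch using pandas with entirely hypothetical ratings and group labels; the thesis' actual test design and data are not reproduced.

```python
# Hypothetical listening-test ratings aggregated into mean opinion scores,
# broken down by evaluator gender and occupation as described above.
import pandas as pd

ratings = pd.DataFrame([
    {"gender": "F", "occupation": "student",  "voice": "spk1", "naturalness": 4, "similarity": 4, "intelligibility": 5},
    {"gender": "M", "occupation": "student",  "voice": "spk1", "naturalness": 3, "similarity": 4, "intelligibility": 4},
    {"gender": "F", "occupation": "lecturer", "voice": "spk2", "naturalness": 4, "similarity": 5, "intelligibility": 4},
    {"gender": "M", "occupation": "lecturer", "voice": "spk2", "naturalness": 3, "similarity": 3, "intelligibility": 4},
])

# Overall mean opinion score per voice, then per evaluator group.
print(ratings.groupby("voice")[["naturalness", "similarity", "intelligibility"]].mean())
print(ratings.groupby(["gender", "occupation"])["naturalness"].mean())
```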
Cohen, Andrew Dight. "The use of learnable phonetic representations in connectionist text-to-speech system." Thesis, University of Reading, 1997. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.360787.
Breitenbücher, Mark. "Textvorverarbeitung zur deutschen Version des Festival Text-to-Speech Synthese Systems" [Text preprocessing for the German version of the Festival text-to-speech synthesis system]. [S.l.]: Universität Stuttgart, Fakultät Philosophie, 1997. http://www.bsz-bw.de/cgi-bin/xvms.cgi?SWB6783514.
Lambert, Tanya. "Databases for concatenative text-to-speech synthesis systems: unit selection and knowledge-based approach." Thesis, University of East Anglia, 2005. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.421192.
Xiao, He. "An affective personality for an embodied conversational agent." Curtin University of Technology, Department of Computer Engineering, 2006. http://espace.library.curtin.edu.au:80/R/?func=dbin-jump-full&object_id=16139.
XIE, GING-JIANG (謝清江). "A Chinese text-to-speech system based on formant synthesis." Thesis, 1987. http://ndltd.ncl.edu.tw/handle/68840754016731337307.
HUANG, SHAO-HUA (黃紹華). "A synthesis of prosodic information in mandarin text-to-speech system." Thesis, 1991. http://ndltd.ncl.edu.tw/handle/08240353472600497334.
Fu, Zhen-Hong (傅振宏). "Automatic Generation of Synthesis Units for Taiwanese Text-to-Speech System." Thesis, 2000. http://ndltd.ncl.edu.tw/handle/46706238089789082381.
Chang Gung University (長庚大學), Graduate Institute of Electrical Engineering (電機工程研究所), academic year 88 (1999/2000).
In this thesis we demonstrate a Taiwanese (Min-nan) text-to-speech (TTS) system based on automatically generated synthesis units. It can read modern Taiwanese articles aloud rather naturally. The TTS system is composed of three functional modules: a text analysis module, a prosody module, and a waveform synthesis module. Modern Taiwanese texts mix Chinese characters and English letters, so the text analysis module must first handle Chinese-English mixed text; it applies text normalization, word segmentation, letter-to-phoneme conversion and word-frequency information to resolve multiple pronunciations. The prosody module handles tone sandhi and phonetic variation in Taiwanese. The synthesis units in the waveform synthesis module come from two sources: (1) isolated-utterance tonal syllables covering all possible tonal variations in Taiwanese, about 4,521 in total, and (2) units generated automatically from a designated speech corpus. We employ an HMM-based large-vocabulary Taiwanese speech recognition system to force-align the speech corpus, and short-pause recognition is incorporated into the recogniser. After the string of synthesis units has been extracted, inter-syllable coarticulation information is used to decide how to concatenate the units, and the output speech is generated after energy normalization. We evaluate the system on automatically segmented speech: compared with human segmentation, a correct rate of about 85% is achieved. The system has been implemented on a PC running MS-Windows 9x/NT/2000.
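The final waveform step described above, concatenating selected units with energy normalization, can be pictured with the following hedged sketch; the cross-fade length, target RMS value and toy "syllables" are assumptions for illustration and do not reproduce the thesis' coarticulation rules.

```python
# Concatenate pre-recorded syllable units with RMS energy normalisation and a
# short linear cross-fade at each join (illustrative only).
import numpy as np

def rms(x):
    return np.sqrt(np.mean(x ** 2) + 1e-12)

def concatenate_units(units, sr=16000, target_rms=0.1, xfade_ms=10):
    """Energy-normalise each unit, then overlap-add neighbours with a cross-fade."""
    xfade = int(sr * xfade_ms / 1000)
    out = np.zeros(0)
    for u in units:
        u = u * (target_rms / rms(u))            # energy normalisation
        if len(out) >= xfade and len(u) >= xfade:
            fade = np.linspace(0.0, 1.0, xfade)
            out[-xfade:] = out[-xfade:] * (1 - fade) + u[:xfade] * fade
            u = u[xfade:]
        out = np.concatenate([out, u])
    return out

# Toy usage with synthetic "syllables" (sine bursts of different amplitude).
t = np.linspace(0, 0.2, 3200, endpoint=False)
syllables = [0.3 * np.sin(2 * np.pi * 220 * t), 0.9 * np.sin(2 * np.pi * 330 * t)]
speech = concatenate_units(syllables)
print(len(speech), round(rms(speech), 3))
```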
Konakanchi, Parthasarathy. "A Research Bed For Unit Selection Based Text To Speech Synthesis System." Thesis, 2009. http://etd.iisc.ernet.in/handle/2005/1348.
Chang, Tang-Yu (張唐瑜). "A Mandarin Text-to-speech System Using A Large Number Of Words As Synthesis Units." Thesis, 2005. http://ndltd.ncl.edu.tw/handle/86362630071664979157.
National Chung Hsing University (國立中興大學), Institute of Computer Science (資訊科學研究所), academic year 93 (2004/2005).
In recent years, many TTS (text-to-speech) systems have been implemented with a corpus-based structure in which synthesis units are concatenated directly at synthesis time. This approach yields good speech quality and, compared with traditional TTS using syllables as synthesis units, is more acceptable to listeners. Such systems share the following features: (1) a large speech corpus is recorded; (2) no signal processing is applied; (3) proper synthesis units are selected. Microsoft has developed a corpus-based bilingual (Mandarin/English) system of this kind, Mulan, whose corpus is built by recording sentences and organised with a hierarchical prosodic structure, and whose non-uniform synthesis units are selected with a decision tree, without any signal processing. However, rare words may not appear in the recorded sentences, and coarticulation between words is comparatively weak. We therefore use a word set (word-based units) as synthesis units and rule-based methods to select them, with very little signal processing (only fading in and fading out). In total, about 12,224 two-character words and 2,690 three-character words were recorded. Because breaks are very important for understanding a spoken sentence, we employ CART (Classification and Regression Trees) to predict boundary types and use the result to insert the corresponding breaks. In this work we implemented a Mandarin TTS system and evaluated it with naturalness, preference and intelligibility tests.
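To make the CART-based break prediction concrete, a minimal sketch follows using scikit-learn's decision tree as a stand-in CART implementation; the boundary features and labels are invented for illustration and are not the feature set used in the thesis.

```python
# Predict the break type at a word boundary from simple context features.
from sklearn.feature_extraction import DictVectorizer
from sklearn.tree import DecisionTreeClassifier
from sklearn.pipeline import make_pipeline

# Each training item: context features at a word boundary -> break type.
X = [
    {"pos_left": "N", "pos_right": "V", "words_to_punct": 3, "word_len_left": 2},
    {"pos_left": "V", "pos_right": "N", "words_to_punct": 1, "word_len_left": 2},
    {"pos_left": "N", "pos_right": "P", "words_to_punct": 0, "word_len_left": 3},
    {"pos_left": "ADV", "pos_right": "V", "words_to_punct": 2, "word_len_left": 1},
]
y = ["no_break", "minor_break", "major_break", "no_break"]

model = make_pipeline(DictVectorizer(sparse=False), DecisionTreeClassifier(max_depth=3))
model.fit(X, y)
print(model.predict([{"pos_left": "N", "pos_right": "P", "words_to_punct": 0, "word_len_left": 2}]))
```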
Chen, Jau-Hung (陳昭宏). "A Study on Synthesis Unit Selection and Prosodic Information Generation in a Chinese Text-to-Speech System." Thesis, 1998. http://ndltd.ncl.edu.tw/handle/74871225642803891214.
National Cheng Kung University (國立成功大學), Department of Computer Science and Information Engineering (資訊工程學系), academic year 86 (1997/1998).
In this dissertation, approaches to synthesis unit selection and prosodic information generation are proposed for Chinese text-to-speech conversion. Monosyllables are adopted as the basic synthesis units. A set of synthesis units is selected from a large continuous speech database using two cost functions that minimize inter- and intra-syllable distortion. The speech database is also used to build a word-prosody template tree organised by linguistic features: tone combination, word length, part-of-speech (POS) of the word, and word position in the sentence. This template tree stores the prosodic features of a word, including pitch contour, average energy and syllable duration, for the possible combinations of linguistic features. Two modules, for sentence intonation and template selection, are proposed to generate the target prosodic templates; in addition, a Bayesian network is used to model the relationship between linguistic features and prosodic information. Finally, a Speech Activated Telephony Email Reader (SATER) is presented. SATER is an integrated system combining speaker verification, networking and text-to-speech conversion: a registered user can activate the service and listen to email over a wired or wireless telephone. The speaker verification subsystem uses time-varying verification phrases generated from the speaker's password, each modelled by a hidden Markov model with a variable number of states. Experimental results for the TTS conversion system show that the synthesized prosodic features closely resemble their original counterparts for most syllables in the inside test, and subjective evaluation confirms the satisfactory performance of these approaches.
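A minimal sketch of cost-based unit selection in the spirit of the two cost functions mentioned above is given below: a dynamic-programming search over candidate units that trades off a per-unit (intra-syllable) cost against a join (inter-syllable) cost. The cost functions and candidate data are invented for illustration and are not the thesis' actual measures.

```python
# Viterbi-style unit selection: pick one candidate per syllable so that the
# sum of target costs and concatenation costs along the path is minimal.
import math

def select_units(candidates, target_cost, concat_cost):
    """candidates[i] is the list of candidate units for syllable i."""
    n = len(candidates)
    best = [{u: (target_cost(0, u), None) for u in candidates[0]}]
    for i in range(1, n):
        layer = {}
        for u in candidates[i]:
            prev, score = None, math.inf
            for p, (c, _) in best[i - 1].items():
                s = c + concat_cost(p, u)
                if s < score:
                    prev, score = p, s
            layer[u] = (score + target_cost(i, u), prev)
        best.append(layer)
    # Back-trace the cheapest path.
    u = min(best[-1], key=lambda k: best[-1][k][0])
    path = [u]
    for i in range(n - 1, 0, -1):
        u = best[i][u][1]
        path.append(u)
    return list(reversed(path))

# Toy usage: two candidate tokens per syllable, costs based on a fake pitch field.
cands = [[("a", 120), ("a", 150)], [("b", 125), ("b", 160)], [("c", 130), ("c", 158)]]
tcost = lambda i, u: 0.0                       # pretend all candidates fit the target equally
ccost = lambda p, u: abs(p[1] - u[1]) / 100.0  # penalise pitch jumps at the join
print(select_units(cands, tcost, ccost))
```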
Lin, Tso-Chih (林佐治). "A Study of Synthesis Unit Selection and Phrase Prosody Adjustment in a Chinese Text-to-Speech System." Thesis, 1996. http://ndltd.ncl.edu.tw/handle/16675087551502426246.
National Cheng Kung University (國立成功大學), Graduate Institute of Computer Science and Information Engineering (資訊工程學系研究所), academic year 85 (1996/1997).
In this thesis we present a Chinese text-to-speech conversion system that focuses on coarticulation processing and phrase-prosody adjustment. A time-domain waveform synthesis algorithm, pitch-synchronous overlap-and-add (TD-PSOLA), is applied in the speech synthesizer to produce high-quality synthetic speech. To handle coarticulation between connected syllables, synthesis units are extracted from a large continuous speech database, and a rule-based unit selection module chooses one unit from several candidates for each syllable. To address the spectral discontinuity caused by directly joining two separately recorded segments, a time-domain spectral smoothing method is adopted. To improve the naturalness of the synthetic speech, a prosodic modification method is proposed to replace the traditional rule-based approach to pronunciation. Observation shows that the prosodic properties of a Chinese word are closely affected by the tone combination of its monosyllables and its position in the sentence. Consequently, we create a word-prosody database from the continuous speech database; each word-prosody pattern contains the duration, energy variation and pitch-period contour (coded by cubic splines) of every syllable in the word. Using the result of lexical analysis, the prosody generation module selects a proper prosodic pattern for each word.
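The cubic-spline coding of pitch contours mentioned above can be sketched as follows; the knot count and the example contour are hypothetical, and scipy's CubicSpline is used as a generic spline implementation rather than the thesis' own coder.

```python
# Code a syllable's pitch contour with a few spline knots, then rebuild a
# smooth contour (possibly at a different length) from those knots.
import numpy as np
from scipy.interpolate import CubicSpline

frames = np.linspace(0.0, 1.0, 40)                 # normalised time axis
pitch_hz = 180 + 30 * np.sin(np.pi * frames)       # a fake rising-falling F0 contour

# Encode: keep only a handful of evenly spaced knot values.
knots = np.linspace(0.0, 1.0, 6)
coded = np.interp(knots, frames, pitch_hz)

# Decode: rebuild a smooth contour from the knots, here on 55 frames
# (which also gives a simple form of time-scale modification).
spline = CubicSpline(knots, coded)
rebuilt = spline(np.linspace(0.0, 1.0, 55))

err = float(np.abs(spline(frames) - pitch_hz).max())   # reconstruction error on the original frames
print(coded.round(1), round(err, 2), len(rebuilt))
```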
"Cantonese text-to-speech synethesis using sub-syllable units." 2001. http://library.cuhk.edu.hk/record=b5890790.
Thesis (M.Phil.)--Chinese University of Hong Kong, 2001.
Includes bibliographical references.
Text in English; abstracts in English and Chinese.
Law Ka Man (Luo Jiawen). Chinese title (romanized): Li yong zi yin jie de Yue yu wen yu zhuan huan xi tong [A Cantonese text-to-speech system using sub-syllable units].
Contents:
1. Introduction: text analysis; prosody prediction; speech generation; the trend of TTS technology; TTS systems for different languages; objectives of the thesis; thesis outline.
2. Background: Cantonese phonology; Cantonese TTS - a baseline system; Time-Domain Pitch-Synchronous Overlap-Add (from speech signal to short-time analysis signals, to short-time synthesis signals, to synthetic speech); time-scale and pitch-scale modifications for voiced and unvoiced speech.
3. Sub-syllable based TTS system: motivations; choices of synthesis units (sub-syllable units versus diphones and demi-syllables); the proposed TTS system with text analysis, synthesis and prosody modules.
4. Acoustic inventory: the full set of Cantonese sub-syllable units; a reduced set of sub-syllable units; corpus design; recording; post-processing of speech data.
5. Concatenation techniques: concatenation of sub-syllable units (plosives and affricates; fricatives; vowels, semi-vowels and nasals; spectral distance measure); waveform concatenation method; selected examples of I-I and F-F concatenation.
6. Performance evaluation: listening test; test results; discussion.
7. Conclusions and future work.
Appendices: syllable duration; perceptual test paragraphs.
"Prosody analysis and modeling for Cantonese text-to-speech." 2003. http://library.cuhk.edu.hk/record=b5891678.
Thesis (M.Phil.)--Chinese University of Hong Kong, 2003.
Includes bibliographical references.
Abstracts in English and Chinese.
Contents:
1. Introduction: TTS technology; prosody (what prosody is; prosody from different perspectives; acoustical parameters of prosody; prosody in TTS - analysis, modeling and evaluation); thesis objectives; thesis outline.
2. Cantonese: the Cantonese dialect (phonology - initials, finals, tones - and phonological constraints); tones in Cantonese (tone system, linguistic significance, acoustical realization); prosodic variation in continuous Cantonese speech; the Cantonese speech corpus CUProsody.
3. F0 normalization: F0 in speech production; F0 extraction; duration-normalized tone contours; F0 normalization (necessity and motivation; methodology and assumptions; estimation of relative tone ratios; derivation of the phrase curve; normalization of absolute F0 values); experiments and discussion; conclusions.
4. Acoustical F0 analysis: methodology (analysis-by-synthesis and acoustical analysis); acoustical F0 analysis for Cantonese (phrase curves; context-independent single-tone contours; contextual variation; co-articulated tone contours of disyllabic words; cross-word contours; phrase-initial tone contours); summary.
5. Prosody modeling for Cantonese text-to-speech: parametric and non-parametric models; the baseline Cantonese TTS system (sub-syllable units, text analysis, acoustical synthesis, prosody module); the enhanced prosody model (word-level F0 contours, phrase-initial tone contours and tone contours at word boundaries; modeling phrase curves; generation of continuous F0 contours); summary.
6. Performance evaluation: introduction to perceptual tests (aspects of evaluation, judgment test methods, problems in perceptual testing); perceptual tests for Cantonese TTS (intelligibility tests; word-level and sentence-level naturalness tests); conclusions; summary.
7. Conclusions and future work.
Appendices: linear regression; 36 templates of cross-word contours; word list for word-level tests; syllable occurrence in the intelligibility-test word list; wrongly identified word list; confusion matrix; unintelligible word list; noisy word list; sentence list for naturalness test.
"Unit selection and waveform concatenation strategies in Cantonese text-to-speech." 2005. http://library.cuhk.edu.hk/record=b5892349.
Thesis (M.Phil.)--Chinese University of Hong Kong, 2005.
Includes bibliographical references.
Abstracts in English and Chinese.
Contents:
1. Introduction: an overview of Text-to-Speech technology (text processing, acoustic synthesis, prosody modification); trends in Text-to-Speech technologies; objectives of this thesis; outline of the thesis.
2. Cantonese speech: the Cantonese dialect; phonology of Cantonese (initials, finals, tones); acoustic-phonetic properties of Cantonese syllables.
3. Cantonese Text-to-Speech: general overview (text processing, corpus based acoustic synthesis, prosodic control); the syllable based Cantonese Text-to-Speech system; the sub-syllable based system (definition of sub-syllable units, acoustic inventory, determination of the concatenation points); problems.
4. Waveform concatenation for sub-syllable units: previous work on concatenation methods; problems and difficulties in concatenating sub-syllable units (mismatch of acoustic properties; the allophone problem of the Initials /z/, /c/ and /s/); general and detailed procedures for concatenating unvoiced and voiced segments, with a spectral distance measure; selected examples for Initial segments (plosives, fricatives) and Final segments (long vowels, diphthongs).
5. Unit selection for sub-syllable units: basic requirements (availability of multiple copies of sub-syllable units; variations in pitch level, duration and intensity); the selection process - availability check on sub-syllable units (multiple copies, unique copy, or no matched copy, with illustrative examples) and acoustic analysis of candidate units.
6. Performance evaluation: general information (objective test, subjective test, test materials); details of the objective test (method, results, analysis); details of the subjective test (method, results, analysis); summary.
7. Conclusions and future work.
Appendices: mean pitch levels, mean durations and mean intensity levels of the Initials and Finals stored in the inventory; test words and test paragraph used in performance evaluation; pitch profile and duration model used in the Text-to-Speech system.
Rato, João Pedro Cordeiro. "Conversação homem-máquina. Caracterização e avaliação do estado actual das soluções de speech recognition, speech synthesis e sistemas de conversação homem-máquina" [Human-machine conversation: characterisation and evaluation of the current state of speech recognition, speech synthesis and human-machine conversation solutions]. Master's thesis, 2016. http://hdl.handle.net/10400.8/2375.