
Journal articles on the topic 'Text-to-speech synthesis system'



Below are the top 50 journal articles on the topic 'Text-to-speech synthesis system', with abstracts where available in the metadata.




1. Lin, Kun‐Shan. "Text‐to‐speech synthesis system." Journal of the Acoustical Society of America 86, no. 5 (November 1989): 2051–52. http://dx.doi.org/10.1121/1.398486.
2. Sunitha, K. V. N., and P. Sunitha Devi. "Text Normalization for Telugu Text-to-Speech Synthesis." International Journal of Computers & Technology 11, no. 2 (October 10, 2013): 2241–49. http://dx.doi.org/10.24297/ijct.v11i2.1176.
Abstract: Most areas related to language and speech technology, directly or indirectly, require handling of unrestricted text, and text-to-speech systems must work directly on real text. To build a natural-sounding speech synthesis system, it is essential that the text processing component produce an appropriate sequence of phonemic units for an arbitrary input text. A novel approach is used in which the input text is tokenized and classified by token type. Token sense disambiguation is achieved through the semantic nature of the language, after which expansion rules are applied to obtain the normalized text. Little work has been done on text normalization for Telugu, however. This paper discusses the design of a rule-based system to achieve text normalization in the context of building a Telugu text-to-speech system.
3. Sproat, Richard. "Multilingual text analysis for text-to-speech synthesis." Natural Language Engineering 2, no. 4 (December 1996): 369–80. http://dx.doi.org/10.1017/s1351324997001654.
Abstract: We present a model of text analysis for text-to-speech (TTS) synthesis based on (weighted) finite-state transducers, which serves as the text analysis module of the multilingual Bell Labs TTS system. The transducers are constructed using a lexical toolkit that allows declarative descriptions of lexicons, morphological rules, numeral-expansion rules, and phonological rules, inter alia. To date, the model has been applied to eight languages: Spanish, Italian, Romanian, French, German, Russian, Mandarin, and Japanese.
4. Hassana, Ikani Lucy, and Muhammad Sanusi. "Text to Speech Synthesis System in Yoruba Language." International Journal of Advances in Scientific Research and Engineering 5, no. 10 (2019): 180–91. http://dx.doi.org/10.31695/ijasre.2019.33568.
5. Shah, Krishna Bikram, Kiran Kumar Chaudhary, and Ashmita Ghimire. "Nepali Text to Speech Synthesis System using FreeTTS." SCITECH Nepal 13, no. 1 (December 31, 2018): 24–31. http://dx.doi.org/10.3126/scitech.v13i1.23498.
Abstract: This paper presents the tools and methodology used in developing a Nepali text-to-speech synthesis system using FreeTTS. The system is developed entirely in Java and uses the FreeTTS synthesizer. Nepali is synthesized using a formant approach built on FreeTTS, a popular generic framework available in the public domain for developing TTS systems. The architecture places greater emphasis on the natural language processing (NLP) component than on the digital signal processing (DSP) component. Since Nepali is the most widely used language in Nepal and is also spoken in parts of India and abroad, a text-to-speech (TTS) synthesizer for this language will be a convenient, information and communication technology (ICT) based tool to aid the many people who are illiterate, as well as those with physical impairments such as visual or vocal disabilities. The ability to convert text to voice may reduce the dependency, frustration, and sense of helplessness of these people. The system can be extended to include features such as emotions, improved tokenization, interactive options, and the use of a minimal database.
6. Rebai, Ilyes, and Yassine BenAyed. "Text-to-speech synthesis system with Arabic diacritic recognition system." Computer Speech & Language 34, no. 1 (November 2015): 43–60. http://dx.doi.org/10.1016/j.csl.2015.04.002.
7. Lee, Lin-Shan, Chiu-Yu Tseng, and Ming Ouh-Young. "The synthesis rules in a Chinese text-to-speech system." IEEE Transactions on Acoustics, Speech, and Signal Processing 37, no. 9 (1989): 1309–20. http://dx.doi.org/10.1109/29.31286.
8. Ahn, Seung-Kwon, and Koeng-Mo Sung. "Korean text-to-speech system using a formant synthesis method." Journal of the Acoustical Society of Japan (E) 13, no. 3 (1992): 151–60. http://dx.doi.org/10.1250/ast.13.151.
9. Win, Kyawt Yin, and Tomio Takara. "Myanmar text-to-speech system with rule-based tone synthesis." Acoustical Science and Technology 32, no. 5 (2011): 174–81. http://dx.doi.org/10.1250/ast.32.174.
10. Jacob, Agnes, and P. Mythili. "Developing a Child Friendly Text-to-Speech System." Advances in Human-Computer Interaction 2008 (2008): 1–6. http://dx.doi.org/10.1155/2008/597971.
Abstract: This paper discusses the implementation details of a child-friendly, good-quality, English text-to-speech (TTS) system that is phoneme-based, concatenative, easy to set up and use, and requires little memory. Direct waveform concatenation and linear predictive coding (LPC) are used. Most existing TTS systems are unit-selection based and use standard speech databases available in neutral adult voices. Here, reduced memory is achieved by concatenating phonemes and by replacing phonetic wave files with their LPC coefficients. Linguistic analysis was used instead of signal processing techniques to reduce algorithmic complexity. A sufficient degree of customization and generalization catering to the needs of the child user is included through provision for vocabulary and voice selection, and prosody has also been incorporated. This inexpensive TTS system was implemented in MATLAB, with the synthesis presented through a graphical user interface (GUI), making it child-friendly. It can be used not only as an engaging language learning aid for the typical child but also as a speech aid for the vocally disabled child. The quality of the synthesized speech was evaluated using the mean opinion score (MOS).
11. Janai, Siddhanna, Shreekanth T., Chandan M., and Ajish K. Abraham. "Speech-to-Speech Conversion." International Journal of Ambient Computing and Intelligence 12, no. 1 (January 2021): 184–206. http://dx.doi.org/10.4018/ijaci.2021010108.
Abstract: A novel approach to building a speech-to-speech conversion (STSC) system for individuals with the speech impairment dysarthria is described. The STSC system takes impaired speech with inherent disturbance as input and produces synthesized output speech with good pronunciation and noise-free utterance. The system involves two stages: automatic speech recognition (ASR) and automatic speech synthesis. ASR transforms speech into text, while automatic speech synthesis (text-to-speech, TTS) performs the reverse task. At present, the recognition system is developed for a small vocabulary of 50 words; accuracy of 94% is achieved for normal speakers and 88% for speakers with dysarthria. The output speech of the TTS system achieved a MOS value of 4.5 out of 5, obtained by averaging the responses of 20 listeners. This method of STSC would serve as an augmentative and alternative communication aid for speakers with dysarthria.
12. Chettri, Bhusan, and Krishna Bikram Shah. "Nepali Text to Speech Synthesis System using ESNOLA Method of Concatenation." International Journal of Computer Applications 62, no. 2 (January 18, 2013): 24–28. http://dx.doi.org/10.5120/10053-4909.
13. Pitrelli, J. F., R. Bakis, E. M. Eide, R. Fernandez, W. Hamza, and M. A. Picheny. "The IBM expressive text-to-speech synthesis system for American English." IEEE Transactions on Audio, Speech and Language Processing 14, no. 4 (July 2006): 1099–108. http://dx.doi.org/10.1109/tasl.2006.876123.
14. Narendra, N. P., K. Sreenivasa Rao, Krishnendu Ghosh, Ramu Reddy Vempada, and Sudhamay Maity. "Development of syllable-based text to speech synthesis system in Bengali." International Journal of Speech Technology 14, no. 3 (June 16, 2011): 167–81. http://dx.doi.org/10.1007/s10772-011-9094-4.
15. Popovic, Branislav, Dragan Knezevic, Milan Secujski, and Darko Pekar. "Automatic prosody generation in a text-to-speech system for Hebrew." Facta Universitatis, Series: Electronics and Energetics 27, no. 3 (2014): 467–77. http://dx.doi.org/10.2298/fuee1403467p.
Abstract: The paper presents the module for automatic prosody generation within a system for automatic synthesis of high-quality speech from arbitrary text in Hebrew. The high quality of the synthesis is due to the high accuracy of automatic prosody generation, which enables the introduction of elements of natural sentence prosody of Hebrew. Automatic morphological annotation of text is based on an expert algorithm relying on transformational rules. Syntactic-prosodic parsing is also rule-based, while the generation of the acoustic representation of prosodic features is based on classification and regression trees. A tree structure generated during the training phase enables accurate prediction of the acoustic representatives of prosody, namely the durations of phonetic segments as well as the temporal evolution of fundamental frequency and energy. This approach to automatic prosody generation has led to an improvement in the quality of synthesized speech, as confirmed by listening tests.
16. Schroeter, Horst Juergen. "Method and system for training a text-to-speech synthesis system using a domain-specific speech database." Journal of the Acoustical Society of America 127, no. 5 (2010): 3294. http://dx.doi.org/10.1121/1.3432305.
17. Adam, Eriss Eisa Babikir. "Deep Learning Based NLP Techniques in Text to Speech Synthesis for Communication Recognition." 2, no. 4 (December 18, 2020): 209–15. http://dx.doi.org/10.36548/jscp.2020.4.002.
Abstract: Computer systems model speech synthesis for various aspects of natural language processing. Speech synthesis has been explored through articulatory, formant, and concatenative synthesis; these techniques introduce considerable aperiodic distortion and exhibit increasing error rates during processing. Recently, advances in speech synthesis have moved decisively toward deep learning in pursuit of better performance, since leveraging large-scale data yields effective feature representations for speech synthesis. The main objective of this research article is to apply deep learning techniques to speech synthesis and to compare their performance, in terms of aperiodic distortion, with earlier models in natural language processing.
18. Yu, Hong Zhi, Jin Xi Zhang, Guang Rong Shan, and Ning Ma. "Research on Tibetan Language Synthesis System Front-End Text Processing Technology Based on HMM." Applied Mechanics and Materials 411-414 (September 2013): 308–12. http://dx.doi.org/10.4028/www.scientific.net/amm.411-414.308.
Abstract: Text normalization, word segmentation, division of basic concatenation units, prosody analysis, and pronunciation conversion are important tasks of the front-end text processing module of a speech synthesis system. Based on the linguistic and phonetic characteristics of Lhasa Tibetan, this paper proposes a text analysis module for Tibetan speech synthesis that analyzes and describes the information of the Lhasa Tibetan language layer and maps it to the voice layer. The completed study lays a solid foundation for further work on Tibetan speech synthesis systems.
19. Kaur, Tejinder, and Charanjiv Singh. "Error Free Punjabi Text to Speech Generation System based on Phonemes." International Journal of Emerging Research in Management and Technology 6, no. 8 (June 25, 2018): 172. http://dx.doi.org/10.23956/ijermt.v6i8.134.
Abstract: Text-to-speech (TTS) is the generation of synthesized speech from text. Language is the ability to express one's thoughts by means of a set of signs (text), gestures, and sounds; it is a distinctive feature of human beings, who are the only creatures to use such a system. Speech is the oldest and most widely used means of communication between people. Speech synthesis, also called text-to-speech synthesis, is the artificial production of human speech; a computer system used for this purpose is called a speech synthesizer and can be implemented in software. The proposed Enhanced Transcriptions Method is developed using Microsoft Visual Studio in the VB.Net language. First, word indexing is performed for the predefined words, then the corresponding speech signal is detected, and errors in words are calculated using Euclidean distance. The results show that the Enhanced Transcriptions Method achieves higher accuracy (89%) than the previous Transcriptions Method (79%); the specificity of the proposed method is 0.89, compared with 0.79 for the previous method.
20. Sudhakar, B., and R. Bensraj. "Development of Concatenative Syllable based Text to Speech Synthesis System for Tamil." International Journal of Computer Applications 91, no. 5 (April 18, 2014): 22–25. http://dx.doi.org/10.5120/15878-4839.
21. Zahariev, Vadim Anatol'evich, Aleksandr Aleksandrovich Petrovsky, and Boris Mefod'evich Lobanov. "Text to speech synthesis system with the target speaker voice customization capability." SPIIRAS Proceedings 1, no. 32 (March 31, 2014): 82. http://dx.doi.org/10.15622/sp.32.6.
22. Malsheen, Bathsheba J., Gabriel F. Groner, and Linda D. Williams. "Text to speech synthesis system and method using context dependent vowel allophones." Journal of the Acoustical Society of America 91, no. 4 (April 1992): 2305. http://dx.doi.org/10.1121/1.403608.
23. Imedjdouben, Fayçal, and Amrane Houacine. "Development of an automatic phonetization system for Arabic text-to-speech synthesis." International Journal of Speech Technology 17, no. 4 (July 19, 2014): 417–26. http://dx.doi.org/10.1007/s10772-014-9241-9.
24. Yoon, Kyuchul. "A prosodic phrasing model for a Korean text-to-speech synthesis system." Computer Speech & Language 20, no. 1 (January 2006): 69–79. http://dx.doi.org/10.1016/j.csl.2005.01.001.
25. Shreekanth, T., M. R. Deeksha, and Karthikeya R. Kaushik. "A Novel Data Independent Approach for Conversion of Hand Punched Kannada Braille Script to Text and Speech." International Journal of Image and Graphics 18, no. 02 (April 2018): 1850010. http://dx.doi.org/10.1142/s0219467818500109.
Abstract: In society, there exists a gap in communication between the sighted community and visually challenged people due to the different scripts used to read and write. To bridge this gap, there is a need for a system that supports automatic conversion of Braille script to text and speech in the corresponding language. An Optical Braille Recognition (OBR) system converts hand-punched Braille characters into their equivalent natural-language characters, and a text-to-speech (TTS) system converts the recognized characters into audible speech using speech synthesis techniques. Existing literature reveals that OBR and TTS systems have been well established independently for English, leaving scope for developing OBR and TTS systems for regional languages. Despite Kannada being one of the most widely spoken regional languages in India, minimal work has been done on Kannada OBR and TTS, and no existing system directly converts Braille script to speech; this Kannada Braille-to-text-and-speech system is therefore one of a kind. The acquired image is processed, and feature extraction is performed using the k-means algorithm and heuristics to convert the Braille characters to Kannada script. Concatenation-based speech synthesis with the phoneme as the basic unit is used for Kannada TTS within the Festival TTS framework. Performance evaluation is done using an independently developed Kannada Braille database, and the results obtained are satisfactory compared with existing methods in the literature.
26. Bachenko, Joan, Eileen Fitzpatrick, and Jeffrey Daugherty. "A rule-based phrase parser for real-time text-to-speech synthesis." Natural Language Engineering 1, no. 2 (June 1995): 191–212. http://dx.doi.org/10.1017/s1351324900000140.
Abstract: Text-to-speech systems are currently designed to work on complete sentences and paragraphs, thereby allowing front-end processors access to large amounts of linguistic context. Problems with this design arise when applications require text to be synthesized in near real time, as it is being typed. How does the system decide which incoming words should be collected and synthesized as a group when prior and subsequent word groups are unknown? We describe a rule-based parser that uses a three-cell buffer and phrasing rules to identify break points for incoming text. Words up to the break point are synthesized as new text is moved into the buffer; no hierarchical structure is built beyond the lexical level. The parser was developed for use in a system that synthesizes written telecommunications by Deaf and hard-of-hearing people. These are texts written entirely in upper case, with little or no punctuation, and using a nonstandard variety of English (e.g. WHEN DO I WILL CALL BACK YOU). The parser performed well in a three-month field trial utilizing tens of thousands of texts. Laboratory tests indicate that the parser exhibited a low error rate when compared with a human reader.
27. Stathopoulou-Zois, P. "A Grapheme-to-Phoneme Translator for TTS Synthesis in Greek." International Journal on Artificial Intelligence Tools 14, no. 06 (December 2005): 901–18. http://dx.doi.org/10.1142/s0218213005002466.
Abstract: This paper presents an automatic grapheme-to-phoneme translation algorithm for the Greek language. The proposed algorithm is designed to work with a high-quality text-to-speech synthesis system for Greek and models the full reading process of written text as performed by a Greek speaker. A detailed study of the operation of the Greek language led to an automatic integrated system that describes its phonetic behaviour in an exact and natural way. The software that implements the algorithm can receive written text from any input (keyboard, file, screen reader, OCR system, etc.) and transform it into phonetic form. The output of the algorithm is then directed to a concatenation-based speech synthesizer, and the correct pronunciation of any written text is achieved in real time. During the reading process, the software locates and distinguishes Greek text from foreign-language words, special symbols, abbreviations, and so on, and manages them so that the flow of the reading process permits correct perception of the produced spoken messages. The algorithm's most important quality is that it can be incorporated into other text-to-speech synthesis systems of different technology. Finally, experimental measurements indicate the successful operation of the algorithm.
28. Ning, Yishuang, Sheng He, Zhiyong Wu, Chunxiao Xing, and Liang-Jie Zhang. "A Review of Deep Learning Based Speech Synthesis." Applied Sciences 9, no. 19 (September 27, 2019): 4050. http://dx.doi.org/10.3390/app9194050.
Abstract: Speech synthesis, also known as text-to-speech (TTS), has attracted increasing attention. Recent advances in speech synthesis are overwhelmingly contributed by deep learning and even end-to-end techniques, which have been utilized to enhance a wide range of application scenarios such as intelligent speech interaction, chatbots, and conversational artificial intelligence (AI). For speech synthesis, deep learning based techniques can leverage a large scale of <text, speech> pairs to learn effective feature representations that bridge the gap between text and speech, thus better characterizing the properties of events. To better understand the research dynamics in the speech synthesis field, this paper first introduces traditional speech synthesis methods and highlights the importance of acoustic modeling within the statistical parametric speech synthesis (SPSS) framework. It then gives an overview of advances in deep learning based speech synthesis, including the end-to-end approaches which have achieved state-of-the-art performance in recent years. Finally, it discusses the problems of deep learning methods for speech synthesis and points out some appealing research directions that can bring speech synthesis research to a new frontier.
29. Syrdal, Ann K. "Development of a female voice for a concatenative synthesis text‐to‐speech system." Journal of the Acoustical Society of America 92, no. 4 (October 1992): 2477. http://dx.doi.org/10.1121/1.404427.
30. Tiomkin, Stas, David Malah, Slava Shechtman, and Zvi Kons. "A Hybrid Text-to-Speech System That Combines Concatenative and Statistical Synthesis Units." IEEE Transactions on Audio, Speech, and Language Processing 19, no. 5 (July 2011): 1278–88. http://dx.doi.org/10.1109/tasl.2010.2089679.
31. Williams, Briony. "Word stress assignment in a text-to-speech synthesis system for British English." Computer Speech & Language 2, no. 3-4 (September 1987): 235–72. http://dx.doi.org/10.1016/0885-2308(87)90011-8.
32. Zacniewski, Artur, and Tadeusz Bodnar. "Text Spotting in the Wild with Embedded Device." Scientific Journal of Polish Naval Academy 217, no. 2 (June 1, 2019): 81–95. http://dx.doi.org/10.2478/sjpna-2019-0014.
Abstract: Detecting and recognizing text in natural scenes (e.g. streets, restaurants, shops) can form part of an artificial intelligence system, especially one coupled with speech synthesis: properly detected text is passed to a recognition stage and then to the speech synthesis system, which translates text to speech. The research is carried out for the 'Toucan Eye' project, an embedded device with an artificial intelligence system able to help people with impaired sight. Due to the constrained resources and abilities of embedded devices, two criteria for text spotting must be met: the quality of the detected and recognized text regions, and the time spent on both operations. The stages of the system and the text-spotting methods chosen under these constraints are presented.
33. Bajracharya, Roop Shree Ratna, Santosh Regmi, Bal Krishna Bal, and Balaram Prasain. "Building a natural sounding Text-To-Speech system for the Nepali language: research and development challenges and solutions." Gipan 4 (December 31, 2019): 106–16. http://dx.doi.org/10.3126/gipan.v4i0.35461.
Abstract: Text-to-speech (TTS) synthesis has come far from its primitive synthetic monotone voices to more natural and intelligible sounding voices. One of the direct applications of a natural-sounding TTS system is screen reader applications for the visually impaired and blind community. The Festival Speech Synthesis System uses a concatenative speech synthesis method together with a unit selection process to generate a natural-sounding voice. This work primarily gives an account of the efforts put towards developing a natural-sounding TTS system for Nepali using the Festival system. We also shed light on the issues faced and the solutions derived, which may well overlap with other similarly under-resourced languages in the region.
34. Valizada, Alakbar, Sevil Jafarova, Emin Sultanov, and Samir Rustamov. "Development and Evaluation of Speech Synthesis System Based on Deep Learning Models." Symmetry 13, no. 5 (May 7, 2021): 819. http://dx.doi.org/10.3390/sym13050819.
Abstract: This study concentrates on the investigation, development, and evaluation of text-to-speech synthesis systems based on deep learning models for the Azerbaijani language. We selected and compared two state-of-the-art models, Tacotron and Deep Convolutional Text-to-Speech (DC TTS), to determine the optimal model. Both systems were trained on a 24-hour speech dataset of the Azerbaijani language collected and processed from a news website. To analyze the quality and intelligibility of the speech signals produced by the two systems, 34 listeners participated in an online survey containing subjective evaluation tests. The results indicated that, according to the mean opinion score, Tacotron demonstrated better results for in-vocabulary words, while DC TTS performed better on out-of-vocabulary word synthesis.
35. Xu, Xiaona, Li Yang, Yue Zhao, and Hui Wang. "End-to-End Speech Synthesis for Tibetan Multidialect." Complexity 2021 (January 25, 2021): 1–8. http://dx.doi.org/10.1155/2021/6682871.
Abstract: Research on Tibetan speech synthesis technology has mainly focused on a single dialect, and there is thus a lack of research on Tibetan multidialect speech synthesis. This paper presents an end-to-end Tibetan multidialect speech synthesis model that can synthesize different Tibetan dialects. First, the Wylie transliteration scheme is used to convert Tibetan text into the corresponding Latin letters, which effectively reduces the size of the training corpus and the workload of front-end text processing. Second, a shared feature prediction network with a recurrent sequence-to-sequence structure is built, which maps the Latin transliteration vectors of Tibetan characters to Mel spectrograms and learns the relevant features of multidialect speech data. Third, two dialect-specific WaveNet vocoders are combined with the feature prediction network to synthesize the Mel spectra of the Lhasa-Ü-Tsang and Amdo pastoral dialects into time-domain waveforms. The model avoids the need for extensive Tibetan dialect expertise in time-consuming tasks such as phonetic analysis and phonological annotation, and it can directly synthesize Lhasa-Ü-Tsang and Amdo pastoral speech from existing text annotation. Experimental results show that speech synthesized by the proposed method has better clarity and naturalness than that of a Tibetan monolingual model.
36. Raskind, Marshall H., and Eleanor Higgins. "Effects of Speech Synthesis on the Proofreading Efficiency of Postsecondary Students with Learning Disabilities." Learning Disability Quarterly 18, no. 2 (May 1995): 141–58. http://dx.doi.org/10.2307/1511201.
Abstract: This study investigated the effects of speech synthesis on the proofreading efficiency of postsecondary students with learning disabilities. Subjects proofread self-generated written language samples under three conditions: (a) using a speech synthesis system that simultaneously highlighted and "spoke" words on a computer monitor, (b) having the text read aloud to them by another person, and (c) receiving no assistance. Using the speech synthesis system enabled subjects to detect a significantly higher percentage of total errors than either of the other two proofreading conditions. In addition, subjects were able to locate a significantly higher percentage of capitalization, spelling, usage and typographical errors under the speech synthesis condition. However, having the text read aloud by another person significantly outperformed the other conditions in finding "grammar-mechanical" errors. Results are discussed with regard to underlying reasons for the overall superior performance of the speech synthesis system and the implications of using speech synthesis as a compensatory writing aid for postsecondary students with learning disabilities.
37. Chabchoub, Abdelkader, Salah Alahmadi, Adnan Cherif, and Wahid Barkouti. "Di-Diphone Arabic Speech Synthesis Concatenation." International Journal of Computers & Technology 3, no. 2 (October 30, 2012): 218–22. http://dx.doi.org/10.24297/ijct.v3i2a.2810.
Abstract: This work describes a new Arabic text-to-speech (TTS) synthesis system based on di-diphone concatenation with a TD-PSOLA modification synthesizer. The quality of the synthesized speech is improved by analyzing the spectral features of the voice source in detail across various F0 ranges and timbres, and by new unit concatenation. Speech is generated based on the analysis and estimation of formants, classifying the voice source into different types. The developed model enhances the naturalness and intelligibility of synthesized speech in various speaking environments.
38. Modi, Rohan. "Transcript Anatomization with Multi-Linguistic and Speech Synthesis Features." International Journal for Research in Applied Science and Engineering Technology 9, no. VI (June 20, 2021): 1755–58. http://dx.doi.org/10.22214/ijraset.2021.35371.
Abstract: Handwriting detection is the ability of a computer program to collect and analyze comprehensible handwritten input from various types of media such as photographs, newspapers, and paper reports. Handwritten text recognition is a sub-discipline of pattern recognition, which refers to the classification of datasets or objects into various categories or classes. Handwriting recognition is the process of transforming handwritten text in a specific language into its digitally expressible script, represented by a set of symbols known as letters or characters. Speech synthesis is the artificial production of human speech using machine-learning-based software and audio-output computer hardware. While many systems convert normal language text into speech, the aim of this paper is to study optical character recognition with speech synthesis technology and to develop a cost-effective, user-friendly, image-based offline text-to-speech conversion system using a CRNN neural network model and a hidden Markov model. Automated interpretation of handwritten text can be very useful wherever large amounts of handwritten data must be processed, such as signature verification, analysis of various types of documents, and recognition of amounts handwritten on bank cheques.
39

Rojc, Matej, and Zdravko Kačič. "Time and space-efficient architecture for a corpus-based text-to-speech synthesis system." Speech Communication 49, no. 3 (March 2007): 230–49. http://dx.doi.org/10.1016/j.specom.2007.01.007.

Full text
APA, Harvard, Vancouver, ISO, and other styles
40

Khan, Najeeb Ullah, and Jung‐Chul Lee. "Optimal state duration assignment in hidden Markov model‐based text‐to‐speech synthesis system." Electronics Letters 51, no. 12 (June 2015): 941–43. http://dx.doi.org/10.1049/el.2015.0539.

Full text
APA, Harvard, Vancouver, ISO, and other styles
41

Luong, Hieu Thi, Shinji Takaki, SangJin Kim, and Junichi Yamagishi. "A DNN-based text-to-speech synthesis system using speaker, gender, and age codes." Journal of the Acoustical Society of America 140, no. 4 (October 2016): 2962. http://dx.doi.org/10.1121/1.4969152.

Full text
APA, Harvard, Vancouver, ISO, and other styles
42

Chalamandaris, Aimilios, Sotiris Karabetsos, Pirros Tsiakoulis, and Spyros Raptis. "A unit selection text-to-speech synthesis system optimized for use with screen readers." IEEE Transactions on Consumer Electronics 56, no. 3 (August 2010): 1890–97. http://dx.doi.org/10.1109/tce.2010.5606343.

Full text
APA, Harvard, Vancouver, ISO, and other styles
43

Sangeetha, Sudhakar, and Sekar Jothilakshmi. "Syllable based text to speech synthesis system using auto associative neural network prosody prediction." International Journal of Speech Technology 17, no. 2 (September 24, 2013): 91–98. http://dx.doi.org/10.1007/s10772-013-9210-8.

Full text
APA, Harvard, Vancouver, ISO, and other styles
44

Bettayeb, Nadjla, and Mhania Guerti. "Speech Synthesis System for the Holy Quran Recitation." International Arab Journal of Information Technology 18, no. 1 (December 31, 2020): 8–15. http://dx.doi.org/10.34028/iajit/18/1/2.

Full text
Abstract:
This paper aims to develop a Text-To-Speech (TTS) synthesis system for Holy Quran recitation that properly helps reciters and facilitates its use. In this work, the unit selection method is adopted and improved to reach good speech quality. The proposed approach consists mainly of two steps. In the first, an Expert System (ES) module is integrated, employing Arabic and Quranic phonetic and phonological features; this part serves as a preselection stage to optimize the speed of the synthesis algorithm. The second step is the final selection of units by minimizing a concatenation cost function with a forward-backward dynamic-programming search. The system was evaluated by native and non-native Arabic speakers. The results show that the goal of a correct Quran recitation respecting its reading rules was reached, with 97% speech intelligibility and 72.13% naturalness.
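The final-selection step this abstract describes, minimizing a concatenation cost over candidate units with a dynamic-programming search, can be sketched as a Viterbi-style pass. This is a minimal illustration, not the paper's actual cost functions: `target_cost` and `concat_cost` here are hypothetical callables supplied by the caller.

```python
def select_units(candidates, target_cost, concat_cost):
    """Pick one unit per slot, minimizing total target + concatenation cost.

    candidates[i] is the list of database units that could realize slot i;
    target_cost(i, j) scores candidate j for slot i, and concat_cost(u, v)
    scores joining unit u to unit v. A Viterbi-style DP finds the cheapest path.
    """
    n = len(candidates)
    # best[i][j] = (cumulative cost, back-pointer to previous candidate index)
    best = [[(target_cost(0, j), None) for j in range(len(candidates[0]))]]
    for i in range(1, n):
        row = []
        for j in range(len(candidates[i])):
            cost, prev = min(
                (best[i - 1][k][0]
                 + concat_cost(candidates[i - 1][k], candidates[i][j])
                 + target_cost(i, j), k)
                for k in range(len(candidates[i - 1])))
            row.append((cost, prev))
        best.append(row)
    # backtrack from the cheapest final candidate
    j = min(range(len(best[-1])), key=lambda k: best[-1][k][0])
    path = [j]
    for i in range(n - 1, 0, -1):
        j = best[i][j][1]
        path.append(j)
    return list(reversed(path))
```

With zero target costs and a concatenation cost that only favors joining `'a2'` to `'b1'`, the search correctly picks that pair across two slots.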
APA, Harvard, Vancouver, ISO, and other styles
45

Arifin, Surya Sumpeno, Mochamad Hariadi, and Arry Maulana Syarif. "Development of Indonesian Text-to-Audiovisual Synthesis System Using Syllable Concatenation Approach to Support Indonesian Learning." International Journal of Emerging Technologies in Learning (iJET) 12, no. 02 (February 28, 2017): 166. http://dx.doi.org/10.3991/ijet.v12i02.6384.

Full text
Abstract:
This study aims to develop an Indonesian text-to-audiovisual synthesis system using a syllable-concatenation approach to support Indonesian learning. The system visualizes syllable pronunciation synchronized with the speech signal, providing a realistic illustration of articulator movement as each phoneme is pronounced. The syllable-concatenation approach achieves realistic visualization by assembling articulation and coarticulation in syllable units. To build the system, we recorded a speech database in syllable form following the syllable patterns of Indonesian. The approach concatenates the viseme of each phoneme to form the visualization of syllable pronunciation, synchronized with the corresponding speech from the database. The system was evaluated through "lip-reading" of 10 Indonesian sentences entered into it, with ratings based on the degree of correspondence between the visualized syllable pronunciation and the produced speech. Scores from all respondents were aggregated as a MOS (Mean Opinion Score). The results show that the Indonesian text-to-audiovisual system produces smoother, more realistic pronunciation visualization.
APA, Harvard, Vancouver, ISO, and other styles
46

Wu, Zhiyong, H. M. Meng, Hongwu Yang, and Lianhong Cai. "Modeling the Expressivity of Input Text Semantics for Chinese Text-to-Speech Synthesis in a Spoken Dialog System." IEEE Transactions on Audio, Speech, and Language Processing 17, no. 8 (November 2009): 1567–76. http://dx.doi.org/10.1109/tasl.2009.2023161.

Full text
APA, Harvard, Vancouver, ISO, and other styles
47

Liu, Yifan, and Jin Zheng. "Es-Tacotron2: Multi-Task Tacotron 2 with Pre-Trained Estimated Network for Reducing the Over-Smoothness Problem." Information 10, no. 4 (April 9, 2019): 131. http://dx.doi.org/10.3390/info10040131.

Full text
Abstract:
Text-to-speech synthesis is a computational technique for producing synthetic, human-like speech by a computer. In recent years, speech synthesis techniques have matured and been employed in many applications, such as automatic translation and car navigation systems. End-to-end text-to-speech synthesis has gained considerable research interest because, compared to traditional models, an end-to-end model is easier to design and more robust. Tacotron 2 is an integrated state-of-the-art end-to-end speech synthesis system that can directly predict close-to-natural human speech from raw text. However, a gap remains between synthesized and natural speech: suffering from an over-smoothness problem, Tacotron 2 produces 'averaged' speech that sounds unnatural and inflexible. In this work, we first propose an estimated network (Es-Network), which captures general features from a raw mel spectrogram in an unsupervised manner. We then design Es-Tacotron2 by employing the Es-Network to calculate the estimated mel-spectrogram residual and setting it as an additional prediction task of Tacotron 2, allowing the model to focus more on predicting the individual features of the mel spectrogram. Experiments show that, compared to the original Tacotron 2 model, Es-Tacotron2 produces more variable decoder output and synthesizes more natural and expressive speech.
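The multi-task idea in this abstract, predicting the residual between a mel spectrogram and a smoothed "estimated" version alongside the main target, can be illustrated in a few lines. This is only a toy sketch: the moving average below is a crude stand-in for the paper's learned Es-Network, and the loss weighting `lam` is a made-up hyperparameter.

```python
import numpy as np

def multitask_targets(mel, kernel=5):
    # crude stand-in for the Es-Network: a moving-average "estimated" mel
    pad = kernel // 2
    padded = np.pad(mel, ((pad, pad), (0, 0)), mode='edge')
    est = np.stack([padded[t:t + kernel].mean(axis=0)
                    for t in range(mel.shape[0])])
    residual = mel - est  # the fine detail the auxiliary task must predict
    return est, residual

def multitask_loss(pred_mel, pred_res, mel, residual, lam=0.5):
    # joint objective: main spectrogram loss plus the residual side task
    return (np.mean((pred_mel - mel) ** 2)
            + lam * np.mean((pred_res - residual) ** 2))
```

A perfectly flat spectrogram has zero residual, so predicting it exactly gives zero joint loss; any high-frequency detail in `mel` shows up only in `residual`, which is what pushes the decoder away from "averaged" output.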
APA, Harvard, Vancouver, ISO, and other styles
48

Hill, David R., Craig R. Taube-Schock, and Leonard Manzara. "Low-level articulatory synthesis: A working text-to-speech solution and a linguistic tool." Canadian Journal of Linguistics/Revue canadienne de linguistique 62, no. 3 (June 21, 2017): 371–410. http://dx.doi.org/10.1017/cnj.2017.15.

Full text
Abstract:
A complete text-to-speech system has been created by the authors, based on a tube resonance model of the vocal tract controlled by a development of Carré's "Distinctive Region Model", which is in turn based on the formant-sensitivity findings of Fant and Pauli (1974). Achieving this goal involved significant long-term linguistic research, including rhythm and intonation studies, as well as the development of low-level articulatory data and rules to drive the model, together with the necessary tools, parsers, dictionaries, and so on. The tools and the current system are available under a General Public License and are described here, with further references in the paper, including samples of the speech produced and figures illustrating the system description.
APA, Harvard, Vancouver, ISO, and other styles
49

Sutanti, Putu Asri Sri, and Gst Ayu Vida Mastrika Giri. "Low Filtering Method for Noise Reduction at Text to Speech Application." JELIKU (Jurnal Elektronik Ilmu Komputer Udayana) 8, no. 3 (January 25, 2020): 339. http://dx.doi.org/10.24843/jlk.2020.v08.i03.p17.

Full text
Abstract:
Technological developments have encouraged researchers to pursue many studies in the IT field; one such branch is sound synthesis. Some text-to-speech applications are quite difficult to build and are inflexible about replacing the available voice types. In addition, a speaker's accent or manner of speaking is sometimes not well represented, making it difficult to build a text-to-speech application that uses a desired voice, such as the user's own voice or another speaker's. To address these problems, this research proposes an application that converts text into speech more flexibly and in accordance with the user's wishes. Testing shows that the system achieves an accuracy of 70%.
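The title's low-filtering idea, attenuating high-frequency noise in the synthesized signal, is commonly realized as a simple low-pass filter. The paper's exact filter design is not given in the abstract, so the single-pole IIR below is just an illustrative sketch with an assumed smoothing factor `alpha`.

```python
def low_pass(samples, alpha=0.2):
    """Single-pole IIR low-pass filter: y[n] = y[n-1] + alpha * (x[n] - y[n-1]).

    Small alpha smooths aggressively (strong noise reduction, duller sound);
    alpha = 1.0 passes the signal through unchanged.
    """
    out, y = [], 0.0
    for x in samples:
        y += alpha * (x - y)  # move the output a fraction toward the input
        out.append(y)
    return out
```

A rapidly alternating (high-frequency) input is strongly attenuated, while a constant input passes through once the filter settles, which is exactly the behavior wanted when suppressing hiss in a TTS output signal.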
APA, Harvard, Vancouver, ISO, and other styles
50

Janyoi, Pongsathon, and Pusadee Seresangtakul. "Isarn Dialect Speech Synthesis using HMM with syllable-context features." ECTI Transactions on Computer and Information Technology (ECTI-CIT) 12, no. 2 (November 29, 2018): 81–89. http://dx.doi.org/10.37936/ecti-cit.2018122.108607.

Full text
Abstract:
This paper describes a speech synthesis system for Isarn, a regional dialect spoken in Northeast Thailand. In this study, we focus on improving the system's prosody generation by using additional context features. To develop the system, the speech parameters (Mel-cepstrum and fundamental frequencies of phonemes in different phonetic contexts) were modelled using Hidden Markov Models (HMMs). Synthetic speech was generated by converting the input text into context-dependent phonemes; speech parameters were generated from the trained HMMs according to those phonemes and then synthesized through a speech vocoder. Systems were trained using three different feature sets: basic contextual features, tonal features, and syllable-context features. Objective and subjective tests were conducted to measure the performance of the proposed system. The results indicate that adding the syllable-context features significantly improved the naturalness of the synthesized speech.
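The "context-dependent phonemes" such HMM-based systems train on are typically full-context labels combining the phoneme identity with its neighbors plus extra features like tone and syllable position. A minimal sketch of building such labels, with a simplified, hypothetical label layout rather than the paper's actual feature set:

```python
def full_context_labels(phones, tones, syllable_pos):
    """Build simplified full-context labels: prev-cur+next phoneme
    plus per-phoneme tone (T:) and position-in-syllable (S:) features.
    'x' marks a missing neighbor at an utterance boundary."""
    labels = []
    for i, p in enumerate(phones):
        prev_p = phones[i - 1] if i > 0 else 'x'
        next_p = phones[i + 1] if i < len(phones) - 1 else 'x'
        labels.append(f"{prev_p}-{p}+{next_p}/T:{tones[i]}/S:{syllable_pos[i]}")
    return labels
```

Each distinct label string selects (or shares, via decision-tree clustering in a real system) its own HMM, which is how tonal and syllable-context features reach the acoustic model.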
APA, Harvard, Vancouver, ISO, and other styles