Academic literature on the topic 'Text-to-speech synthesis module'


Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Text-to-speech synthesis module.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Journal articles on the topic "Text-to-speech synthesis module"

1

SPROAT, RICHARD. "Multilingual text analysis for text-to-speech synthesis." Natural Language Engineering 2, no. 4 (1996): 369–80. http://dx.doi.org/10.1017/s1351324997001654.

Abstract:
We present a model of text analysis for text-to-speech (TTS) synthesis based on (weighted) finite state transducers, which serves as the text analysis module of the multilingual Bell Labs TTS system. The transducers are constructed using a lexical toolkit that allows declarative descriptions of lexicons, morphological rules, numeral-expansion rules, and phonological rules, inter alia. To date, the model has been applied to eight languages: Spanish, Italian, Romanian, French, German, Russian, Mandarin and Japanese.
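The rule-driven text analysis Sproat describes can be pictured with a toy numeral-expansion pass. This is a hypothetical Python sketch, not the Bell Labs toolkit: a real system compiles such declarative rules into weighted finite-state transducers, while this version only mirrors the rule-driven style for English numbers below 1000.

```python
import re

# Toy lookup tables standing in for declarative numeral-expansion rules.
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]

def expand_number(n: int) -> str:
    """Spell out an integer in the range 0-999 as English words."""
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, rest = divmod(n, 10)
        return TENS[tens] + ("-" + ONES[rest] if rest else "")
    hundreds, rest = divmod(n, 100)
    return ONES[hundreds] + " hundred" + (" " + expand_number(rest) if rest else "")

def normalize(text: str) -> str:
    """Replace each digit string in the text with its spoken form."""
    return re.sub(r"\d+", lambda m: expand_number(int(m.group())), text)
```

For example, `normalize("Flight 19 departs at gate 7")` yields "Flight nineteen departs at gate seven". A production front end would also handle ordinals, dates, currency, and language-specific agreement, which is exactly why the paper argues for a shared lexical toolkit across languages.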
2

Hu, Weixin, and Xianyou Zhu. "A real-time voice cloning system with multiple algorithms for speech quality improvement." PLOS ONE 18, no. 4 (2023): e0283440. http://dx.doi.org/10.1371/journal.pone.0283440.

Abstract:
With the development of computer technology, speech synthesis techniques are becoming increasingly sophisticated. Speech cloning can be performed as a subtask of speech synthesis technology by using deep learning techniques to extract acoustic information from human voices and combine it with text to output a natural human voice. However, traditional speech cloning technology still has certain limitations; excessively large text inputs cannot be adequately processed, and the synthesized audio may include noise artifacts like breaks and unclear phrases. In this study, we add a text determinatio
3

Yu, Hong Zhi, Jin Xi Zhang, Guang Rong Shan, and Ning Ma. "Research on Tibetan Language Synthesis System Front-End Text Processing Technology Based on HMM." Applied Mechanics and Materials 411-414 (September 2013): 308–12. http://dx.doi.org/10.4028/www.scientific.net/amm.411-414.308.

Abstract:
Text standardization, word segmentation, division into basic concatenation units for prosody analysis, and pronunciation conversion are important parts of the front-end text-processing modules of a speech synthesis system. Based on the linguistic and phonetic characteristics of Lhasa Tibetan, this work proposes a text analysis module for Tibetan speech synthesis that analyzes and describes the linguistic-level information of Lhasa Tibetan and maps it to the phonetic level. The completed study lays a solid foundation for a further Tibetan speech synthesis system.
4

Phukon, Debasish. "A Deep Learning Approach for ASL Recognition and Text-to-Speech Synthesis using CNN." International Journal for Research in Applied Science and Engineering Technology 11, no. 8 (2023): 2135–43. http://dx.doi.org/10.22214/ijraset.2023.55528.

Abstract:
Sign language is a visual language that is used by the deaf and hard-of-hearing community to communicate. However, sign language is not universally understood by non-signers, which can create communication barriers for deaf and hard-of-hearing individuals. In this paper, we present a novel application for American Sign Language (ASL) to text-to-speech conversion using deep learning techniques. Our app aims to bridge the communication gap between hearing-impaired individuals who use ASL as their primary mode of communication and individuals who do not understand ASL. The app compri
5

Wu, Chung Hsien. "Unit selection module and method of chinese text-to-speech synthesis." Journal of the Acoustical Society of America 127, no. 5 (2010): 3294. http://dx.doi.org/10.1121/1.3432304.

6

A, Arulprakash, Synthiya M, and Rajabhusanam C. "Tamil Speech Synthesizer App for Android: Text Processing Module Enhancement." Indian Journal of Science and Technology 16, no. 7 (2023): 485–91. https://doi.org/10.17485/IJST/v16i7.2165.

Abstract:
Objectives: Designing dynamic computer systems that are effective, efficient, simple, and satisfying to use is becoming extremely important in this age of information and communication technology. Text to Speech or Speech Synthesis is one of the many methods being investigated by researchers to improve Human-Computer Interaction. The goal here is to improve the text processing component of the Tamil voice synthesizer by including a text normalizer and loan word identification that is efficient and reliable. Methods: Text normalization i
7

Zhao, Wei, and Zheng Yang. "An Emotion Speech Synthesis Method Based on VITS." Applied Sciences 13, no. 4 (2023): 2225. http://dx.doi.org/10.3390/app13042225.

Abstract:
People and things can be connected through the Internet of Things (IoT), and speech synthesis is one of the key technologies. At this stage, end-to-end speech synthesis systems are capable of synthesizing relatively realistic human voices, but the current commonly used parallel text-to-speech suffers from loss of useful information during the two-stage delivery process, and the control features of the synthesized speech are monotonous, with insufficient expression of features, including emotion, leading to emotional speech synthesis becoming a challenging task. In this paper, we propose a new
8

Gudi, Ganga, Mallamma V. Reddy, and Hanumanthappa M. "Enhancing Kannada text-to-speech and braille conversion with deep learning for the visually impaired." Scientific Temper 16, Spl-1 (2025): 48–52. https://doi.org/10.58414/scientifictemper.2025.16.spl-1.06.

Abstract:
Advancements in assistive technology have greatly improved accessibility for visually impaired individuals, enabling seamless interaction with textual content. This research introduces a novel approach that converts Kannada text into both speech and Braille, promoting multilingual accessibility. The proposed system incorporates a support vector machine (SVM) for Kannada text-to-Braille conversion and a deep learning-based text-to-speech (TTS) model for speech synthesis. The Braille translation module accurately maps Kannada characters to their respective Braille representations using SVM class
9

MALLISHWARI, N. "Implementation of the Image Text to Speech Conversion in the Desired Language by Translating with Raspberry Pi." International Scientific Journal of Engineering and Management 04, no. 06 (2025): 1–9. https://doi.org/10.55041/isjem04635.

Abstract:
The main problem in communication is language bias between the communicators. This device can be used by people who do not know English and want content translated into their native language. The novelty of this research work is the speech output, which is available in 53 different languages translated from English. This paper is based on a prototype which helps the user hear the contents of text images in the desired language. It involves extraction of text from the image and conversion of the text to translated speech in the user's desired language. This is done wit
10

Popovic, Branislav, Dragan Knezevic, Milan Secujski, and Darko Pekar. "Automatic prosody generation in a text-to-speech system for Hebrew." Facta universitatis - series: Electronics and Energetics 27, no. 3 (2014): 467–77. http://dx.doi.org/10.2298/fuee1403467p.

Abstract:
The paper presents the module for automatic prosody generation within a system for automatic synthesis of high-quality speech based on arbitrary text in Hebrew. The high quality of synthesis is due to the high accuracy of automatic prosody generation, enabling the introduction of elements of natural sentence prosody of Hebrew. Automatic morphological annotation of text is based on the application of an expert algorithm relying on transformational rules. Syntactic-prosodic parsing is also rule based, while the generation of the acoustic representation of prosodic features is based on classifica

Dissertations / Theses on the topic "Text-to-speech synthesis module"

1

Suzić, Siniša. "Parametarska sinteza ekspresivnog govora." PhD thesis, Univerzitet u Novom Sadu, Fakultet tehničkih nauka u Novom Sadu, 2019. https://www.cris.uns.ac.rs/record.jsf?recordId=110631&source=NDLTD&language=en.

Abstract:
The dissertation describes procedures for the synthesis of expressive speech using parametric approaches. It is shown that better results are obtained with deep neural networks than with hidden Markov models. Three new methods for expressive speech synthesis using deep neural networks are proposed: the style-code method, the additional network training method, and an architecture based on shared hidden layers. It is shown that the best results are obtained with the style-code method. A new method for emotion/style transplantation based on shared hidden
2

Van Niekerk, Daniel Rudolph. "Automatic speech segmentation with limited data / by D.R. van Niekerk." Thesis, North-West University, 2009. http://hdl.handle.net/10394/3978.

Abstract:
The rapid development of corpus-based speech systems such as concatenative synthesis systems for under-resourced languages requires an efficient, consistent and accurate solution with regard to phonetic speech segmentation. Manual development of phonetically annotated corpora is a time consuming and expensive process which suffers from challenges regarding consistency and reproducibility, while automation of this process has only been satisfactorily demonstrated on large corpora of a select few languages by employing techniques requiring extensive and specialised resources. In this work we cons

Books on the topic "Text-to-speech synthesis module"

1

Dutoit, Thierry, and Yannis Stylianou. Text-to-Speech Synthesis. Edited by Ruslan Mitkov. Oxford University Press, 2012. http://dx.doi.org/10.1093/oxfordhb/9780199276349.013.0017.

Abstract:
This article gives an introduction to state-of-the-art text-to-speech (TTS) synthesis systems, showing both the natural language processing and the digital signal processing problems involved. Text-to-speech (TTS) synthesis is the art of designing talking machines. The article begins with a brief user-oriented description of a general TTS system and comments on its commercial applications. It then gives a functional diagram of a modern TTS system, highlighting its components. It describes its morphosyntactic module. Furthermore, it examines why sentence-level phonetization cannot be achieved by

Book chapters on the topic "Text-to-speech synthesis module"

1

Samyuktha, K., Shevani V. J., S. Swetha, Yalini Sri N., and P. Manohari. "AI and Quantum Network Application in Business and Medicine, Deep Voice Synthesis, and Personalized Narration." In Advances in Computational Intelligence and Robotics. IGI Global, 2024. https://doi.org/10.4018/979-8-3693-8135-9.ch019.

Abstract:
This work offers a novel deep learning-based method for voice synthesis with an emphasis on customized audio generation from text documents and personalized narration. The suggested system is divided into many modules, the first of which is text pre-processing to guarantee input that is clean and structured. Understanding the text's syntactic and semantic elements is aided by further linguistic analysis. In order to create phonemes for phoneme synthesis—a process in which deep learning models, AI and quantum produce speech sounds—written text must first be converted into phonemes. These phonem
2

López-González, Erika. "Wizard based on natural language processing for Java programming language." In Computer Technology and Innovation. ECORFAN, 2023. http://dx.doi.org/10.35429/h.2023.13.53.68.

Abstract:
Smart assistants are a technology that has become very popular today, due to the multiple functions they have, and somehow allow a natural interaction between devices and human beings. The objective is to develop an assistant based on NLP (Natural Language Processing) focused on the Java programming language, with the feature of guiding the user on the use of this language from reliable and validated sources, through speech recognition and speech synthesis by the assistant, allowing communication between the software and the user through natural language, in order to make the time in the consu
3

Nurk, Tõnis. "Creation of HMM-based Speech Model for Estonian Text-to-Speech Synthesis." In Frontiers in Artificial Intelligence and Applications. IOS Press, 2012. https://doi.org/10.3233/978-1-61499-133-5-162.

Abstract:
The article describes the creation of Hidden Markov Model based speech models for both male and female voice for Estonian text-to-speech synthesis. A brief overview of text-to-speech synthesis process is given, focusing on statistical parametric synthesis in particular. System HTS is employed to generate voice models. The creation of speech corpus of Institute of the Estonian Language is analyzed. The process of adapting Estonian-related training data and linguistic specification to HTS is described, as well as experiments carried out on data from different speakers, subcorpora and linguistic
4

Singh, Harman, Parminder Singh, and Manjot Kaur Gill. "Statistical Parametric Speech Synthesis for Punjabi Language using Deep Neural Network." In SCRS CONFERENCE PROCEEDINGS ON INTELLIGENT SYSTEMS. Soft Computing Research Society, 2021. http://dx.doi.org/10.52458/978-93-91842-08-6-41.

Abstract:
In recent years, speech technology has become very advanced, due to which speech synthesis has become an interesting area of study for researchers. A Text-To-Speech (TTS) system generates speech from text by using a synthesis technique such as concatenative, formant, articulatory, or Statistical Parametric Speech Synthesis (SPSS). A Deep Neural Network (DNN) based SPSS for the Punjabi language is used in this research work. The database used for this research work contains 674 audio files and a single text file containing 674 sentences. This database was created at the Language Technologies In
5

Min, Zeping, Qian Ge, and Zhong Li. "CAMP: A Unified Data Solution for Mandarin Speech Recognition Tasks." In Advances in Transdisciplinary Engineering. IOS Press, 2023. http://dx.doi.org/10.3233/atde230552.

Abstract:
Speech recognition, the transformation of spoken language into written text, is becoming increasingly vital across a broad range of applications. Despite the advancements in end-to-end Neural Network (NN) based speech recognition systems, the requirement for large volumes of annotated audio data tailored to specific scenarios remains a significant challenge. To address this, we introduce a novel approach, the Character Audio Mix-up (CAMP), which synthesizes scenario-specific audio data for Mandarin at a significantly reduced cost and effort. This method concatenates the audio segments of each
6

Mihkla, Meelis, Indrek Hein, and Indrek Kiissel. "Self-Reading Texts and Books." In Frontiers in Artificial Intelligence and Applications. IOS Press, 2018. https://doi.org/10.3233/978-1-61499-912-6-79.

Abstract:
The rise of e-books, the cumulative digitisation of written library materials and the advancement of speech technology have reached a stage enabling library services and e-books to be read out loud to customers in synthetic speech and paper books (either published or still in print) to be delivered in the audio form. The user environment of the digital archive Digar of the Estonian National Library includes a special reading machine capable of producing an audio version of electronic texts in Estonian (books, magazines etc). The application of Elisa Raamat provides access to more than 2500 Est
7

Koziol, Wojciech, Hubert Wojtowicz, Kazimierz Sikora, and Wieslaw Wajs. "A System for Visualization of the Polish Sign Language Gestures." In Frontiers in Artificial Intelligence and Applications. IOS Press, 2013. https://doi.org/10.3233/978-1-61499-262-2-70.

Abstract:
The paper describes a method for acquiring and visualizing the Polish Sign Language gestures along with mimic sub code. The software allowing visualization of sign language gestures is one of the modules of the system for the translation of texts written in the Polish language into appropriate messages of the sign language. Proper understanding of the information communicated in the sign language requires the information to be presented in the multipath manner. In addition to the ideographic communication, i.e. gestures of the sign language alone, also lip movement (many deaf people are able t
8

Xu, Zhe, David John, and Anthony C. Boucouvalas. "Fuzzy Logic Usage in Emotion Communication of Human Machine Interaction." In Encyclopedia of Human Computer Interaction. IGI Global, 2006. http://dx.doi.org/10.4018/978-1-59140-562-7.ch036.

Abstract:
As the popularity of the Internet has expanded, an increasing number of people spend time online. More than ever, individuals spend time online reading news, searching for new technologies, and chatting with others. Although the Internet was designed as a tool for computational calculations, it has now become a social environment with computer-mediated communication (CMC). Picard and Healey (1997) demonstrated the potential and importance of emotion in human-computer interaction, and Bates (1992) illustrated the roles that emotion plays in user interactions with synthetic agents. Is emotion co

Conference papers on the topic "Text-to-speech synthesis module"

1

Almahmood, Mohamed, Fatema Albalooshi, and Hesham Al-Ammal. "An Investigation in Bahraini Dialect Text to Speech Synthesis Models and Datasets." In 2024 International Conference on Innovation and Intelligence for Informatics, Computing, and Technologies (3ICT). IEEE, 2024. https://doi.org/10.1109/3ict64318.2024.10824238.

2

Huang, Rongjie, Max W. Y. Lam, Jun Wang, et al. "FastDiff: A Fast Conditional Diffusion Model for High-Quality Speech Synthesis." In Thirty-First International Joint Conference on Artificial Intelligence {IJCAI-22}. International Joint Conferences on Artificial Intelligence Organization, 2022. http://dx.doi.org/10.24963/ijcai.2022/577.

Abstract:
Denoising diffusion probabilistic models (DDPMs) have recently achieved leading performances in many generative tasks. However, the inherited iterative sampling process costs hindered their applications to speech synthesis. This paper proposes FastDiff, a fast conditional diffusion model for high-quality speech synthesis. FastDiff employs a stack of time-aware location-variable convolutions of diverse receptive field patterns to efficiently model long-term time dependencies with adaptive conditions. A noise schedule predictor is also adopted to reduce the sampling steps without sacrificing the
3

Ye, Zhenhui, Zhou Zhao, Yi Ren, and Fei Wu. "SyntaSpeech: Syntax-Aware Generative Adversarial Text-to-Speech." In Thirty-First International Joint Conference on Artificial Intelligence {IJCAI-22}. International Joint Conferences on Artificial Intelligence Organization, 2022. http://dx.doi.org/10.24963/ijcai.2022/620.

Abstract:
The recent progress in non-autoregressive text-to-speech (NAR-TTS) has made fast and high-quality speech synthesis possible. However, current NAR-TTS models usually use phoneme sequence as input and thus cannot understand the tree-structured syntactic information of the input sequence, which hurts the prosody modeling. To this end, we propose SyntaSpeech, a syntax-aware and light-weight NAR-TTS model, which integrates tree-structured syntactic information into the prosody modeling modules in PortaSpeech. Specifically, 1) We build a syntactic graph based on the dependency tree of the input sent
4

Moya, Marcel Granero, Penny Karanasou, Sri Karlapati, et al. "A Comparative Analysis of Pretrained Language Models for Text-to-Speech." In 12th ISCA Speech Synthesis Workshop (SSW2023). ISCA, 2023. http://dx.doi.org/10.21437/ssw.2023-3.

5

Cornell, Samuele, Jordan Darefsky, Zhiyao Duan, and Shinji Watanabe. "Generating Data with Text-to-Speech and Large-Language Models for Conversational Speech Recognition." In Synthetic Data’s Transformative Role in Foundational Speech Models. ISCA, 2024. http://dx.doi.org/10.21437/syndata4genai.2024-2.

6

Rossenbach, Nick, Sakriani Sakti, and Ralf Schlüter. "On the Problem of Text-To-Speech Model Selection for Synthetic Data Generation in Automatic Speech Recognition." In Synthetic Data’s Transformative Role in Foundational Speech Models. ISCA, 2024. http://dx.doi.org/10.21437/syndata4genai.2024-5.

7

Low, Phuay Hui, and Saeed Vaseghi. "Application of microprosody models in text to speech synthesis." In 7th International Conference on Spoken Language Processing (ICSLP 2002). ISCA, 2002. http://dx.doi.org/10.21437/icslp.2002-108.

8

Schimbinschi, Florin, Christian Walder, Sarah M. Erfani, and James Bailey. "SynthNet: Learning to Synthesize Music End-to-End." In Twenty-Eighth International Joint Conference on Artificial Intelligence {IJCAI-19}. International Joint Conferences on Artificial Intelligence Organization, 2019. http://dx.doi.org/10.24963/ijcai.2019/467.

Abstract:
We consider the problem of learning a mapping directly from annotated music to waveforms, bypassing traditional single note synthesis. We propose a specific architecture based on WaveNet, a convolutional autoregressive generative model designed for text to speech. We investigate the representations learned by these models on music and conclude that mappings between musical notes and the instrument timbre can be learned directly from the raw audio coupled with the musical score, in binary piano roll format. Our model requires minimal training data (9 minutes), is substantially better in quality a
9

Kayyar, Kishor, Christian Dittmar, Nicola Pia, and Emanuel Habets. "Subjective Evaluation of Text-to-Speech Models: Comparing Absolute Category Rating and Ranking by Elimination Tests." In 12th ISCA Speech Synthesis Workshop (SSW2023). ISCA, 2023. http://dx.doi.org/10.21437/ssw.2023-30.

10

Prasanna, Dhruv, Avinash Nithyashree, Namith V. Shetty, Praharsha Kosuri, and Pavan A C. "Speech to Speech Translation for English and Hindi with Speaker Preservation." In 2nd International Conference on Emerging Applications of Artificial Intelligence, Machine Learning and Cybersecurity. AIJR Publisher, 2025. https://doi.org/10.21467/proceedings.178.7.

Abstract:
This paper presents an advanced speech to speech translation system designed to facilitate accurate communication between English and Hindi speakers with near real time responses while preserving the original voice of the speaker. The system uses a cascaded architecture consisting of Automatic Speech Recognition (ASR), Machine Translation (MT), and Text to Speech (TTS) components. The resulting system is able to accurately translate between English speech and Hindi speech and vice versa. The techniques shown attempt to tackle the difficulties brought on by the different language structures and