Academic literature on the topic 'Text-to-speech synthesis module'


Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Text-to-speech synthesis module.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Journal articles on the topic "Text-to-speech synthesis module"

1

SPROAT, RICHARD. "Multilingual text analysis for text-to-speech synthesis." Natural Language Engineering 2, no. 4 (1996): 369–80. http://dx.doi.org/10.1017/s1351324997001654.

Abstract:
We present a model of text analysis for text-to-speech (TTS) synthesis based on (weighted) finite state transducers, which serves as the text analysis module of the multilingual Bell Labs TTS system. The transducers are constructed using a lexical toolkit that allows declarative descriptions of lexicons, morphological rules, numeral-expansion rules, and phonological rules, inter alia. To date, the model has been applied to eight languages: Spanish, Italian, Romanian, French, German, Russian, Mandarin and Japanese.
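The rule-driven text analysis Sproat describes can be pictured with a toy numeral-expansion pass. This is a hypothetical Python sketch, not the Bell Labs toolkit: a real system compiles such declarative rules into weighted finite-state transducers, while this version only mirrors the rule-driven style for English numbers below 1000.

```python
import re

# Toy lookup tables standing in for declarative numeral-expansion rules.
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]

def expand_number(n: int) -> str:
    """Spell out an integer in the range 0-999 as English words."""
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, rest = divmod(n, 10)
        return TENS[tens] + ("-" + ONES[rest] if rest else "")
    hundreds, rest = divmod(n, 100)
    return ONES[hundreds] + " hundred" + (" " + expand_number(rest) if rest else "")

def normalize(text: str) -> str:
    """Replace each digit string in the text with its spoken form."""
    return re.sub(r"\d+", lambda m: expand_number(int(m.group())), text)
```

For example, `normalize("Flight 19 departs at gate 7")` yields "Flight nineteen departs at gate seven". A production front end would also handle ordinals, dates, currency, and language-specific agreement, which is exactly why the paper argues for a shared lexical toolkit across languages.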
2

Hu, Weixin, and Xianyou Zhu. "A real-time voice cloning system with multiple algorithms for speech quality improvement." PLOS ONE 18, no. 4 (2023): e0283440. http://dx.doi.org/10.1371/journal.pone.0283440.

Abstract:
With the development of computer technology, speech synthesis techniques are becoming increasingly sophisticated. Speech cloning can be performed as a subtask of speech synthesis technology by using deep learning techniques to extract acoustic information from human voices and combine it with text to output a natural human voice. However, traditional speech cloning technology still has certain limitations; excessively large text inputs cannot be adequately processed, and the synthesized audio may include noise artifacts like breaks and unclear phrases. In this study, we add a text determinatio
3

Yu, Hong Zhi, Jin Xi Zhang, Guang Rong Shan, and Ning Ma. "Research on Tibetan Language Synthesis System Front-End Text Processing Technology Based on HMM." Applied Mechanics and Materials 411-414 (September 2013): 308–12. http://dx.doi.org/10.4028/www.scientific.net/amm.411-414.308.

Abstract:
Text standardization, word segmentation, division into basic concatenation units for prosody analysis, and pronunciation conversion are important parts of the front-end text-processing modules of a speech synthesis system. Based on the linguistic and phonetic characteristics of Lhasa Tibetan, this work proposes a text analysis module for Tibetan speech synthesis that analyzes and describes the linguistic-level information of Lhasa Tibetan and maps it to the phonetic level. The completed study lays a solid foundation for a further Tibetan speech synthesis system.
4

Phukon, Debasish. "A Deep Learning Approach for ASL Recognition and Text-to-Speech Synthesis using CNN." International Journal for Research in Applied Science and Engineering Technology 11, no. 8 (2023): 2135–43. http://dx.doi.org/10.22214/ijraset.2023.55528.

Abstract:
Sign language is a visual language that is used by the deaf and hard-of-hearing community to communicate. However, sign language is not universally understood by non-signers, which can create communication barriers for deaf and hard-of-hearing individuals. In this paper, we present a novel application for American Sign Language (ASL) to text-to-speech conversion using deep learning techniques. Our app aims to bridge the communication gap between hearing-impaired individuals who use ASL as their primary mode of communication and individuals who do not understand ASL. The app compri
5

Wu, Chung Hsien. "Unit selection module and method of chinese text-to-speech synthesis." Journal of the Acoustical Society of America 127, no. 5 (2010): 3294. http://dx.doi.org/10.1121/1.3432304.

6

A, Arulprakash, Synthiya M, and Rajabhusanam C. "Tamil Speech Synthesizer App for Android: Text Processing Module Enhancement." Indian Journal of Science and Technology 16, no. 7 (2023): 485–91. https://doi.org/10.17485/IJST/v16i7.2165.

Abstract:
Objectives: Designing dynamic computer systems that are effective, efficient, simple, and satisfying to use is becoming extremely important in this age of information and communication technology. Text to Speech or Speech Synthesis is one of the many methods being investigated by researchers to improve Human-Computer Interaction. The goal here is to improve the text processing component of the Tamil voice synthesizer by including a text normalizer and loan word identification that is efficient and reliable. Methods: Text normalization i
7

Zhao, Wei, and Zheng Yang. "An Emotion Speech Synthesis Method Based on VITS." Applied Sciences 13, no. 4 (2023): 2225. http://dx.doi.org/10.3390/app13042225.

Abstract:
People and things can be connected through the Internet of Things (IoT), and speech synthesis is one of the key technologies. At this stage, end-to-end speech synthesis systems are capable of synthesizing relatively realistic human voices, but the current commonly used parallel text-to-speech suffers from loss of useful information during the two-stage delivery process, and the control features of the synthesized speech are monotonous, with insufficient expression of features, including emotion, leading to emotional speech synthesis becoming a challenging task. In this paper, we propose a new
8

Gudi, Ganga, Mallamma V. Reddy, and Hanumanthappa M. "Enhancing Kannada text-to-speech and braille conversion with deep learning for the visually impaired." Scientific Temper 16, Spl-1 (2025): 48–52. https://doi.org/10.58414/scientifictemper.2025.16.spl-1.06.

Abstract:
Advancements in assistive technology have greatly improved accessibility for visually impaired individuals, enabling seamless interaction with textual content. This research introduces a novel approach that converts Kannada text into both speech and Braille, promoting multilingual accessibility. The proposed system incorporates a support vector machine (SVM) for Kannada text-to-Braille conversion and a deep learning-based text-to-speech (TTS) model for speech synthesis. The Braille translation module accurately maps Kannada characters to their respective Braille representations using SVM class
9

MALLISHWARI, N. "Implementation of the Image Text to Speech Conversion in the Desired Language by Translating with Raspberry Pi." International Scientific Journal of Engineering and Management 04, no. 06 (2025): 1–9. https://doi.org/10.55041/isjem04635.

Abstract:
The main problem in communication is language bias between the communicators. This device can be used by people who do not know English and want content translated into their native language. The novelty of this research work is the speech output, which is available in 53 different languages translated from English. This paper is based on a prototype which helps the user hear the contents of text images in the desired language. It involves extraction of text from the image and conversion of the text to translated speech in the user's desired language. This is done wit
10

Popovic, Branislav, Dragan Knezevic, Milan Secujski, and Darko Pekar. "Automatic prosody generation in a text-to-speech system for Hebrew." Facta universitatis - series: Electronics and Energetics 27, no. 3 (2014): 467–77. http://dx.doi.org/10.2298/fuee1403467p.

Abstract:
The paper presents the module for automatic prosody generation within a system for automatic synthesis of high-quality speech based on arbitrary text in Hebrew. The high quality of synthesis is due to the high accuracy of automatic prosody generation, enabling the introduction of elements of natural sentence prosody of Hebrew. Automatic morphological annotation of text is based on the application of an expert algorithm relying on transformational rules. Syntactic-prosodic parsing is also rule based, while the generation of the acoustic representation of prosodic features is based on classifica

Dissertations / Theses on the topic "Text-to-speech synthesis module"

1

Suzić, Siniša. "Parametarska sinteza ekspresivnog govora." PhD thesis, Univerzitet u Novom Sadu, Fakultet tehničkih nauka u Novom Sadu, 2019. https://www.cris.uns.ac.rs/record.jsf?recordId=110631&source=NDLTD&language=en.

Abstract:
The dissertation describes procedures for the synthesis of expressive speech using parametric approaches. It is shown that better results are obtained with deep neural networks than with hidden Markov models. Three new methods for expressive speech synthesis using deep neural networks are proposed: the style-code method, the additional network training method, and an architecture based on shared hidden layers. It is shown that the best results are obtained with the style-code method. A new method for emotion/style transplantation based on shared hidden
2

Van Niekerk, Daniel Rudolph. "Automatic speech segmentation with limited data / by D.R. van Niekerk." Thesis, North-West University, 2009. http://hdl.handle.net/10394/3978.

Abstract:
The rapid development of corpus-based speech systems such as concatenative synthesis systems for under-resourced languages requires an efficient, consistent and accurate solution with regard to phonetic speech segmentation. Manual development of phonetically annotated corpora is a time consuming and expensive process which suffers from challenges regarding consistency and reproducibility, while automation of this process has only been satisfactorily demonstrated on large corpora of a select few languages by employing techniques requiring extensive and specialised resources. In this work we cons

Books on the topic "Text-to-speech synthesis module"

1

Dutoit, Thierry, and Yannis Stylianou. Text-to-Speech Synthesis. Edited by Ruslan Mitkov. Oxford University Press, 2012. http://dx.doi.org/10.1093/oxfordhb/9780199276349.013.0017.

Abstract:
This article gives an introduction to state-of-the-art text-to-speech (TTS) synthesis systems, showing both the natural language processing and the digital signal processing problems involved. Text-to-speech (TTS) synthesis is the art of designing talking machines. The article begins with a brief user-oriented description of a general TTS system and comments on its commercial applications. It then gives a functional diagram of a modern TTS system, highlighting its components. It describes its morphosyntactic module. Furthermore, it examines why sentence-level phonetization cannot be achieved by

Book chapters on the topic "Text-to-speech synthesis module"

1

Samyuktha, K., Shevani V. J., S. Swetha, Yalini Sri N., and P. Manohari. "AI and Quantum Network Application in Business and Medicine, Deep Voice Synthesis, and Personalized Narration." In Advances in Computational Intelligence and Robotics. IGI Global, 2024. https://doi.org/10.4018/979-8-3693-8135-9.ch019.

Abstract:
This work offers a novel deep learning-based method for voice synthesis with an emphasis on customized audio generation from text documents and personalized narration. The suggested system is divided into many modules, the first of which is text pre-processing to guarantee input that is clean and structured. Understanding the text's syntactic and semantic elements is aided by further linguistic analysis. In order to create phonemes for phoneme synthesis—a process in which deep learning models, AI and quantum produce speech sounds—written text must first be converted into phonemes. These phonem
2

López-González, Erika. "Wizard based on natural language processing for Java programming language." In Computer Technology and Innovation. ECORFAN, 2023. http://dx.doi.org/10.35429/h.2023.13.53.68.

Abstract:
Smart assistants are a technology that has become very popular today, due to the multiple functions they have, and somehow allow a natural interaction between devices and human beings. The objective is to develop an assistant based on NLP (Natural Language Processing) focused on the Java programming language, with the feature of guiding the user on the use of this language from reliable and validated sources, through speech recognition and speech synthesis by the assistant, allowing communication between the software and the user through natural language, in order to make the time in the consu
3

Nurk, Tõnis. "Creation of HMM-based Speech Model for Estonian Text-to-Speech Synthesis." In Frontiers in Artificial Intelligence and Applications. IOS Press, 2012. https://doi.org/10.3233/978-1-61499-133-5-162.

Abstract:
The article describes the creation of Hidden Markov Model based speech models for both male and female voice for Estonian text-to-speech synthesis. A brief overview of text-to-speech synthesis process is given, focusing on statistical parametric synthesis in particular. System HTS is employed to generate voice models. The creation of speech corpus of Institute of the Estonian Language is analyzed. The process of adapting Estonian-related training data and linguistic specification to HTS is described, as well as experiments carried out on data from different speakers, subcorpora and linguistic
4

Singh, Harman, Parminder Singh, and Manjot Kaur Gill. "Statistical Parametric Speech Synthesis for Punjabi Language using Deep Neural Network." In SCRS CONFERENCE PROCEEDINGS ON INTELLIGENT SYSTEMS. Soft Computing Research Society, 2021. http://dx.doi.org/10.52458/978-93-91842-08-6-41.

Abstract:
In recent years, speech technology has become very advanced, due to which speech synthesis has become an interesting area of study for researchers. A Text-To-Speech (TTS) system generates speech from text by using a synthesis technique such as concatenative, formant, articulatory, or Statistical Parametric Speech Synthesis (SPSS). A Deep Neural Network (DNN) based SPSS for the Punjabi language is used in this research work. The database used for this research work contains 674 audio files and a single text file containing 674 sentences. This database was created at the Language Technologies In
5

Min, Zeping, Qian Ge, and Zhong Li. "CAMP: A Unified Data Solution for Mandarin Speech Recognition Tasks." In Advances in Transdisciplinary Engineering. IOS Press, 2023. http://dx.doi.org/10.3233/atde230552.

Abstract:
Speech recognition, the transformation of spoken language into written text, is becoming increasingly vital across a broad range of applications. Despite the advancements in end-to-end Neural Network (NN) based speech recognition systems, the requirement for large volumes of annotated audio data tailored to specific scenarios remains a significant challenge. To address this, we introduce a novel approach, the Character Audio Mix-up (CAMP), which synthesizes scenario-specific audio data for Mandarin at a significantly reduced cost and effort. This method concatenates the audio segments of each
6

Mihkla, Meelis, Indrek Hein, and Indrek Kiissel. "Self-Reading Texts and Books." In Frontiers in Artificial Intelligence and Applications. IOS Press, 2018. https://doi.org/10.3233/978-1-61499-912-6-79.

Abstract:
The rise of e-books, the cumulative digitisation of written library materials and the advancement of speech technology have reached a stage enabling library services and e-books to be read out loud to customers in synthetic speech and paper books (either published or still in print) to be delivered in the audio form. The user environment of the digital archive Digar of the Estonian National Library includes a special reading machine capable of producing an audio version of electronic texts in Estonian (books, magazines etc). The application of Elisa Raamat provides access to more than 2500 Est
7

Koziol, Wojciech, Hubert Wojtowicz, Kazimierz Sikora, and Wieslaw Wajs. "A System for Visualization of the Polish Sign Language Gestures." In Frontiers in Artificial Intelligence and Applications. IOS Press, 2013. https://doi.org/10.3233/978-1-61499-262-2-70.

Abstract:
The paper describes a method for acquiring and visualizing the Polish Sign Language gestures along with mimic sub code. The software allowing visualization of sign language gestures is one of the modules of the system for the translation of texts written in the Polish language into appropriate messages of the sign language. Proper understanding of the information communicated in the sign language requires the information to be presented in the multipath manner. In addition to the ideographic communication, i.e. gestures of the sign language alone, also lip movement (many deaf people are able t
8

Xu, Zhe, David John, and Anthony C. Boucouvalas. "Fuzzy Logic Usage in Emotion Communication of Human Machine Interaction." In Encyclopedia of Human Computer Interaction. IGI Global, 2006. http://dx.doi.org/10.4018/978-1-59140-562-7.ch036.

Abstract:
As the popularity of the Internet has expanded, an increasing number of people spend time online. More than ever, individuals spend time online reading news, searching for new technologies, and chatting with others. Although the Internet was designed as a tool for computational calculations, it has now become a social environment with computer-mediated communication (CMC). Picard and Healey (1997) demonstrated the potential and importance of emotion in human-computer interaction, and Bates (1992) illustrated the roles that emotion plays in user interactions with synthetic agents. Is emotion co

Conference papers on the topic "Text-to-speech synthesis module"

1

Almahmood, Mohamed, Fatema Albalooshi, and Hesham Al-Ammal. "An Investigation in Bahraini Dialect Text to Speech Synthesis Models and Datasets." In 2024 International Conference on Innovation and Intelligence for Informatics, Computing, and Technologies (3ICT). IEEE, 2024. https://doi.org/10.1109/3ict64318.2024.10824238.

2

Huang, Rongjie, Max W. Y. Lam, Jun Wang, et al. "FastDiff: A Fast Conditional Diffusion Model for High-Quality Speech Synthesis." In Thirty-First International Joint Conference on Artificial Intelligence {IJCAI-22}. International Joint Conferences on Artificial Intelligence Organization, 2022. http://dx.doi.org/10.24963/ijcai.2022/577.

Abstract:
Denoising diffusion probabilistic models (DDPMs) have recently achieved leading performances in many generative tasks. However, the inherited iterative sampling process costs hindered their applications to speech synthesis. This paper proposes FastDiff, a fast conditional diffusion model for high-quality speech synthesis. FastDiff employs a stack of time-aware location-variable convolutions of diverse receptive field patterns to efficiently model long-term time dependencies with adaptive conditions. A noise schedule predictor is also adopted to reduce the sampling steps without sacrificing the
3

Ye, Zhenhui, Zhou Zhao, Yi Ren, and Fei Wu. "SyntaSpeech: Syntax-Aware Generative Adversarial Text-to-Speech." In Thirty-First International Joint Conference on Artificial Intelligence {IJCAI-22}. International Joint Conferences on Artificial Intelligence Organization, 2022. http://dx.doi.org/10.24963/ijcai.2022/620.

Abstract:
The recent progress in non-autoregressive text-to-speech (NAR-TTS) has made fast and high-quality speech synthesis possible. However, current NAR-TTS models usually use phoneme sequence as input and thus cannot understand the tree-structured syntactic information of the input sequence, which hurts the prosody modeling. To this end, we propose SyntaSpeech, a syntax-aware and light-weight NAR-TTS model, which integrates tree-structured syntactic information into the prosody modeling modules in PortaSpeech. Specifically, 1) We build a syntactic graph based on the dependency tree of the input sent
4

Moya, Marcel Granero, Penny Karanasou, Sri Karlapati, et al. "A Comparative Analysis of Pretrained Language Models for Text-to-Speech." In 12th ISCA Speech Synthesis Workshop (SSW2023). ISCA, 2023. http://dx.doi.org/10.21437/ssw.2023-3.

5

Cornell, Samuele, Jordan Darefsky, Zhiyao Duan, and Shinji Watanabe. "Generating Data with Text-to-Speech and Large-Language Models for Conversational Speech Recognition." In Synthetic Data’s Transformative Role in Foundational Speech Models. ISCA, 2024. http://dx.doi.org/10.21437/syndata4genai.2024-2.

6

Rossenbach, Nick, Sakriani Sakti, and Ralf Schlüter. "On the Problem of Text-To-Speech Model Selection for Synthetic Data Generation in Automatic Speech Recognition." In Synthetic Data’s Transformative Role in Foundational Speech Models. ISCA, 2024. http://dx.doi.org/10.21437/syndata4genai.2024-5.

7

Low, Phuay Hui, and Saeed Vaseghi. "Application of microprosody models in text to speech synthesis." In 7th International Conference on Spoken Language Processing (ICSLP 2002). ISCA, 2002. http://dx.doi.org/10.21437/icslp.2002-108.

8

Schimbinschi, Florin, Christian Walder, Sarah M. Erfani, and James Bailey. "SynthNet: Learning to Synthesize Music End-to-End." In Twenty-Eighth International Joint Conference on Artificial Intelligence {IJCAI-19}. International Joint Conferences on Artificial Intelligence Organization, 2019. http://dx.doi.org/10.24963/ijcai.2019/467.

Abstract:
We consider the problem of learning a mapping directly from annotated music to waveforms, bypassing traditional single note synthesis. We propose a specific architecture based on WaveNet, a convolutional autoregressive generative model designed for text to speech. We investigate the representations learned by these models on music and conclude that mappings between musical notes and the instrument timbre can be learned directly from the raw audio coupled with the musical score, in binary piano roll format. Our model requires minimal training data (9 minutes), is substantially better in quality a
9

Kayyar, Kishor, Christian Dittmar, Nicola Pia, and Emanuel Habets. "Subjective Evaluation of Text-to-Speech Models: Comparing Absolute Category Rating and Ranking by Elimination Tests." In 12th ISCA Speech Synthesis Workshop (SSW2023). ISCA, 2023. http://dx.doi.org/10.21437/ssw.2023-30.

10

Prasanna, Dhruv, Avinash Nithyashree, Namith V. Shetty, Praharsha Kosuri, and Pavan A C. "Speech to Speech Translation for English and Hindi with Speaker Preservation." In 2nd International Conference on Emerging Applications of Artificial Intelligence, Machine Learning and Cybersecurity. AIJR Publisher, 2025. https://doi.org/10.21467/proceedings.178.7.

Abstract:
This paper presents an advanced speech to speech translation system designed to facilitate accurate communication between English and Hindi speakers with near real time responses while preserving the original voice of the speaker. The system uses a cascaded architecture consisting of Automatic Speech Recognition (ASR), Machine Translation (MT), and Text to Speech (TTS) components. The resulting system is able to accurately translate between English speech and Hindi speech and vice versa. The techniques shown attempt to tackle the difficulties brought on by the different language structures and