Journal articles on the topic 'Speech-to-text (STT)'

Author: Grafiati

Published: 4 June 2025

Last updated: 1 August 2025

Consult the top 50 journal articles for your research on the topic 'Speech-to-text (STT).'

1

Tran, Duc Chung, Duc Long Nguyen, and Mohd Fadzil Hassan. "Development and testing of an FPT.AI-based voicebot." Bulletin of Electrical Engineering and Informatics 9, no. 6 (2020): 2388–95. https://doi.org/10.11591/eei.v9i6.2620.

Abstract:

In recent years, voicebot has become a popular communication tool between humans and machines. In this paper, we will introduce our voicebot integrating text-to-speech (TTS) and speech-to-text (STT) modules provided by FPT.AI. This voicebot can be considered as a critical improvement of a typical chatbot because it can respond to human’s queries by both text and speech. FPT Open Speech, LibriSpeech datasets, and music files were used to test the accuracy and performance of the STT module. For the TTS module, it was tested by using text on news pages in both Vietnamese and English. To tes

2

Journal, IJSREM. "A Review on Speech-to-Text." INTERNATIONAL JOURNAL OF SCIENTIFIC RESEARCH IN ENGINEERING AND MANAGEMENT 08, no. 03 (2024): 1–13. http://dx.doi.org/10.55041/ijsrem29004.

Abstract:

The current era represents the global apex of groundbreaking advances in artificial intelligence (AI) technology, especially in the field of speech-to-text (STT). This review article focuses on the development of human skills through smooth, natural language interaction between people and robots, providing a thorough overview and exploration of the impressive advancements made in recent years. The study outlines a model intended to reinvent human-computer interaction, highlighting its ability to translate spoken language into text and carry out commands via a conversational, dynamic interface.

3

Barkovska, Olesia. "RESEARCH INTO SPEECH-TO-TEXT TRANSFORMATION MODULE IN THE PROPOSED MODEL OF A SPEAKER’S AUTOMATIC SPEECH ANNOTATION." Innovative Technologies and Scientific Solutions for Industries, no. 4 (22) (December 31, 2022): 5–13. http://dx.doi.org/10.30837/itssi.2022.22.005.

Abstract:

The subject matter of the article is the module for converting the speaker’s speech into text in the proposed model of automatic annotation of the speaker’s speech, which has become more and more popular in Ukraine in the last two years, due to the active transition to an online form of communication and education as well as conducting workshops, interviews and discussing urgent issues. Furthermore, the users of personal educational platforms are not always able to join online meetings on time due to various reasons (one example can be a blackout), which explains the need to save the speakers’

4

B, Mupini, Chaputsira S, and Sibanda Bk. "Survey on Speech to Text Modelling for the Shona Language." Survey on Speech to Text Modelling for the Shona Language 9, no. 1 (2024): 4. https://doi.org/10.5281/zenodo.10609671.

Abstract:

Conversion of speech to text (STT) for various applications is of huge interest and calls for innovative technological approaches that accommodate the spoken languages of Africa. However, African countries are falling behind in embracing STT technologies, with Automatic Speech Recognition (ASR) having been done for popular East African languages. This has kept transcription to a minimum and has also slowed the use of many African languages on a worldwide scale, with another problem being that a single African language ma

5

Yang, Hui Jae, Eun-Byel Oh, and Jung-Mee Kim. "Comparison of Automatic Speech Recognition System for School-aged Children’s Narratives: Naver Clova Speech and Google Speech-to-Text." Communication Sciences & Disorders 28, no. 1 (2023): 30–38. http://dx.doi.org/10.12963/csd.23952.

Abstract:

Objectives: Language sample analysis (LSA) is a critical component of child language assessment. However, most clinicians consider LSA to be time-consuming work. In particular, transcription is seen as an overwhelming task. Due to rapid technological advances, various automatic speech recognition systems have been developed. This study aimed to investigate the accuracy and the characteristics of two automatic speech recognition programs, Naver Clova Speech (Naver Clova) and Google Speech-to-Text (STT). Methods: A total of 40 school-aged children with typical development (TD) and children with l

6

Dinata, Candra, Diyah Puspitaningrum, and Ernawati Erna. "IMPLEMENTASI TEKNIK DYNAMIC TIME WARPING (DTW) PADA APLIKASI SPEECH TO TEXT." JURNAL TEKNIK INFORMATIKA 10, no. 1 (2018): 49–58. http://dx.doi.org/10.15408/jti.v10i1.6816.

Abstract:

Voice/speech is one of the ways we as humans communicate and express ourselves. Speech to text is a field of computer science, namely speech processing. Speech to text (STT) is the translation of sentences (spoken words) into text. STT processes a speech signal and extracts features from it, which are then compared with the features extracted from another speech signal so that their similarity can be recognized. This study designs and builds a Speech to Text application program that
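
The comparison step this abstract describes, matching features from one speech signal against features from another, is what dynamic time warping (DTW) does. The following is only a minimal sketch of textbook DTW on 1-D feature sequences, not the authors' implementation; the function names `dtw_distance` and `recognize`, the absolute-difference local cost, and the toy templates are all assumptions made for illustration.

```python
import math


def dtw_distance(a, b):
    """Cumulative DTW cost between two 1-D feature sequences.

    Assumption: features are plain floats and the local cost is the
    absolute difference; real systems typically compare feature vectors.
    """
    n, m = len(a), len(b)
    # cost[i][j] = best cumulative cost aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b repeats a frame
                cost[i][j - 1],      # b advances, a repeats a frame
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


def recognize(query, templates):
    """Return the label of the stored template closest to the query."""
    return min(templates, key=lambda label: dtw_distance(query, templates[label]))


# Hypothetical word templates, invented for this sketch only.
templates = {"up": [1.0, 2.0, 3.0], "down": [3.0, 2.0, 1.0]}
label = recognize([1.0, 2.0, 2.0, 3.0], templates)
```

Because DTW lets one sequence repeat frames, the stretched query still aligns with the "up" template at zero cost, which is why the technique tolerates differences in speaking rate.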

7

G, Thimmaraja Yadava, G. Nagaraja B, Yogesh Kumaran S, C. Ramachandra A, and M. Arun Kumar N. "Development of Small Vocabulary Continuous Speech-to-Text System for Kannada Language/Dialects." Indian Journal of Science and Technology 15, no. 45 (2022): 2476–81. https://doi.org/10.17485/IJST/v15i45.1884.

Abstract:

Objectives: To develop a speech-to-text (STT) system using the Kaldi speech recognition toolkit for continuous Kannada language/dialects. Methods: Continuous Kannada speech data are collected in the field from 100 speakers/farmers of Karnataka state. The lexicon/dictionary and set of phonemes for Kannada language/dialects are created, and the collected speech data are transcribed using a transcriber tool. The ASR models are developed at different phoneme levels using Kaldi. Findings: In this work, an effort is made to devel

8

Schwarz, Nikolai, Khia A. Johnson, and Molly Babel. "Exploring the variable efficacy of Google speech-to-text with spontaneous bilingual speech in Cantonese and English." Journal of the Acoustical Society of America 150, no. 4 (2021): A357. http://dx.doi.org/10.1121/10.0008580.

Abstract:

With the growth of Automatic Speech Recognition (ASR) and voice user interface software, it is important to test for efficacy across different language varieties and identify sources of bias. Recent work assessing ASR efficacy and bias implicates factors like race, gender, dialect, and age as leading to different efficacy rates. Multilingualism presents another source of variation that ASR systems must grapple with, ranging from code-switching to phonetic variation both within and across speakers. Thus, variable ASR performance is likely exacerbated for multilingual communities. Using a sponta

9

p, Ms SANDHUSTA. "Speech-To-Text Translation Using Hugging Face Model." INTERNATIONAL JOURNAL OF SCIENTIFIC RESEARCH IN ENGINEERING AND MANAGEMENT 08, no. 04 (2024): 1–5. http://dx.doi.org/10.55041/ijsrem30348.

Abstract:

The internet has revolutionized communication, but it's particularly challenging for those with hearing impairments. A multilingual speech-to-text conversion system using Hugging Face algorithms is being developed to assist these individuals in various tasks. This system, which uses advanced neural network models and natural language processing algorithms, will translate spoken language into text, enabling seamless communication and engagement in various contexts, thereby enhancing the quality of life for those with hearing impairments. Keywords—Speech-to-Text (STT)

10

Tran, Duc Chung, Duc Long Nguyen, and Mohd Fadzil Hassan. "Development and testing of an FPT.AI-based voicebot." Bulletin of Electrical Engineering and Informatics 9, no. 6 (2020): 2388–95. http://dx.doi.org/10.11591/eei.v9i6.2620.

Abstract:

In recent years, voicebot has become a popular communication tool between humans and machines. In this paper, we will introduce our voicebot integrating text-to-speech (TTS) and speech-to-text (STT) modules provided by FPT.AI. This voicebot can be considered as a critical improvement of a typical chatbot because it can respond to human’s queries by both text and speech. FPT Open Speech, LibriSpeech datasets, and music files were used to test the accuracy and performance of the STT module. For the TTS module, it was tested by using text on news pages in both Vietnamese and English. To test the

11

Kostek, Bozena. "Enunciation—An important factor in speech-to-text medical transcription systems." Journal of the Acoustical Society of America 156, no. 4_Supplement (2024): A126. https://doi.org/10.1121/10.0035339.

Abstract:

This study aims to explore the extent to which enunciation plays a crucial role in a speech-to-text (STT) system, especially when dealing with medical terminology. To achieve this, an audio dataset was recorded containing Polish medical terms and spoken diagnoses pronounced by healthcare professionals, including general practitioners and specialists in various fields such as cardiology, pulmonology, and radiology. The next step involved comprehensive acoustical and lexical analyses of the audio recordings. Features such as harmonic-to-noise ratio, spectral tilt, zero-crossing rate, formant dis

12

Khasawneh, Mohmmad. "Study on the effectiveness of speech-to-text technology in supporting writing skills among special education students in Saudi Arabia." Journal of Infrastructure, Policy and Development 8, no. 6 (2024): 4611. http://dx.doi.org/10.24294/jipd.v8i6.4611.

Abstract:

This study aims to assess the efficacy of speech-to-text (STT) technology in improving the writing abilities of special education pupils in Saudi Arabia. A deliberate sample of 150 special education college students was selected, with participants randomly allocated to either an experimental group employing STT technology or a control group using traditional writing methods. The study utilized a comprehensive approach, which included standardized writing assessments, questionnaires, and statistical analyses such as t-tests, correlation, regression, ANOVA, and ANCOVA. The results demonstrate a

13

Rodiah, Diana Tri Susetianingtias, and Eka Patriya. "Identifikasi Fitur Suara Menggunakan Model Convolutional Neural Network (CNN) pada Speech-to-Text (STT)." Decode: Jurnal Pendidikan Teknologi Informasi 4, no. 3 (2024): 809–20. https://doi.org/10.51454/decode.v4i3.631.

Abstract:

Speech patterns are identified in order to recognize the words spoken. One method that can be used to identify Speech-to-Text (STT) is the Convolutional Neural Network (CNN). This study uses a CNN to identify STT on raw speech from 23,000 samples in an open Kaggle speech dataset. The first stage resamples by duration, selecting recordings that are long enough to enter the next step, frequency initialization. This stage changes the original frequency of the recorded audio. Initialization is

14

Baig, Prof Mirza Moiz. "An Automated Video Language Translator using STT-TTT-TTS Translation." International Journal for Research in Applied Science and Engineering Technology 13, no. 4 (2025): 5935–40. https://doi.org/10.22214/ijraset.2025.69786.

Abstract:

Advancements in Natural Language Processing (NLP) have significantly improved multilingual communication through machine translation, text-to-speech conversion, and cross-language information retrieval (CLIR) [1]-[5]. Various approaches, including rule-based and statistical models, enhance translation accuracy and language identification [6]-[8]. Neural machine translation (NMT) and deep learning techniques further refine speech recognition and sentiment analysis [9]-[12]. Structural differences in languages, such as Subject-Verb-Object (SVO) versus Subject-Object-Verb (SOV) order, influence

15

Shavkatov, Olimboy. "AUDIO SPECTROGRAM TRANSFORMER (AST): ADVANTAGES OVER TRADITIONAL ALGORITHMS IN SPEECH-TO-TEXT (STT)." Sanitary-epidemiological welfare and public health committee of the Republic of Uzbekistan 2, no. 1 (2024): 182–88. http://dx.doi.org/10.62209/spj/vol3_iss3-4/art30.

Abstract:

Automatic Speech Recognition (ASR) has seen significant advancements in recent years, largely due to the development of deep learning models. One of the most notable advancements is the Spectrogram Transformer, a variant of the Transformer architecture tailored for audio processing tasks. In this paper, we review the Spectrogram Transformer and compare it with other traditional ASR algorithms. We discuss its benefits, such as improved performance on noisy audio and better modeling of long-range dependencies. Additionally, we explore its applications in various domains, including voice assistan

16

Tiwari, Kartik. "Deep Learning Based TTS-STT Model with Transliteration for Indic Languages." International Journal for Research in Applied Science and Engineering Technology 9, no. 12 (2021): 2207–13. http://dx.doi.org/10.22214/ijraset.2021.39689.

Abstract:

This paper introduces a new end-to-end text-to-speech (E2E-TTS) toolkit called ESPnet-TTS, an open-source extension of the ESPnet speech processing toolkit. ESPnet-TTS covers various models, including Tacotron 2, Transformer TTS, and FastSpeech. It also provides recipes in the style of the Kaldi automatic speech recognition (ASR) toolkit. The recipes share a design with the ESPnet ASR recipes, which provides high performance. The toolkit also provides pre-trained models and samples for all recipes for users to use as a baseline. It works on TTS-STT and transl

17

Lee, Chaeyoung, and Ji-hyung Kim. "A Study on the Effect of Pronunciation Learning Using STT Technology on Learners’ Pronunciation Anxiety: A Case of French University Students Majoring in Korean Language." Korean Society of Bilingualism 95 (March 30, 2024): 149–75. https://doi.org/10.17296/korbil.2024..95.149.

Abstract:

This study investigates the effect of learning pronunciation with Speech-to-Text (STT) technology on KFL learners’ pronunciation anxiety. Using STT technology in a KFL speaking class significantly reduced beginner learners’ pronunciation anxiety. Specifically, anxiety related to speaking Korean in the classroom environment was greatly reduced, but anxiety related to peer comparison or communication with native Korean speakers was relatively less affected. Surveys and interviews with learners indicated that they had positive perceptions of the use of STT technology, particularly in terms of the

18

Payton, Gaea M., Jen McLachlan, Brandy Weiss, and Mo Rahman. "Telephony Speech-To-Text: An Adequate Analog to Internet Protocol Caption Telephone Services." Proceedings of the Human Factors and Ergonomics Society Annual Meeting 61, no. 1 (2017): 125–29. http://dx.doi.org/10.1177/1541931213601515.

Abstract:

Hearing loss is an invisible but significant barrier in daily life, including telephone conversations. Internet Protocol Caption Telephone Services (IP CTS) is a telecommunications relay service for an individual who can speak, but who has difficulty hearing over the telephone. An individual can use a telephone and an IP-enabled device to listen to the other party and simultaneously read transcriptions of the other party’s words. This article presents the results from a usability assessment of IP CTS devices and alternative speech recognition technologies to provide qualitative and quantitativ

19

Amusa, Kamoli Akinwale, Tolulope Christiana Erinosho, Olufunke Olubusola Nuga, and Abdulmatin Olalekan Omotoso. "YorubaAI: Bridging Language Barrier with Advanced Language Models." Journal of Applied Artificial Intelligence 6, no. 1 (2025): 39–52. https://doi.org/10.48185/jaai.v6i1.1474.

Abstract:

YorubaAI addresses the digital divide caused by language barriers, particularly for Yoruba language speakers who struggle to interact with advanced large language models (LLMs) like GPT-4, which primarily support high-resource languages. This study develops a system, named YorubaAI, for seamless communication in Yoruba language with LLMs. The YorubaAI enables users to input and receive responses in Yoruba language, both in text and audio formats. To achieve this, a speech-to-text (STT) model is fine-tuned for automatic Yoruba language speech recognition while a text-to-speech (TTS) model is em

20

Kraft, Sanna, Vibeke Rønneberg, John Rack, Fredrik Thurfjell, and Åsa Wengelin. "Exploring transcription processes when children with and without reading and writing difficulties produce written text using speech recognition." L1-Educational Studies in Language and Literature 23 (July 1, 2023): 1–28. http://dx.doi.org/10.21248/l1esll.2023.23.1.427.

Abstract:

The aim of this study was to investigate composition and error-correction processes, and their relationship with production rate, in children, age 10-12, with and without reading and writing difficulties using speech-to-text (STT) to write expository texts in Swedish. Measures of individual abilities: working memory, spelling ability and decoding ability, and the ability to interact with the STT tool under optimal conditions (STT success rate) were collected. For both those with and without difficulties, neither working memory, nor spelling or decoding ability predicted burst length nor

21

Thai, Kamakshi. "AUDIO EMAIL NAVIGATOR USING NATURAL LANGUAGE PROCESSING(NLP)." International Scientific Journal of Engineering and Management 04, no. 05 (2025): 1–9. https://doi.org/10.55041/isjem03459.

Abstract:

In the modern era of technology, email communication is essential, and efficiency in managing emails is crucial for users. Audio Email Navigator is a web-based application designed to enhance email accessibility through voice-based interaction. The system enables users to compose, read, and manage emails using speech commands, providing a hands-free experience. Leveraging speech-to-text and text-to-speech technologies, the platform ensures seamless communication without the need for manual typing. It integrates natural language processing for accurate voice recognition and secure authe

22

Thai, Kamakshi. "A SURVEY PAPER ON AUDIO EMAIL NAVIGATOR USING NATURAL LANGUAGE PROCESSING(NLP)." International Scientific Journal of Engineering and Management 04, no. 05 (2025): 1–9. https://doi.org/10.55041/isjem03684.

Abstract:

In the modern era of technology, email communication is essential, and efficiency in managing emails is crucial for users. Audio Email Navigator is a web-based application designed to enhance email accessibility through voice-based interaction. The system enables users to compose, read, and manage emails using speech commands, providing a hands-free experience. Leveraging speech-to-text and text-to-speech technologies, the platform ensures seamless communication without the need for manual typing. It integrates natural language processing for accurate voice recognition and secure authen

23

Thomas, Alen, Rosu J. Edanad, Sourav J. Raju, Suchitra NT, and Syeatha Merlin Thampy. "A Comprehensive Review of Speech-To-Text Technologies: Incorporating Translation, Summarization, And Beyond." International Journal of Advances in Engineering and Management 06, no. 12 (2024): 164–68. https://doi.org/10.35629/5252-0612164168.

Abstract:

Speech-to-text technologies have transformed the way humans interact with computers by allowing spoken language to be transcribed effortlessly into written form. In addition to transcribing, incorporating translation and summarization features boosts the accessibility and usefulness of speech-to-text systems in various languages and fields. This paper investigates the changing STT technology landscape, focusing on its role in live communication, processing multiple languages, and creating content. The paper emphasizes the possibility of advanced applications in education, business, and ac

24

Pawar, Prof Roshani V. "The Way to make Blind People Use E-Mail System: Voice Based E-Mail Generating System Using Artificial Intelligence." INTERNATIONAL JOURNAL OF SCIENTIFIC RESEARCH IN ENGINEERING AND MANAGEMENT 07, no. 10 (2023): 1–11. http://dx.doi.org/10.55041/ijsrem26574.

Abstract:

In today's world, effective communication is essential for connecting with others. Communication technologies play a vital role in improving social and personal interactions. When combined with the internet, these technologies make communication convenient. However, individuals who are physically challenged face difficulties in utilizing such technologies due to visual and physical impairments. While there have been advancements in technology, they are often inaccessible to these individuals. This paper aims to address this issue by creating an email system that is user-friendly and accessible

25

Iounousse, Jawad, and Omar Temsamani. "Development of an intelligent virtual assistant for digitalization of Moroccan agriculture." ITM Web of Conferences 69 (2024): 01003. https://doi.org/10.1051/itmconf/20246901003.

Abstract:

This paper presents the design, development, and implementation of an innovative text-to-text chatbot system aimed at digitalizing the agriculture sector in Morocco, with a focus on supporting Darija-speaking farmers. The project also encompasses the curation of a comprehensive database to facilitate future fine-tuning of Speech-to-Text (STT) and Text-to-Speech (TTS) models in Darija. The project’s primary objective is the development of a chatbot capable of responding to farmers’ text queries in Darija, providing them with instant access to critical agricultural information and support. Concurr

26

Barkovska, Olesia, Heorhii Ivashchenko, Dmytro Rosinskiy, and Daniil Zakharov. "EDUCATIONAL TRAINING SIMULATOR FOR MONITORING READING TECHNIQUE AND SPEED BASED ON SPEECH-TO-TEXT (STT) METHODS." Information Technologies and Learning Tools 103, no. 5 (2024): 21–38. http://dx.doi.org/10.33407/itlt.v103i5.5647.

Abstract:

The paper addresses a topical issue: the gamification of the learning process for primary school pupils using digital mobile devices, illustrated by the development of an educational training simulator for monitoring reading technique and speed. The practical novelty of the study lies in the possibility of optimizing the monitoring of reading technique in children with various speech impairments, namely lisping, rhotacism, and dyslalia, which remains an unsolved problem for existing computational linguistic models. The practical significance lies in the fact that the use of information and communication technologies and gamification

27

B, Dr Uma. "Voice Based Email System for the Visually Challenged." INTERNATIONAL JOURNAL OF SCIENTIFIC RESEARCH IN ENGINEERING AND MANAGEMENT 09, no. 03 (2025): 1–9. https://doi.org/10.55041/ijsrem40594.

Abstract:

In today's digital world, email is a crucial form of communication, but it can be difficult for those who are blind or visually impaired to access and use it. The idea of a voice-based email system created exclusively for those with visual impairments is presented in this abstract. The objective is to provide a welcoming atmosphere that enables people who are blind or visually impaired to use computers on their own to send and receive emails. The suggested solution has cutting-edge components designed with visually impaired users in mind. By integrating a screen reader, users may convert tex

28

Bhardwaj, Deepanshu. "Translation of English Videos to Indian Regional Languages." INTERNATIONAL JOURNAL OF SCIENTIFIC RESEARCH IN ENGINEERING AND MANAGEMENT 08, no. 04 (2024): 1–5. http://dx.doi.org/10.55041/ijsrem30431.

Abstract:

In spite of many languages being spoken in India, it is difficult for the people to understand Indian regional languages like English, Gujrati, Kannada, Tamil, Telugu, Punjabi, Malayalam, etc. The recognition and synthesis of speech are prominent emerging technologies in natural language processing and communication domains. This paper aims to leverage the open-source applications of these technologies, machine translation, text-to-speech system (TTS), and speech-to-text system (STT) to convert available online resources to Indian languages. This application takes an English language video as

29

Husaima Mailanchy T K, Suha Narghees A S, Taniya T S, Vishnu M S, and Dr. L.C. Manikandan. "A Review on Technologies for Group-Aware Malayalam Conversational AI." International Journal of Scientific Research in Computer Science, Engineering and Information Technology 11, no. 1 (2025): 2037–44. https://doi.org/10.32628/cseit2511125.

Abstract:

This review paper explores the foundational technologies required to develop a group-aware Conversational AI for the Malayalam-English bilingual community. The objective of the project is to create an AI system capable of interacting naturally in group settings, dynamically recognizing and responding to multiple speakers in real-time. The key components of this system include voice separation, which isolates individual speakers’ voices in noisy environments, speech-to-text (STT), which accurately transcribes Malayalam speech that may contain English phrases, and text-to-speech (TTS), which syn

30

Choi, Young-Sang, and S. Abdieva. "A STUDY ON THE SENTIMENT ANALYSIS (POSITIVE, NEGATIVE) OF WORDS APPEARING IN KYRGYZ NEWS BY APPLYING THE DEEP LEARNING-BASED NLP (NATURAL LANGUAGE PROCESSING) TECHNIQUES FOR STUDENTS PRACTICE." Herald of KSUCTA, no. 3 (September 27, 2021): 372–80. http://dx.doi.org/10.35803/1694-5298.2021.3.372-380.

Abstract:

This theoretical study covers sentiment analysis in deep learning-based natural language processing, an advanced technology worldwide. It introduces the main content and related technologies of each stage: data collection and preprocessing, tokenizing, sentiment dictionary construction, positive and negative word extraction through sentiment analysis, deep learning model configuration and execution, and data visualization. In addition, speech processing technology performed in the data collection stage, STT (Speech to Text) and TTS (Text to Speech)

31

Vera, Diego, and Ángel Espezua. "STT: Un sistema de apoyo a la transcripción de audiencias fiscales usando Vosk." Revista de investigación de Sistemas e Informática 14, no. 1 (2021): 83–88. http://dx.doi.org/10.15381/risi.v14i1.21864.

Abstract:

Hearings are of utmost importance within the Peruvian criminal justice system, and the information handled in them is important for resolving a case. This information is often needed at short notice, but manually transcribing these hearings can take considerable time because they last many hours. Pre-trained transcription models now exist that can do this work in a few minutes, saving a great deal of time and making the required information available almost instantly, but they are not implemented in a freely usable system, nor

32

S. Oviyan, V. Jaya Prakash, S. Praveen Kumar, R. Vishva, and V. Thiruppathy Kesavan. "AI Powered Chatbot for College Information and Student Support." Asian Journal of Research in Computer Science 18, no. 6 (2025): 67–78. https://doi.org/10.9734/ajrcos/2025/v18i6680.

Abstract:

This study presents the development of an AI-powered chatbot tailored to assist college students with academic, campus-related, and personal development queries. Designed as a student-support software prototype, the system integrates natural language processing (NLP), text-to-speech (TTS), speech-to-text (STT), and machine learning models to deliver real-time information and interactive responses. Key features include course recommendations, timetable access, placement updates, NPTEL/hackathon alerts, library book availability, and image-based building identification. The chatbot also supports

33

D, Nandhini. "SIGNBRIDGE." INTERNATIONAL JOURNAL OF SCIENTIFIC RESEARCH IN ENGINEERING AND MANAGEMENT 09, no. 04 (2025): 1–9. https://doi.org/10.55041/ijsrem46610.

Abstract:

AI-Powered Sign Language & Lip Reading Translator is an AI-driven platform designed to bridge communication barriers for individuals with hearing and speech impairments by converting sign language into text/audio and utilizing lip-reading-based speech recognition for enhanced accessibility. By integrating deep learning, computer vision, and NLP, it ensures real-time, highly accurate communication. The platform features AI-Powered Sign Language Conversion to recognize and translate hand gestures and a Lip Reading Translator to convert lip movements into text/audio. Additionally,

34

Al Noman, Md Abdullah. "ENGINEERING TERMS TALKING DICTIONARY." International Journal of Innovative Engineering 01, no. 01 (2023): 23–28. https://doi.org/10.60044/ijie.v1i1.9.


Abstract:

The present study introduces a practical tool for translating diverse technical terminology in the field of engineering. The device has the capability to comprehend the requirements of its users and, if the relevant information is available within its database, it provides an explanation of the assigned task. Furthermore, in the event that the response requiring a reply is not present within the database, the dictionary prompts the user to indicate whether they wish to document the response. Consequently, the dictionary's database has the potential to be enhanced. The Raspberry Pi has been uti


35

Meenakshi, M. Maragadhavalli. "Leveraging AI Technologies for Personalized Learning Support in Dyslexic Students." International Journal for Research in Applied Science and Engineering Technology 12, no. 12 (2024): 2192–97. https://doi.org/10.22214/ijraset.2024.65734.


Abstract:

Dyslexia, a prevalent learning difficulty affecting reading, writing, and spelling, requires specialized interventions that traditional educational systems often lack. This paper proposes an AI-assisted learning platform aimed at addressing the diverse needs of dyslexic students. The system first tests for dyslexia, classifying students into three zones (low, moderate, and high) based on severity. Using AI technologies such as natural language processing (NLP), machine learning, speech-to-text (STT), and text-to-speech (TTS), the platform provides personalized learning practices ta


36

Yadav, Hemant, and Rajiv Ratn Shah. "Mask-Net: Learning Context Aware Invariant Features Using Adversarial Forgetting (Student Abstract)." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 13 (2023): 16374–75. http://dx.doi.org/10.1609/aaai.v37i13.27047.


Abstract:

Training a robust system, e.g., Speech to Text (STT), requires large datasets. Variability present in the dataset, such as unwanted nuances and biases, is the reason for the need for large datasets to learn general representations. In this work, we propose a novel approach to induce invariance using adversarial forgetting (AF). Our initial experiments on learning invariant features such as accent on the STT task achieve better generalizations in terms of word error rate (WER) compared to traditional models. We observe an absolute improvement of 2.2% and 1.3% on out-of-distribution and in-distr

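For readers unfamiliar with the metric cited in the abstract above, word error rate (WER) is conventionally computed as the word-level edit distance between a reference transcript and the STT output, divided by the number of reference words. The following is a minimal sketch of that standard definition; the function name `word_error_rate` and the whitespace tokenization are illustrative assumptions and are not taken from the Mask-Net paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: words are separated by whitespace and compared exactly
    (no case folding or punctuation normalization).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)
```

Under this definition, an "absolute improvement" in WER is the difference between two such ratios expressed in percentage points.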

37

Antony, Alka, Fariz Shanavas, Jovlin Elsa Ninan, Rithunath V Peter, and Angel Thankam Thomas. "Empowering the Visually Impaired: A Voice-Based Email Management System." International Journal of Advances in Engineering and Management 06, no. 12 (2024): 66–70. https://doi.org/10.35629/5252-06126670.


Abstract:

The research focuses on developing a voice-based email system to assist visually impaired users in sending and receiving emails independently. Existing systems, like screen readers, pose challenges when it comes to attaching files or navigating complex email interfaces. To address these issues, the proposed solution leverages Speech-to-Text (STT) and Text-to-Speech (TTS) technologies, along with Microsoft Speech Recognition, to allow users to manage emails and attachments through voice commands. This system simplifies tasks such as file searches, email composition, and sending attachments, red


38

Sabour, Adel, Abdeltawab Hendawi, and Mohamed Ali. "Arabic Diacritic-Aware Text-Audio Segmentation and Alignment Model (DASAM)." Elkawnie 10, no. 1 (2024): 1. http://dx.doi.org/10.22373/ekw.v10i1.23637.


Abstract:

This paper introduces the Diacritic-Aware Segmentation and Alignment Model for Arabic (DASAM). Diacritics are vital for pronunciation and meaning in the Arabic language but are often ignored by current speech recognition systems. DASAM is designed for word-level segmentation and alignment in unseen audio and associating them with diacritic-marked Arabic text. The DASAM approach uses linguistic analysis based on intonation rules. DASAM then applies Dynamic Time Warping (DTW) to match the reference audio word with its position in the unseen sentence audio. The model outputs a list of w

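As background for the Dynamic Time Warping (DTW) step mentioned in the abstract above, the sketch below shows the textbook DTW recurrence on two one-dimensional sequences. It is not the DASAM implementation: the function name `dtw_distance`, the absolute-difference local cost, and the use of plain numeric sequences in place of audio features are all assumptions made for illustration.

```python
import math


def dtw_distance(a, b):
    """Classic DTW cost between two numeric sequences.

    Assumption: the local cost is |x - y|; real audio alignment would
    compare feature vectors (for example, per-frame spectral features).
    """
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b repeats its element
                cost[i][j - 1],      # b advances, a repeats its element
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]
```

Because elements may be repeated along the warping path, two sequences that differ only in tempo, such as `[1, 2, 3]` and `[1, 2, 2, 3]`, align at zero cost.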

39

Pattenshatti, Ms Jaya. "Smart Email System for Visually Impaired." International Journal for Research in Applied Science and Engineering Technology 13, no. 5 (2025): 4062–69. https://doi.org/10.22214/ijraset.2025.71170.


Abstract:

In an increasingly digital world, email remains a pivotal medium for communication; however, its accessibility for visually impaired individuals continues to pose significant challenges. This paper presents a unified framework for a Voice-Based Email System tailored specifically for users with visual disabilities, integrating insights from recent advancements in speech recognition, natural language processing (NLP), and human-computer interaction. The proposed system aims to deliver a seamless, hands-free email experience by employing Speech-to-Text (STT) and Text-to-Speech (TTS) technologies, along


40

Wahyutama, Aria Bisma, and Mintae Hwang. "Auto-Scoring Feature Based on Sentence Transformer Similarity Check with Korean Sentences Spoken by Foreigners." Applied Sciences 13, no. 1 (2022): 373. http://dx.doi.org/10.3390/app13010373.


Abstract:

This paper presents the development of a training service that helps foreigners improve their ability to speak Korean. The service is implemented as a mobile application that shows specific Korean sentences to users so that they can record themselves speaking each sentence. The objective is to generate a score automatically based on how similar the recorded voice is to the actual sentence, using Speech-To-Text (STT) engines and Sentence Transformers. The application is developed by selecting the four most commonly known STT engines with similar features,


41

Isaac, Samson, Khalid Haruna, Muhammad Aminu Ahmad, and Rabi Mustapha. "DEEP REINFORCEMENT LEARNING WITH HIDDEN MARKOV MODEL FOR SPEECH RECOGNITION." JOURNAL OF TECHNOLOGY & INNOVATION 3, no. 1 (2023): 01–05. http://dx.doi.org/10.26480/jtin.01.2023.01.05.


Abstract:

Nowadays, many applications use speech recognition, especially in the fields of computer science and electronics. Speech Recognition (SR) is the conversion of spoken words into text. It is also known as Speech-To-Text (STT), Automatic Speech Recognition (ASR), or just Word Recognition (WR). The Hidden Markov Model (HMM) is a type of Markov model, which means that the future state of the model depends on the current state, not on the entire history of the system; the goal of an HMM is to learn a sequence of hidden states from a set of known states. The Long Short-Term Memory (LSTM) network is


42

Rane, Kirti, Tanaya Bagwe, Shruti Chaudhari, Ankita Kale, and Gayatri Deore. "Enhancing En-X Translation: A Chrome Extension-Based Approach to Indic Language Models." INTERNATIONAL JOURNAL OF SCIENTIFIC RESEARCH IN ENGINEERING AND MANAGEMENT 09, no. 03 (2025): 1–9. https://doi.org/10.55041/ijsrem42782.


Abstract:

Language translation is the lifeblood of any communication that crosses linguistic boundaries. Recent neural machine translation (NMT) methods already outperform older approaches. In this context, the work of Prahwini et al. (2024) and Vandan Mujadia et al. (2024) highlights the application of NMT to resource-constrained Indian languages. In view of challenges such as parallel corpus scarcity, we present a real-time adaptable translation model built on the Fairseq framework. It provides high-accuracy translations for Assamese, Gujarati, Kannada, Bengal


43

Kim, Jeung Deok. "The Effects of Reading Aloud on Korean University EFL Learners’ Pronunciation Accuracy and Learning Perceptions: A Speech-to-Text (STT)-Based Analysis." Journal of Language Sciences 32, no. 2 (2025): 1–28. https://doi.org/10.14384/kals.2025.32.2.001.



44

Singh, Shailendra. "Model for Converting PDF to Audio Format (Listen Your Book)." International Journal for Research in Applied Science and Engineering Technology 9, no. VII (2021): 3203–6. http://dx.doi.org/10.22214/ijraset.2021.36522.


Abstract:

The present paper introduces an innovative and efficient technique that enables users to hear the contents of text images instead of reading through them. In the current world, there is a great increase in the utilization of digital technology, and multiple methods are available for people to capture images. Such images may contain important textual content that the user may need to edit or store digitally. It merges the concept of Optical Character Recognition (OCR) and Text to Speech Synthesizer (TTS). This can be done using Optical Character Recognition with the use of Tesseract OCR E


45

Barkovska, Olesia, Vladyslav Kholiev, and Vladyslav Lytvynenko. "STUDY OF NOISE REDUCTION METHODS IN THE SOUND SEQUENCE WHEN SOLVING THE SPEECH-TO-TEXT PROBLEM." Advanced Information Systems 6, no. 1 (2022): 48–54. http://dx.doi.org/10.20998/2522-9052.2022.1.08.


Abstract:

The subject of this research is noise reduction methods in the sound sequence as a part of the proposed speech-to-text (STT) module for converting a verbal lecture or a lesson into a written text form on digital educational platforms. The goal is to investigate the influence of noise reduction methods on the operation of the acoustic signal recognition system. Three noise reduction methods were considered for integration into the proposed acoustic artifact recognition system and for study: the spectral subtraction method, the fast Fourier transform, and the Wiener filter, with software modeling of every


46

V. Krishnam Raju, K., and V. N. S. Manaswini. "Analyzing Call Data Through Live Calls Using Sphinx Tool." International Journal of Engineering & Technology 7, no. 3.31 (2018): 93. http://dx.doi.org/10.14419/ijet.v7i3.31.18273.


Abstract:

To improve business growth, businesses try to understand customers' intentions regarding their products. One of the best methods of collecting customer feedback is a telephone or mobile survey, in which customer service representatives (CSRs) interact with customers through phone calls and record the calls for later analysis. The main issue with analyzing recorded call data is that a large amount of storage is required for the audio files. This results in increased costs for maintaining the hardware and software systems and managing a database system. In this paper we can


47

Son, Jeong, and Lee. "An Audification and Visualization System (AVS) of an Autonomous Vehicle for Blind and Deaf People Based on Deep Learning." Sensors 19, no. 22 (2019): 5035. http://dx.doi.org/10.3390/s19225035.


Abstract:

When blind and deaf people are passengers in fully autonomous vehicles, an intuitive and accurate visualization screen should be provided for the deaf, and an audification system with speech-to-text (STT) and text-to-speech (TTS) functions should be provided for the blind. However, these systems cannot know the fault self-diagnosis information and the instrument cluster information that indicates the current state of the vehicle when driving. This paper proposes an audification and visualization system (AVS) of an autonomous vehicle for blind and deaf people based on deep learning to solve thi


48

Uriah Sampaga, Andrea Louise J. Toledo, Mikayla Assyria L. Dela Peret, Luisito M. Genodiala, Sheika Rania D. Aguilar, and Gellie Anne M. Antoja. "Real-Time Vision-Based Sign Language Bilateral Communication Device for Signers and Non-Signers using Convolutional Neural Network." World Journal of Advanced Research and Reviews 18, no. 3 (2023): 934–43. http://dx.doi.org/10.30574/wjarr.2023.18.3.1169.


Abstract:

The use of sign language is an important means of communication for individuals with hearing and speech impairments, but communication barriers can still arise due to differences in grammatical rules across different sign languages. In an effort to address these barriers, this study aimed to develop a real-time two-way communication device that uses image processing and recognition systems to translate two-handed Filipino Sign Language (FSL) gestures and facial expressions into speech; the system can recognize gestures that correspond to specific words and phrases. Specifically, the researcher


49

Uriah, Sampaga, Louise J. Toledo Andrea, Assyria L. Dela Peret Mikayla, et al. "Real-Time Vision-Based Sign Language Bilateral Communication Device for Signers and Non-Signers using Convolutional Neural Network." World Journal of Advanced Research and Reviews 18, no. 3 (2023): 934–43. https://doi.org/10.5281/zenodo.8434776.


Abstract:

The use of sign language is an important means of communication for individuals with hearing and speech impairments, but communication barriers can still arise due to differences in grammatical rules across different sign languages. In an effort to address these barriers, this study aimed to develop a real-time two-way communication device that uses image processing and recognition systems to translate two-handed Filipino Sign Language (FSL) gestures and facial expressions into speech; the system can recognize gestures that correspond to specific words and phrases. Specifically, the researcher


50

Baburaj, Karthik, Navaneeth Kattil Madathil, and Roshini Barkur. "NLP Based Voice Assistant Usage on Consumer Shopping." Scientific Temper 16, Spl-2 (2025): 46–50. https://doi.org/10.58414/scientifictemper.2025.16.spl-2.08.


Abstract:

In the digital age, convenience and accessibility have become paramount considerations for users at a personal level, particularly for those with diverse needs. The objective of this study is to introduce a Voice-Based Food Ordering Application designed to revolutionize the food ordering experience for tech-savvy individuals, with a special focus on differently abled and visually impaired customers. Leveraging cutting-edge technologies, such as Natural Language Processing (NLP), Speech Recognition, Google APIs, Text-to-Speech (TTS), and speech-to-text (STT), the application enables users to se

