To see the other types of publications on this topic, follow the link: Google Translate's text-to-speech API.

Journal articles on the topic 'Google Translate's text-to-speech API'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles

Select a source type:

Consult the top 50 journal articles for your research on the topic 'Google Translate's text-to-speech API.'

Next to every source in the list of references there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online, whenever these are available in the metadata.

Browse journal articles in a wide variety of disciplines and organise your bibliography correctly.

1

Varuzhan, Harutyun Baghdasaryan. "Clean and Noisy Datasets Generation for DeepSpeech Open-Source Speech-To-Text Engine Based on Google Translate API." Journal of Scientific and Engineering Research 8, no. 2 (2021): 23–25. https://doi.org/10.5281/zenodo.10574882.

Full text
Abstract:
Speech-to-text engines use both clean and noisy datasets to train models for best performance, but for some languages (for example, Armenian) there is not enough data for training. The purpose of this article is to design a tool that can generate both clean and noisy (additive white Gaussian noise (AWGN) and real-world noise (RWN)) datasets for the DeepSpeech speech-to-text engine using Google Translate's text-to-speech API, which can convert text to normal and slow speech.
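A minimal sketch of such a generator, assuming the gTTS Python library as the interface to Google Translate's text-to-speech API and NumPy for the AWGN stage; the sample text, file names, and SNR handling are illustrative, not the paper's code:

    # Generate clean utterances at normal and slow rates, then add AWGN.
    import numpy as np
    from gtts import gTTS

    def synthesize(text, path, slow=False):
        # gTTS exposes the normal/slow speaking rates the abstract mentions.
        gTTS(text=text, lang="hy", slow=slow).save(path)  # "hy" = Armenian

    def add_awgn(signal, snr_db):
        # Mix white Gaussian noise into a decoded waveform at a target SNR.
        sig_power = np.mean(signal ** 2)
        noise_power = sig_power / (10 ** (snr_db / 10))
        return signal + np.random.normal(0.0, np.sqrt(noise_power), signal.shape)

    synthesize("some Armenian text", "clean_normal.mp3")
    synthesize("some Armenian text", "clean_slow.mp3", slow=True)

Real-world noise (RWN) would instead be mixed in from recorded environment audio.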
APA, Harvard, Vancouver, ISO, and other styles
2

Wydyanto. "Converting Image Text to Speech Using Raspberry Pi." Journal of Information Systems Engineering and Management 10, no. 34s (2025): 651–58. https://doi.org/10.52783/jisem.v10i34s.5860.

Full text
Abstract:
This project aims to develop a system that can convert image text into speech using a Raspberry Pi. The system uses optical character recognition (OCR) technology to extract text from images, which is then converted into speech using text-to-speech (TTS) software. This provides a valuable tool for individuals with visual impairments or those who have difficulty reading text in images. The research explores a Raspberry Pi-based device that can translate English into 53 dialects, using a camera module, an OCR engine, the Google Speech API, and Microsoft Translator, making the device accessible to visually impaired individuals and those who do not speak English. The Raspberry Pi, with the Tesseract OCR engine, Google Voice API, Microsoft Translator, and camera board, enables real-time translation of text in images, making them more accessible and inclusive for users with visual impairments or language barriers.
APA, Harvard, Vancouver, ISO, and other styles
3

Singh, Mr Akarsh. "AI Tool/Mobile App for Indian Sign Language (ISL)." International Journal for Research in Applied Science and Engineering Technology 13, no. 5 (2025): 2819–24. https://doi.org/10.22214/ijraset.2025.71056.

Full text
Abstract:
This project proposes an AI-powered Sign Language Generator for Audio-Visual Content in English/Hindi that leverages cutting-edge technologies to bridge the communication gap. The system captures spoken language using advanced speech recognition techniques provided by the Google Speech Recognition API, transcribing speech into text with high accuracy. When inputs are in Hindi, the system employs the Google Translate API to convert the text into English, ensuring a standardized vocabulary that maps to ISL gestures.
APA, Harvard, Vancouver, ISO, and other styles
4

Nenny, Anggraini, Kurniawan Angga, Kesuma Wardhani Luh, and Hakiem Nashrul. "Speech Recognition Application for the Speech Impaired using the Android-based Google Cloud Speech API." TELKOMNIKA Telecommunication, Computing, Electronics and Control 16, no. 6 (2018): 2733–39. https://doi.org/10.12928/TELKOMNIKA.v16i6.9638.

Full text
Abstract:
Those who are speech impaired (tunawicara in Indonesian) suffer from abnormalities in their delivery (articulation) of language as well as in their voice compared with normal speech, resulting in difficulty communicating verbally within their environment. Therefore, an application is required that can help and facilitate conversations for communication. In this research, the authors have developed a speech recognition application that can recognise the speech of the speech impaired and translate it into text form, with input in the form of sound detected on a smartphone. The Google Cloud Speech Application Programming Interface (API) allows converting audio to text and is user-friendly; it integrates with Google Cloud Storage for data storage. Although research into speech-to-text recognition has been widely practiced, this research tries to develop speech recognition specifically for the speech of the speech impaired, and performs a likelihood calculation to examine the effect of tone, pronunciation, and speech speed on recognition. The test was conducted by saying the digits 1 through 10. The experimental results showed that the recognition rate for the speech impaired is about 80%, while the recognition rate for normal speech is 100%.
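For context, a call to the Google Cloud Speech API of the kind this study relies on looks roughly like the following sketch (google-cloud-speech Python client; the file name and the Indonesian language code are assumptions based on the study's setting):

    from google.cloud import speech

    client = speech.SpeechClient()
    with open("digits.wav", "rb") as f:
        audio = speech.RecognitionAudio(content=f.read())
    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=16000,
        language_code="id-ID",  # Indonesian, matching the study's users
    )
    response = client.recognize(config=config, audio=audio)
    for result in response.results:
        alt = result.alternatives[0]
        # The per-result confidence is one handle for studying how tone,
        # pronunciation, and speed affect recognition.
        print(alt.transcript, alt.confidence)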
APA, Harvard, Vancouver, ISO, and other styles
5

Kale, Shubham, Basavaraj Kumbhar, Hitesh Patil, Karan Jaitpal, and Dr Swati Sinha. "IoT based Smart Book Reader for Visually Impaired." International Journal for Research in Applied Science and Engineering Technology 11, no. 4 (2023): 4535–39. http://dx.doi.org/10.22214/ijraset.2023.51236.

Full text
Abstract:
This paper presents an IoT-based smart book reader for visually impaired individuals, using a Raspberry Pi computer and various software tools. The system allows users to capture images of printed material using a camera, which are then processed using Tesseract OCR software to extract text. The extracted text is translated from English to Marathi using the Google Cloud Translation API and converted to speech using the Google Text-to-Speech API. The system is designed to be operated using a single hardware button, making it easy and intuitive for users with visual impairments. The proposed system offers a low-cost and portable solution for visually impaired individuals to access printed material, and has the potential to improve their quality of life.
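The pipeline the paper describes can be sketched as follows, assuming pytesseract for Tesseract OCR, the google-cloud-translate client, and gTTS standing in for the Google Text-to-Speech API; file names are illustrative:

    import pytesseract
    from PIL import Image
    from google.cloud import translate_v2 as translate
    from gtts import gTTS

    def read_page(image_path="capture.jpg"):
        # 1) OCR the captured page image.
        english = pytesseract.image_to_string(Image.open(image_path))
        # 2) Translate English to Marathi via the Cloud Translation API.
        marathi = translate.Client().translate(
            english, target_language="mr")["translatedText"]
        # 3) Synthesize Marathi speech for playback on the device speaker.
        gTTS(marathi, lang="mr").save("page_mr.mp3")

    read_page()

On the device, a single GPIO button callback would trigger read_page().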
APA, Harvard, Vancouver, ISO, and other styles
6

N, KEERTHI, CHETANA P. SUTHAR, and LEKHANA E. "REAL-TIME ACCENT TRANSLATOR." INTERNATIONAL JOURNAL OF SCIENTIFIC RESEARCH IN ENGINEERING AND MANAGEMENT 09, no. 01 (2025): 1–9. https://doi.org/10.55041/ijsrem40714.

Full text
Abstract:
This paper introduces the "Real-Time Accent Translator," a lightweight and accessible web-based application that bridges the gap between multilingual communication and accent adaptation. Built on Flask, the system integrates Google Translate and Google Text-to-Speech (gTTS) APIs to provide seamless translation and speech synthesis. Users can input text in a source language, specify the target language, and optionally adjust accents for languages such as English. The application translates the text, synthesizes speech with the desired accent, and provides an audio output in real time. The architecture is designed for simplicity, leveraging third-party APIs to ensure rapid deployment and scalability without the need for extensive computational resources. This paper discusses the technical implementation, including API integration, RESTful communication, and real-time audio generation. Additionally, the potential use cases of the system are highlighted, including cross-cultural communication, language learning, and accessibility for non-native speakers.
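A minimal sketch of this Flask flow, assuming the googletrans and gTTS Python packages as unofficial clients for the Google Translate and gTTS services named above; the route and JSON fields are illustrative:

    from flask import Flask, request, send_file
    from googletrans import Translator
    from gtts import gTTS

    app = Flask(__name__)
    translator = Translator()

    @app.route("/translate-speak", methods=["POST"])
    def translate_speak():
        data = request.get_json()
        translated = translator.translate(data["text"], dest=data["target"]).text
        # gTTS's tld parameter switches regional accents for a language,
        # e.g. "co.in" vs "co.uk" vs "com" for English.
        gTTS(translated, lang=data["target"],
             tld=data.get("tld", "com")).save("out.mp3")
        return send_file("out.mp3", mimetype="audio/mpeg")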
APA, Harvard, Vancouver, ISO, and other styles
7

B, SAMBAVI. "Developing A Software That Can Translate Resource Material and Other Texts from English to Other Indian Regional Language." INTERNATIONAL JOURNAL OF SCIENTIFIC RESEARCH IN ENGINEERING AND MANAGEMENT 09, no. 05 (2025): 1–9. https://doi.org/10.55041/ijsrem47848.

Full text
Abstract:
The linguistic diversity of India is both a cultural asset and a major communication challenge. With 22 official languages and hundreds of dialects, the need for efficient translation tools is critical. While global solutions like Google Translate provide multilingual translation capabilities, there is still a shortage of localized, accessible, and user-friendly applications specifically for Indian languages. This work presents the Indian Language Translator, a Python-based desktop application that translates text in real time between English and prominent Indian languages through a simple graphical user interface (GUI) developed using Tkinter and the Google Translate API. The project's motivation, system design, implementation approach, evaluation, and potential impacts are discussed. Future work directions for offline translation, speech integration, and further language support are also explored. Keywords: Indian Languages, Language Translation, GUI Development, Machine Translation, Tkinter, Google Translate API, Python Applications, Natural Language Processing (NLP)
APA, Harvard, Vancouver, ISO, and other styles
8

S. Nikkam, Pushpalatha. "Voice To Sign Language Conversion." INTERNATIONAL JOURNAL OF SCIENTIFIC RESEARCH IN ENGINEERING AND MANAGEMENT 09, no. 05 (2025): 1–9. https://doi.org/10.55041/ijsrem48637.

Full text
Abstract:
True incapacity can be seen as the inability to speak, where individuals with speech impairments struggle to communicate verbally or through hearing. To bridge this gap, many rely on sign language, a visual method of communication that uses hand gestures. Although sign language has become more widespread, interaction between those who sign and those who don't can still pose challenges. As communication has grown to be an essential part of daily life, sign language serves as a crucial tool for those with speech and hearing difficulties. Recent advances in computer vision and deep learning have significantly enhanced the ability to recognize gestures and movements. While American Sign Language (ASL) has been thoroughly researched, Indian Sign Language (ISL) remains underexplored. Our proposed approach focuses on recognizing 4972 static hand gestures representing 24 English alphabets (excluding J and Z) in ISL. The project aims to build a deep learning-based system that translates these gestures into text, which is then spoken aloud using the Google Text-to-Speech API, thereby enabling better interaction between signers and non-signers. Using a dataset from Kaggle and a custom Convolutional Neural Network (CNN), our method achieved a 99% accuracy rate. Keywords: Convolutional Neural Network; Google Text-to-Speech API; Indian signing.
APA, Harvard, Vancouver, ISO, and other styles
9

Gopal D. Upadhye. "Multilingual Language Translator Using ML." Advances in Nonlinear Variational Inequalities 28, no. 2s (2024): 37–46. https://doi.org/10.52783/anvi.v28.2512.

Full text
Abstract:
This research paper explores the design and development of a Python-based multilingual translation application that leverages various libraries for a robust, user-friendly experience. The application integrates multiple functions such as text translation, speech-to-text, text-to-speech, and PDF text extraction, using libraries like Pyttsx3, PyPDF2, Speech Recognition, Tkinter, and the Google Translate API. The system allows for real-time translations, enhancing communication across different languages and improving accessibility. It enables users to convert PDF content into translated text and provides voice-based input for ease of use, particularly for users with physical limitations. The application’s performance in translation accuracy, speech recognition, and ease of use has been thoroughly tested, yielding positive user feedback. Furthermore, the modular design of the system allows for easy scalability and adaptability for future improvements, such as integrating more languages and enhancing voice recognition. This project demonstrates the effective use of Python’s rich library ecosystem in creating a comprehensive tool to bridge language barriers in various personal, academic, and professional contexts.
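The PDF branch of such an application can be sketched in a few lines, assuming PyPDF2 and pyttsx3 as named in the abstract and googletrans as a common unofficial client for the Google Translate API; the file name and target language are illustrative:

    from PyPDF2 import PdfReader
    from googletrans import Translator
    import pyttsx3

    # Extract text from every page of the PDF.
    text = "".join(page.extract_text() or "" for page in PdfReader("doc.pdf").pages)
    translated = Translator().translate(text, dest="hi").text

    engine = pyttsx3.init()      # offline text-to-speech engine
    engine.say(translated)
    engine.runAndWait()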
APA, Harvard, Vancouver, ISO, and other styles
10

Shabrina, Marwati Maryam, Arik Aranta, and Budi Irmawati. "RANCANG BANGUN ALGORITMA KONVERSI SUARA BERBAHASA INDONESIA MENJADI TEKS LATIN BERBAHASA SASAK MENGGUNAKAN METODE DICTIONARY BASED." Jurnal Teknologi Informasi, Komputer, dan Aplikasinya (JTIKA ) 6, no. 1 (2024): 364–75. http://dx.doi.org/10.29303/jtika.v6i1.371.

Full text
Abstract:
As time goes by, the use of the Sasak language among the people of Lombok is decreasing. In fact, the Sasak language is the identity of the island of Lombok which needs to be preserved as a heritage for the younger generation. The increasingly rapid development of technology has encouraged the emergence of innovation in creating various inventions that can facilitate human activities. One innovation that can be developed is speech to text technology. This technology can recognize human voices and then convert them into text. This is of interest to the author in designing a system that implements Google’s speech to text API to translate Indonesian words or sentences into Sasak. The translation from Indonesian to Sasak was carried out by applying a dictionary based system to produce an appropriate translation. The testing process was carried out by translating 25 sentences taken from the Sasak-Indonesian Dictionary and consisting of 117 words. In this research, there were two stages of testing carried out. The first test was carried out to determine the accuracy of the results of the Indonesian translation into Sasak using the dictionary based method. The second test was carried out to determine the accuracy of the Google Speech API in recognizing voice input and then converting it into text. From the first test, the system accuracy results in translating Indonesian to Sasak using the dictionary based method were 100% and the error rate was 0%. Meanwhile, from the second test, the results showed that the system could implement the Google Speech API to translate Indonesian words or sentences into Sasak with an accuracy of 99.14%.
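The two stages can be sketched as below, assuming the SpeechRecognition package's recognize_google wrapper for Google's speech-to-text; the tiny dictionary entries are illustrative stand-ins for the Sasak-Indonesian Dictionary data:

    import speech_recognition as sr

    sasak_dict = {"makan": "mangan", "tidur": "tindoq"}  # toy entries

    r = sr.Recognizer()
    with sr.Microphone() as mic:
        # Second stage of the evaluation: Google speech-to-text on
        # Indonesian voice input.
        indonesian = r.recognize_google(r.listen(mic), language="id-ID")

    # First stage: dictionary-based translation, word for word; unknown
    # words pass through unchanged.
    sasak = " ".join(sasak_dict.get(w, w) for w in indonesian.lower().split())
    print(sasak)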
APA, Harvard, Vancouver, ISO, and other styles
11

Khete, Tushar, and Aditya Bakshi. "Autonomous Assistance System for Visually Impaired using Tesseract OCR & gTTS." Journal of Physics: Conference Series 2327, no. 1 (2022): 012065. http://dx.doi.org/10.1088/1742-6596/2327/1/012065.

Full text
Abstract:
OCR enables machines to recognize text automatically. It not only saves retrieval time but also helps in the digitalization of important documents. In this paper, different techniques are used for converting textual matter from a paper document or an image into machine-readable form (OCR text output) and then into audio output using gTTS (Google Text-to-Speech, a Python library and CLI tool to interface with Google Translate's text-to-speech API). In the proposed model, the tool writes spoken mp3 data to a file or a file-like object (byte string) for further audio manipulation, with flexible pre-processing and tokenizing. The approach results in a graspable and comprehensible text output, which is then fed into audio output, with 99% accuracy. This paper outlines the stages of development and the major challenges, with result comparisons against different state-of-the-art approaches.
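The in-memory behavior the abstract refers to is gTTS's write_to_fp; a minimal sketch, with the image file name as an illustrative assumption:

    from io import BytesIO
    import pytesseract
    from PIL import Image
    from gtts import gTTS

    text = pytesseract.image_to_string(Image.open("scan.png"))  # OCR step
    buf = BytesIO()
    gTTS(text=text).write_to_fp(buf)   # spoken mp3 data as a byte stream
    mp3_bytes = buf.getvalue()         # manipulate further, or persist:
    with open("out.mp3", "wb") as f:
        f.write(mp3_bytes)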
APA, Harvard, Vancouver, ISO, and other styles
12

Yip, Kang Qin, Pey Yun Goh, and Lee Ying Chong. "Social Messaging Application with Translation and Speech-to-Text Transformation." Journal of Informatics and Web Engineering 3, no. 2 (2024): 169–87. http://dx.doi.org/10.33093/jiwe.2023.3.2.13.

Full text
Abstract:
Unlike traditional SMS or MMS, messaging apps offer a broader range of data transmission capabilities. The application uses a Wi-Fi or internet connection and enables users to exchange information through various means such as text, voice, and multimedia files. However, popular messaging applications such as WeChat, Telegram, and WhatsApp have limitations in language translation and file upload size. Thus, this project aims to address these limitations by developing a social messaging application that serves as a comprehensive communication tool. The application facilitates both written and verbal communication by providing translation services for various languages, including voice messages. The proposed application is intended to act as a versatile platform, functioning as a translator while enabling seamless communication between users in different languages. Translation accuracy and the BLEU metric are applied to evaluate the efficacy of the enhanced social messaging application. The proposed application is able to translate voice and written messages into another language with the help of the Google Translation API as well as a speech-to-text API. The average BLEU score is 0.94 between English and Malay, 0.82 between Malay and Chinese, and 0.70 between Chinese and English. Though not perfect, the proposed application enhances current social messaging applications with speech-to-text and message translation features. Finally, concluding remarks are provided on further improving the application in the future.
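BLEU scores of the kind reported above can be computed with NLTK; a hedged sketch with an illustrative sentence pair:

    from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

    reference = [["saya", "suka", "makan", "nasi"]]   # human translation
    candidate = ["saya", "suka", "makan", "nasi"]     # system output
    smooth = SmoothingFunction().method1              # guard against zero n-gram counts
    print(sentence_bleu(reference, candidate, smoothing_function=smooth))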
APA, Harvard, Vancouver, ISO, and other styles
13

Albertus, Michael, and Muliady Muliady. "Pengaturan Fan Speed dan Suhu Air Conditioner Melalui Ucapan Dengan Layanan Google Assistant API." TESLA: Jurnal Teknik Elektro 21, no. 2 (2020): 170. http://dx.doi.org/10.24912/tesla.v21i2.7189.

Full text
Abstract:
Air conditioner control using speech recognition was built to help people with disabilities who are unable to operate a remote control physically but have verbal abilities. Verbal control allows disabled people to adjust temperature and fan speed. Speech is received by a Respeaker 2-Mics Pi HAT module, converted into WAV, and processed by the Google Assistant API with a Natural Language Processing algorithm that categorizes words into their types (subject, predicate, object, and description) to facilitate lookup by the Google Assistant API. Words are matched with commands in a command text database on a Raspberry Pi 3 to enable local commands; a matching command modulates a space-coded signal on the GPIO, transmitted through an infrared transmitter to control the air conditioner. The infrared database was obtained by receiving infrared signals through an infrared receiver, decoded by the LIRC software into pulse spaces; a calling function is created and paired with the commands in the command text database. The infrared light can reach the air conditioner from up to 600 cm away using an NPN 2N2222A transistor with a β of 257, a base resistor of 1500 Ohm, and a collector resistor of 6.2 Ohm. Speech-to-text experiments were run with a background sound intensity of 35-40 dB, respondent sound intensity of 50-70 dB, and a respondent-to-microphone distance of 40-50 cm. The system can recognize respondents' speech with a success rate above 50%. The word "High" in the fan speed experiments could not be detected by the system, so another word needs to be added for it to be recognized. The system can receive Google Translate speech and had only one detection failure.
APA, Harvard, Vancouver, ISO, and other styles
14

Polepaka, Sanjeeva, Varikuppala Prashanth Kumar, S. Umesh Chandra, Hema Nagendra Sri Krishna, and Gaurav Thakur. "Automated Caption Generation for Video Call with Language Translation." E3S Web of Conferences 430 (2023): 01025. http://dx.doi.org/10.1051/e3sconf/202343001025.

Full text
Abstract:
In the modern era, virtual communication between individuals is common. Many people's lives have been made simpler in a number of circumstances by providing subtitles, generating automated captions for social media videos, and translating from a source language to a target language. This system includes both, offering face-to-face translated captions during video conversations. React is used for application development, and socket programming is utilized to send the data. Context is understood and translated using the Google Translate API and speech recognition modules. Captions are generated with OpenAI's Whisper. The system also directly generates the translated user voice rather than only translating the text and creating subtitles.
APA, Harvard, Vancouver, ISO, and other styles
15

IKANI, Lucy Hassana. "Text to Speech Synthesis System in Yoruba Language." International Journal of Advances in Scientific Research and Engineering (ijasre) 5, no. 10 (2019): 180–91. https://doi.org/10.31695/IJASRE.2019.33568.

Full text
Abstract:
Previous and existing studies convert text to speech (in English) without consideration for indigenous languages. This project work is an effort to design a speech extension that can convert highlighted text displayed by a web browser into speech in the indigenous Yoruba language, through a software tool attached to the Mozilla internet browser as a plug-in. The tool allows the user to highlight text on a web page with the mouse; the selected text provides the input to the speech program that runs in the back end. The objective of this project work is to design a program that will help people with weak eyesight, those who lack pronunciation skills, and those learning Yoruba as an additional language; it is also designed to solve problems that exist with the manual system, including difficulty in reading and misinterpretation of information. The methodology used to convert text to speech relies on APIs (Google Translate, speech'.org, and ttsyoruba.com), and the programming language used for the web is JavaScript. A text-to-speech extension is an application that converts highlighted text into spoken words by analyzing and processing the text in Yoruba; it reads the highlighted text out to the user in Yoruba, and the output can be saved as an audio file.
APA, Harvard, Vancouver, ISO, and other styles
16

Harum, Norharyati Binti, Nur’aliah Izzati M. S. K, Nurul Akmar Emran, et al. "A Development of Multi-Language Interactive Device using Artificial Intelligence Technology for Visual Impairment Person." International Journal of Interactive Mobile Technologies (iJIM) 15, no. 19 (2021): 79. http://dx.doi.org/10.3991/ijim.v15i19.24139.

Full text
Abstract:
The lack of braille reference books in most public buildings is a crucial issue, especially in public places like libraries and museums; visually impaired or blind people do not get the information that those with normal vision do. Therefore, a multi-language reading device for the visually impaired was built and designed to overcome the limited availability of reference books in public places. Research on currently available products was carried out to develop a better reading device. This device is an improvement on a previous project, which focused on a single language and was therefore not suitable for public places. The reading device takes a picture of the book using a 5 MP Pi camera; the Google Vision API extracts the text, and the Google Translation API detects the language and translates it to the desired language based on push-button input from the user. Google Text-to-Speech then converts the text to speech, and the device reads it out loud through an audio output such as a speaker or headphones. Several tests were run to check the functionality and accuracy of the reading device: functionality, performance, and usability tests. The reading device passed most of the tests and achieved a score of 91.7/100, an excellent (A) rating.
APA, Harvard, Vancouver, ISO, and other styles
17

Vasu, Kapil, Dheeraj, Chauhan Ritik, and Singh Amardeep. "Real-Time Language Translator with Sign Language Recognition: A Multi-Modal Approach." International Journal of Innovative Science and Research Technology (IJISRT) 10, no. 2 (2025): 567–70. https://doi.org/10.5281/zenodo.14921228.

Full text
Abstract:
This research explores the development of a real-time language translation system integrating speech recognition, text-based translation, and sign language recognition. The system employs Google Translate API for multilingual translation, MediaPipe Hands for sign language recognition, and SpeechRecognition for real-time voice input. The study aims to bridge communication gaps between spoken, written, and signed languages. The paper presents implementation details, experimental results, and future scope for improvement. Findings indicate promising accuracy in sign recognition and speech translation, highlighting the potential for real-world application in accessibility and communication enhancement.
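A rough sketch of the multi-modal loop, assuming the SpeechRecognition, MediaPipe, and OpenCV packages named in the abstract, with googletrans as a common unofficial client for the Google Translate API; the sign classifier itself is out of scope here:

    import cv2
    import mediapipe as mp
    import speech_recognition as sr
    from googletrans import Translator

    recognizer, translator = sr.Recognizer(), Translator()
    with sr.Microphone() as mic:                       # real-time voice input
        text = recognizer.recognize_google(recognizer.listen(mic))
    print(translator.translate(text, dest="hi").text)  # multilingual translation

    hands = mp.solutions.hands.Hands(max_num_hands=1)  # MediaPipe Hands
    frame = cv2.VideoCapture(0).read()[1]
    result = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    if result.multi_hand_landmarks:
        # 21 (x, y, z) landmarks per hand; a trained classifier would map
        # these to sign labels (not shown).
        print(len(result.multi_hand_landmarks[0].landmark))  # -> 21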
APA, Harvard, Vancouver, ISO, and other styles
18

Khairani, Dewi, Tabah Rosyadi, Arini Arini, Imam Luthfi Rahmatullah, and Fauzan Farhan Antoro. "Enhancing Speech-to-Text and Translation Capabilities for Developing Arabic Learning Games: Integration of Whisper OpenAI Model and Google API Translate." JURNAL TEKNIK INFORMATIKA 17, no. 2 (2024): 203–12. http://dx.doi.org/10.15408/jti.v17i2.41240.

Full text
Abstract:
This study tackles language barriers in computer-mediated communication by developing an application that integrates OpenAI's Whisper ASR model and Google Translate machine translation to enable real-time, continuous speech transcription and translation, as well as the processing of video and audio files. The application was developed using the experimental method, incorporating standards for testing and evaluation. The integration expanded language coverage to 133 languages and improved translation accuracy. Efficiency was enhanced through the use of greedy decoding parameters and the Faster Whisper model. Usability evaluations, based on questionnaires, revealed that the application is efficient, effective, and user-friendly, though minor issues in user satisfaction were noted. Overall, the Speech Translate application shows potential in facilitating transcription and translation for video content, especially for language learners and individuals with disabilities. Additionally, this study introduces an Arabic learning game incorporating an artificial neural network using the CNN algorithm. Focusing on the "Speaking" skill, the game applies voice and image extraction techniques, achieving a high accuracy rate of 95.52%. This game offers an engaging and interactive method for learning Arabic, a language often considered challenging. The incorporation of artificial neural network technology enhances the effectiveness of the learning game, providing users with a unique and innovative language learning experience. By combining voice and image extraction techniques, the game offers a comprehensive approach to enjoyably improving Arabic speaking skills.
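The transcription side can be sketched with the faster-whisper package, where beam_size=1 gives the greedy decoding the abstract mentions; deep-translator's GoogleTranslator is used here as an illustrative stand-in for the Google Translate integration, and the file name is assumed:

    from faster_whisper import WhisperModel
    from deep_translator import GoogleTranslator

    model = WhisperModel("small")           # CTranslate2 build of Whisper
    segments, info = model.transcribe("lesson.mp3", beam_size=1)  # greedy
    transcript = " ".join(seg.text for seg in segments)
    print(info.language)                    # detected source language
    print(GoogleTranslator(source="auto", target="ar").translate(transcript))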
APA, Harvard, Vancouver, ISO, and other styles
19

Nurakhov, Y. S., and A. E. Kami. "INFORMATION SYSTEM FOR PEOPLE WITH HEARING IMPAIRMENT." PHYSICO-MATHEMATICAL SERIES 335, no. 1 (2021): 54–59. http://dx.doi.org/10.32014/2021.2224-5294.8.

Full text
Abstract:
The article presents the development of an information system that converts voice into text for people with hearing impairments, making it possible to improve their quality of life and their interaction with other people in society. The device, software, functional blocks, and subsystems of the information system are described. Examples of possible applications and placements of the system in various spheres of public life are given, and one implementation of the voice recognition information system is described. The development and prototyping of a device for people with hearing impairments is considered. In the course of the research, Google Speech API technology was selected for speech recognition. In addition, the article presents a software and hardware complex that translates speech into text and then displays it on a screen. Arduino UNO-based devices were chosen to achieve this goal. All information is processed on the smartphone of the person with hearing impairment and is then sent via Bluetooth to the Arduino-based device.
APA, Harvard, Vancouver, ISO, and other styles
20

Nurakhov, Y. S., and A. E. Kami. "INFORMATION SYSTEM FOR PEOPLE WITH HEARING IMPAIRMENT." PHYSICO-MATHEMATICAL SERIES 335, no. 1 (2021): 54–59. http://dx.doi.org/10.32014/2021.2518-1726.8.

Full text
Abstract:
The article presents the development of an information system that converts voice into text for people with hearing impairments, making it possible to improve their quality of life and their interaction with other people in society. The device, software, functional blocks, and subsystems of the information system are described. Examples of possible applications and placements of the system in various spheres of public life are given, and one implementation of the voice recognition information system is described. The development and prototyping of a device for people with hearing impairments is considered. In the course of the research, Google Speech API technology was selected for speech recognition. In addition, the article presents a software and hardware complex that translates speech into text and then displays it on a screen. Arduino UNO-based devices were chosen to achieve this goal. All information is processed on the smartphone of the person with hearing impairment and is then sent via Bluetooth to the Arduino-based device.
APA, Harvard, Vancouver, ISO, and other styles
21

Lailatul Hasanah and Budi Yuniarto. "Early Study of LLM Implementation in Survey Interviews." Jurnal Aplikasi Statistika & Komputasi Statistik 17, no. 1 (2025): 12–22. https://doi.org/10.34123/jurnalasks.v17i1.792.

Full text
Abstract:
Introduction/Main Objectives: This research aims to conduct a preliminary study into the use of LLMs for extracting information to fill out questionnaires in survey interviews. Background Problems: BPS-Statistics Indonesia used paper-based questionnaires for interviews and has recently been utilizing the Computer Assisted Personal Interviewing (CAPI) method. However, the CAPI method has some drawbacks: enumerators must input data into the device, which can be burdensome and prone to errors. Novelty: This study uses a large language model (LLM) to extract information from survey interviews. Research Methods: This study utilizes a speech-to-text application to transcribe interview recordings into text. Transcription accuracy is measured by the Word Error Rate (WER). The text was then processed for extraction using the ChatGPT 3.5 Turbo model. GPT-3.5 Turbo is part of the GPT family of models developed by OpenAI. Findings/Results: The extraction results are formatted into a JSON file, intended for automatic filling into the database, and then evaluated using precision, recall, and F1-score. Based on research conducted using Google's Speech Recognition API and the ChatGPT 3.5 Turbo model, an average WER of 10% was obtained in speech recognition and an average accuracy of 76.16% in automatic data extraction.
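The WER measurement can be reproduced roughly as follows, assuming the SpeechRecognition package for the Google Speech Recognition API and jiwer as one common WER implementation; the file and reference string are illustrative:

    import speech_recognition as sr
    import jiwer

    recognizer = sr.Recognizer()
    with sr.AudioFile("interview.wav") as source:
        hypothesis = recognizer.recognize_google(
            recognizer.record(source), language="id-ID")

    reference = "the enumerator's ground-truth transcript"
    print(f"WER: {jiwer.wer(reference, hypothesis):.2%}")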
APA, Harvard, Vancouver, ISO, and other styles
22

Guda, Kavitha Reddy. "Morse code Translator Using Eye Blinks." International Journal for Research in Applied Science and Engineering Technology 10, no. 6 (2022): 2705–8. http://dx.doi.org/10.22214/ijraset.2022.44536.

Full text
Abstract:
A Morse code translator that converts Morse code into speech and text in any language chosen by the user. The Morse code is given as input either as a recorded video or as a live feed of a person blinking their eyes in a sequence. OpenCV is used to capture the Morse code input; MediaPipe, a Google API, detects the face and maps facial landmarks. These landmarks are used to locate eye coordinates, from which eye aspect ratios are computed to detect eye blinks. The Morse code, already loaded in the form of a dictionary, is matched against the Morse code given through the video or camera to find the letter assigned to it.
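The eye aspect ratio (EAR) used for blink detection is a short formula over six landmark points around one eye, where p1 and p4 are the horizontal corners; a minimal sketch:

    import numpy as np

    def eye_aspect_ratio(eye):
        # eye: six (x, y) landmark points ordered around the eye contour
        v1 = np.linalg.norm(eye[1] - eye[5])  # vertical distance p2-p6
        v2 = np.linalg.norm(eye[2] - eye[4])  # vertical distance p3-p5
        h = np.linalg.norm(eye[0] - eye[3])   # horizontal distance p1-p4
        return (v1 + v2) / (2.0 * h)

    # EAR drops sharply when the eye closes; a threshold of roughly 0.2
    # held over a few frames registers a blink, and short versus long
    # closures can then encode Morse dots and dashes.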
APA, Harvard, Vancouver, ISO, and other styles
23

Mane, Chaitali. "AI Sign Language Detection for Disabled People & Chatbot." INTERNATIONAL JOURNAL OF SCIENTIFIC RESEARCH IN ENGINEERING AND MANAGEMENT 09, no. 04 (2025): 1–9. https://doi.org/10.55041/ijsrem45854.

Full text
Abstract:
This project presents an intelligent, multi-functional system designed to empower and support individuals with disabilities, especially those with hearing or speech impairments. The solution integrates real-time sign language detection with an AI-powered assistant to facilitate seamless communication, access to essential services, and daily digital interactions. Built using HTML, CSS, JavaScript, and machine learning techniques, this platform bridges the communication gap between differently-abled users and the digital world. The system uses computer vision models to recognize sign language gestures, particularly the English alphabet, enabling users to communicate through hand signs. The recognized signs are then translated into text or voice using Text-to-Speech (TTS) modules. To extend usability, this feature is integrated into a user-friendly graphical interface that allows users to interact with the system visually and audibly. A major highlight of the project is its Jarvis-inspired voice assistant, which responds to the wake word "Hey Jarvis." Once activated, the assistant can perform various tasks such as opening websites, fetching real-time information (e.g., weather and news), and responding to queries using a smart chatbot module. The chatbot fetches responses via the Google Search API to ensure up-to-date and contextually relevant answers.
APA, Harvard, Vancouver, ISO, and other styles
24

Anisya, Kirana, Yustina Heny Wardhani, Yang Agita Rindi, and Sari Mubaroh. "Perancangan Aplikasi Pendukung Komunikasi Bagi Wisatawan Asing Yang Berkunjung Ke Indonesia." Technologia : Jurnal Ilmiah 16, no. 2 (2025): 283. https://doi.org/10.31602/tji.v16i2.17972.

Full text
Abstract:
Language differences are one of the main challenges faced by foreign tourists visiting Indonesia. These communication barriers often hamper interaction between tourists and local residents and reduce tourists' comfort. In addition, many translator applications such as Google Translate still have limitations in terms of accuracy and efficiency in regions with unstable internet connections. A more comprehensive and easy-to-use solution is therefore needed. This research aims to develop an Android application that integrates the Google Translate API for English-Indonesian translation, equipped with a content-based tourism recommendation feature. The application was designed using the Waterfall SDLC method, covering the analysis, design, coding, and testing stages. Test results show that the developed application can facilitate two-way communication through a direct translation feature and provide tourism recommendations by province. Additional features such as speech-to-text improve user comfort when interacting with the application. Black-box testing proved that all application functions run well and as required. However, this research still has limitations in terms of tourism data coverage and dependence on an internet connection. Further development is recommended to expand the feature coverage and improve offline functionality.
APA, Harvard, Vancouver, ISO, and other styles
25

N, Sudiksha. "Talking Fingers: Bridging the Communication Gap through Real-Time Speech-to-Indian Sign Language Translation." INTERNATIONAL JOURNAL OF SCIENTIFIC RESEARCH IN ENGINEERING AND MANAGEMENT 09, no. 01 (2025): 1–9. https://doi.org/10.55041/ijsrem40875.

Full text
Abstract:
"Talking Fingers" is an innovative initiative to be developed to facilitate communication between hearing and non-hearing individuals by building a web-based system that can translate spoken language into Indian Sign Language (ISL). Being an essential means of communication among millions in India, ISL remains underdeveloped by technologies that are dominated by American and British Sign Languages. Current tools rely on the basic word-by-word translation with no contextual or grammatical accuracy. The proposed system will thus integrate speech recognition, NLP, and ISL visuals for real-time, context-aware translations. Spoken input will be converted into text through the Google Speech API and then processed using NLP techniques to segment meaningful phrases. The matched phrases are matched with the ISL visual representations, which may be in the form of videos or GIFs, in a comprehensive database. A fallback mechanism ensures seamless communication by spelling out words letter by letter when specific ISL visuals are unavailable. This platform serves as scalable and adaptable solutions for different public and educational spaces, bridging the communication gap for the deaf and hard-of-hearing community. With emphasis on ISL and incorporation of advanced technologies, "Talking Fingers" delivers an inclusive and robust solution, enabling users and bringing greater inclusivity in communication. Keywords: Indian Sign Language (ISL), Natural Language Processing (NLP), Speech-to-Sign Translation, Communication Accessibility, Real-time Translation, Sign Language Automation
APA, Harvard, Vancouver, ISO, and other styles
26

Aitha, Sreevarsha, Meghana Mummali, Meera Alphy Dr., and K. Shirisha Mr. "Language Translator Application using Google Translate API." Recent Innovations in Wireless Network Security 6, no. 3 (2024): 41–45. https://doi.org/10.5281/zenodo.13363877.

Full text
Abstract:
The Language Translator Application is a cutting-edge tool designed to facilitate seamless communication across different languages. By leveraging the powerful capabilities of the Google Translation API, this application offers robust translation services that cater to a wide array of languages and use cases. Built using Python, the application combines simplicity and efficiency, providing users with a reliable solution for their translation needs. This project presents the development of a language translator application using the Google Translation API, implemented in Python. The primary objective is to create a user-friendly and efficient tool capable of translating text between multiple languages, leveraging the powerful capabilities of Google's machine learning models. The application is designed to accommodate a wide range of languages supported by the Google Translation API, ensuring broad usability for diverse linguistic needs. The translator application is built using Python, chosen for its simplicity and extensive library support. The Google Cloud Translation API is integrated into the application to handle the translation tasks, providing accurate and real-time translation services. The application architecture is structured to include a user interface that accepts text input, processes the request via the API, and displays the translated text output. Key features of the application include automatic language detection, support for over 100 languages, and the ability to handle text inputs of varying lengths. Additionally, the application is designed with a focus on performance, reliability, and ease of use, making it accessible for both casual users and those requiring frequent translations for professional purposes.
APA, Harvard, Vancouver, ISO, and other styles
27

Orochi, Orlunwo Placida, and Ledesi Kabari. "Text-to-Speech Recognition using Google API." International Journal of Computer Applications 183, no. 15 (2021): 18–20. http://dx.doi.org/10.5120/ijca2021921474.

Full text
APA, Harvard, Vancouver, ISO, and other styles
28

Niño, Ana. "Pronunciation Practice with Google Translate TTS and ASR Functions." CALICO Journal 42, no. 1 (2025): 69–93. https://doi.org/10.3138/calico-2024-1223.

Full text
Abstract:
This article examines the impact of the use of Google Translate's (GT) text-to-speech and automatic speech recognition functions on the practice of pronunciation, intonation, and oral production at the beginner's level. Utilizing a self-study questionnaire, the acquisition of pronunciation, intonation, and oral production was evaluated, and students were asked about the effectiveness of this tool for the independent learning of Spanish as a foreign language. Since there are not many investigations on this topic, the results of this study are of significance and indicate that the self-directed use of GT through carefully designed learning activities for practicing a specific set of second language pronunciation features at a particular language level can aid independent practice, thus increasing students’ phonetic and phonological awareness, helping perception, aiding memorization, and improving language confidence.
APA, Harvard, Vancouver, ISO, and other styles
29

Husni, H., Arif Muntasa, Sigit Susanto Putro, and Zulfi Osman. "Cross-Language Tourism News Retrieval System Using Google Translate API on SEBI Search Engine." Elinvo (Electronics, Informatics, and Vocational Education) 8, no. 1 (2023): 113–20. http://dx.doi.org/10.21831/elinvo.v8i1.55851.

Full text
Abstract:
Cross-Language Information Retrieval (CLIR) is responsible for retrieving information stored in a language different from the language of the user's query. Some translation methods commonly used in CLIR are dictionary-based, parallel corpora, comparable corpora, machine translation, ontology-based, and transitive-based. The query must be translated to the target language, followed by preprocessing and calculating the similarity between the query and all documents in the corpus. The problem is the time taken and the accuracy of query translation; moreover, queries are not written as complete sentences following particular language rules. Stemming, for example, works differently in every language: Indonesian has base words and affixes in the form of prefixes, suffixes, infixes, and confixes, while English only has suffixes, and stemming takes a long time in text processing. In the Indonesian search engine (SEBI), cross-language tourism news retrieval is realized using the Google Translate API, which translates the query and all documents into English; Porter's stemming technique, which converts each term to its general form; and cosine similarity to calculate similarity. This approach can deliver cross-language tourism news instantly while increasing the precision and efficiency of the SEBI search engine, although some improvements are needed to provide a more precise and efficient similarity computation.
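The retrieval step can be sketched as follows, with deep-translator as an illustrative stand-in for the Google Translate API call, NLTK's Porter stemmer, and TF-IDF cosine similarity from scikit-learn; the documents and query are toy examples:

    from deep_translator import GoogleTranslator
    from nltk.stem import PorterStemmer
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    stem = PorterStemmer().stem
    def normalize(text):
        return " ".join(stem(tok) for tok in text.lower().split())

    docs = ["beaches in Lombok attract surfers", "museum exhibits in Jakarta"]
    query = GoogleTranslator(source="id", target="en").translate("pantai di Lombok")

    vec = TfidfVectorizer()
    doc_matrix = vec.fit_transform([normalize(d) for d in docs])
    scores = cosine_similarity(vec.transform([normalize(query)]), doc_matrix)
    print(scores)  # similarity of the translated query to each document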
APA, Harvard, Vancouver, ISO, and other styles
30

Papin, Kevin, and Walcir Cardoso. "Assessing the Pedagogical Potential of Google Translate's Speech Capabilities: Focus on French Pronunciation." CALICO Journal 42, no. 1 (2025): 1–24. https://doi.org/10.3138/calico-2025-0117.

Full text
Abstract:
As the capabilities of web-based machine translation develop, online translators such as Google Translate (GT) have attracted computer-assisted language learning (CALL) researchers’ attention for their potential to aid second/foreign language (L2) instruction. Using its built-in text-to-speech (TTS) and automatic speech recognition (ASR) features, GT can be used for L2 pronunciation practice. The aim of this study (part of a larger project investigating L2 learners’ use of speech technologies in homework settings) is to examine the impact of self-regulated pronunciation practice using GT's TTS and ASR features on the development of French liaison (the re-syllabification of latent consonants when they appear in consonant-plus-vowel contexts across words, e.g., /z/ in tes amis [te.za.mi] “your friends”). Participants were 20 adult beginner learners of French studying at an English-speaking university in Canada. Their phonological development (i.e., awareness, perception, and production) was assessed before (pretest) and after (immediate and delayed posttests) the completion of a semi-autonomous, GT-based pronunciation practice. The results of the analysis of variance (ANOVA, the statistical method used) indicate that the proposed treatment led to a statistically significant improvement in liaison production between the pretest and the delayed posttest, while phonological awareness and perception remained unaffected, probably due to a ceiling effect.
APA, Harvard, Vancouver, ISO, and other styles
31

Screen, Ben. "Machine translation and Welsh: Analysing free statistical machine translation for the professional translation of an under-researched language pair." Journal of Specialised Translation, no. 28 (July 25, 2017): 317–44. https://doi.org/10.26034/cm.jostrans.2017.244.

Full text
Abstract:
This article reports on a key-logging study carried out to test the benefits of post-editing Machine Translation (MT) for the professional translator within a hypothetico-deductive framework, contrasting the outcomes of a number of variables which are inextricably linked to the professional translation process. Given the current trend of allowing the professional translator to connect to Google Translate services within the main Translation Memory (TM) systems via an API, a between-groups design is utilised in which cognitive, technical and temporal effort are gauged between translation and post-editing the statistical MT engine Google Translate. The language pair investigated is English and Welsh. Results show no statistical difference between post-editing and translation in terms of processing time. Using a novel measure of cognitive effort focused on pauses, the cognitive effort exerted by post-editors and translators was, however, found to be statistically different. Results also show that a complex relationship exists between post-editing, translation and technical effort, in that aspects of text production processes were seen to be eased by post-editing. Finally, a bilingual review by two different translators found little difference in quality between the translated and post-edited texts, and that both sets of texts were acceptable according to accuracy and fidelity.
APA, Harvard, Vancouver, ISO, and other styles
32

Zubaidi, Ariyan, Affandy Akbar, and Ario Yudo Husodo. "Implementasi Google Speech API Pada Aplikasi Koreksi Hapalan Al-Qur'an Berbasis Android." Jurnal Teknologi Informasi, Komputer, dan Aplikasinya (JTIKA ) 1, no. 1 (2019): 1–8. http://dx.doi.org/10.29303/jtika.v1i1.8.

Full text
Abstract:
Muroja'ah is a method of repeating new and old memorization by reciting it to other people; this method is very popular in Indonesia. To keep the learning process from becoming boring, a learning medium that can be used anytime is required, and one option is an Android application. This research describes how to build an Android-based Qur'an recitation correction application and integrate it with Google Speech, so that the application is more interesting and interactive for users. Google Speech is integrated as the voice input medium: users are asked to recite Qur'an verses, and Google Speech converts their voice to text. The resulting text is then matched with the text in the source code. Testing the accuracy of the Google Speech implementation in this application yielded 100%, which means it performed as expected.
APA, Harvard, Vancouver, ISO, and other styles
33

Afrianto, Irawan, Muhammad Fahmi Irfan, and Sufa Atin. "Aplikasi Chatbot Speak English Media Pembelajaran Bahasa Inggris Berbasis Android." Komputika : Jurnal Sistem Komputer 8, no. 2 (2019): 99–109. http://dx.doi.org/10.34010/komputika.v8i2.2273.

Full text
Abstract:
Good English skills are a competitive asset, in education as well as in employment. Chatbot technology can be used as a solution to problems in education. Several studies have demonstrated the use of chatbots as learning media, especially for learning English. In addition, resources to support chatbot development are now widely available, such as APIs that simplify chatbot construction like the Dialogflow API used in this research. The method used in this research is independent practice and self-evaluation. Supporting tools include the LanguageTool API for grammar correction, with Google Speech Recognition and Google Text-to-Speech providing a voice chat interface. This research discusses the use of chatbot technology, specifically by building an Android application as a medium for practicing English conversation. Test results show that 63.96% of users agreed with the features developed in the application. Users agreed that the system built can be a solution to the problem, namely as a medium for English conversation practice; users were also helped by features such as voice chat, error correction, and a daily log for independent practice and evaluation. Keywords: Chatbot; English Learning; Dialogflow API; LanguageTool API; Google Text-To-Speech.
APA, Harvard, Vancouver, ISO, and other styles
34

JOTHISH C, Dr. "Language Translator Tool to Convert English to Hindi." INTERNATIONAL JOURNAL OF SCIENTIFIC RESEARCH IN ENGINEERING AND MANAGEMENT 09, no. 05 (2025): 1–9. https://doi.org/10.55041/ijsrem47810.

Full text
Abstract:
This paper introduces a web-based English-to-Hindi translation system designed to enhance translation accuracy for government and official documents. The system integrates multiple translation functionalities, including direct text translation, document translation (PDF, images, and text files), website content translation, and translation history management. By leveraging the Google Translate API along with a custom terminology dictionary, it ensures precise translations tailored to government-specific language. The system is built using the Flask framework with an SQLite database for efficient history tracking, and incorporates Optical Character Recognition (OCR) for translating image-based text. The methodology involves data preprocessing, text extraction, API-based translation, and accuracy evaluation. Users can interact with an intuitive web interface to translate text, upload documents, and retrieve past translations. With a focus on usability and linguistic accuracy, this system demonstrates the effective integration of machine translation with domain-specific linguistic resources, contributing to improved translation quality in specialized fields. Keywords: Language translation, machine translation, Flask, OCR, Google Translate API, government terminology
APA, Harvard, Vancouver, ISO, and other styles
35

Basystiuk, Oleh, and Nataliya Melnykova. "Development of the Multimodal Handling Interface Based on Google API." Computer Design Systems. Theory and Practice 6, no. 1 (2024): 216–23. http://dx.doi.org/10.23939/cds2024.01.216.

Full text
Abstract:
Today, Artificial Intelligence is a daily routine, becoming deeply entrenched in our lives. One of the most popular and rapidly advancing technologies is speech recognition, which forms an integral part of the broader concept of multimodal data handling. Multimodal data encompasses voice, audio, and text data, constituting a multifaceted approach to understanding and processing information. This paper presents the development of a multimodal handling interface leveraging Google API technologies. The interface aims to facilitate seamless integration and management of diverse data modalities, including text, audio, and video, within a unified platform. Through the utilization of Google API functionalities, such as natural language processing, speech recognition, and video analysis, the interface offers enhanced capabilities for processing, analysing, and interpreting multimodal data. The paper discusses the design and implementation of the interface, highlighting its features and functionalities. Furthermore, it explores potential applications and future directions for utilizing the interface in various domains, including healthcare, education, and multimedia content creation. Overall, the development of the multimodal handling interface based on Google API represents a significant step towards advancing multimodal data processing and enhancing user experience in interacting with diverse data sources.
APA, Harvard, Vancouver, ISO, and other styles
36

Sumathi, Dr P. "Transign AI." INTERNATIONAL JOURNAL OF SCIENTIFIC RESEARCH IN ENGINEERING AND MANAGEMENT 09, no. 05 (2025): 1–9. https://doi.org/10.55041/ijsrem46813.

Full text
Abstract:
The project "Speech to Sign Language Converter" presents a user-friendly and inclusive solution aimed at bridging communication gaps for the hearing-impaired. Utilizing advanced speech-to-text algorithms, natural language processing (NLP), and emotion recognition, the system converts spoken audio into Indian Sign Language (ISL) representations. A key feature is the integration of Google APIs to retrieve contextual images that enhance the expressiveness of the sign output. By embedding emotional cues within sign gestures, the system provides a more humanized and accurate communication experience. This paper outlines the technical framework, implementation methodology, and potential impact of this AI-powered assistive tool. Keywords: Sign Language, Emotion Recognition, NLP, Google API, Speech to Text, Assistive Technology
APA, Harvard, Vancouver, ISO, and other styles
37

Khairunizam, Khairunizam, Danuri Danuri, and Jaroji Jaroji. "Aplikasi Pemutar Musik Menggunakan Speech Recognition." INOVTEK Polbeng - Seri Informatika 2, no. 2 (2017): 97. http://dx.doi.org/10.35314/isi.v2i2.196.

Full text
Abstract:
Technology is currently advancing very rapidly, especially smartphone technology, which is becoming increasingly sophisticated; as a result, more and more smartphone manufacturers are on the market, offering various features and a wide range of brands. Existing music player applications still require typing to search for songs. Building this music player application required music data, the Android Studio editor, the Google Speech API library, and the Android platform, to create a music player application that uses speech recognition. The application works by searching via spoken input, which is processed into a command that immediately plays the music; voice commands can be used not only for searching but also for controlling the music, including previous, next, stop, play, and exit. A suggestion for future development of this application is to add a feature for grouping songs by album, artist, song, and genre. Keywords: music, speech-to-text, online, Google Speech API library
APA, Harvard, Vancouver, ISO, and other styles
38

G., Anvith. "Talking Fingers: A Multilingual Speech-to-Sign Language Converter." INTERNATIONAL JOURNAL OF SCIENTIFIC RESEARCH IN ENGINEERING AND MANAGEMENT 09, no. 01 (2025): 1–9. https://doi.org/10.55041/ijsrem40782.

Full text
Abstract:
Communication is a basic human need, but thousands of people with hearing and speech impairments face limitations in everyday communication. "Talking Fingers" is a modern assistive technology tool that translates spoken or written language into Indian Sign Language (ISL). With multilingual capabilities, the tool combines technologies such as Google ML Kit for language recognition and the MyMemory API for translation with ISL grammar processing, making several languages accessible at a time; the translated ISL output provides a simple and powerful communication channel. The system demonstrates the ability to blend artificial intelligence (AI) and language generation to promote inclusion and empower individuals with disabilities.
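For the translation step, the MyMemory service exposes a free REST endpoint; a minimal sketch of calling it from Python follows. The language pair and error handling are simplified, and this is not the paper's own code:

    import requests

    def translate_mymemory(text: str, source: str = "en", target: str = "hi") -> str:
        """Translate via the free (rate-limited) MyMemory REST endpoint."""
        resp = requests.get(
            "https://api.mymemory.translated.net/get",
            params={"q": text, "langpair": f"{source}|{target}"},
            timeout=10,
        )
        resp.raise_for_status()
        return resp.json()["responseData"]["translatedText"]

    print(translate_mymemory("good morning"))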
APA, Harvard, Vancouver, ISO, and other styles
39

Soesanto, Daniel, Budi Hartanto, and Melisa. "Meeting Assistant System Berbasis Teknologi Speech-to-Text." Teknika 10, no. 1 (2021): 1–7. http://dx.doi.org/10.34148/teknika.v10i1.307.

Full text
Abstract:
Meetings are held regularly in every organization. They are convened so that each division can report the progress of its tasks, which is recorded by a minute-taker. These records serve as the organization's archives, commonly called meeting minutes. However, several obstacles arise in preparing them. One is that the minute-taker may be unable to attend the meeting, so a replacement is needed. Another is that the minute-taker's typing or writing speed may not keep pace with the discussion, so information risks being missed. For this reason, a meeting assistant system based on speech-to-text technology from the Google Cloud API was developed. The goal of the system is to simplify the production of minutes by extracting the important words from a meeting. Test results show that the meeting assistant system can handle recording and the extractive capture of important conversation. The system achieved a peak accuracy of 81.3% when participants' utterances had good sentence structure and the agenda discussion was well structured.
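A minimal sketch of the Google Cloud Speech-to-Text call such a meeting assistant might make is shown below. The file name is a placeholder, the Indonesian language code is an assumption reflecting the paper's context, and credentials are assumed to be configured via GOOGLE_APPLICATION_CREDENTIALS:

    from google.cloud import speech  # pip install google-cloud-speech

    client = speech.SpeechClient()

    # Inline content works for short recordings; longer meetings would
    # use a Cloud Storage URI and long-running recognition instead.
    with open("meeting.wav", "rb") as f:
        audio = speech.RecognitionAudio(content=f.read())

    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=16000,
        language_code="id-ID",  # assumed: Indonesian-language meetings
    )

    response = client.recognize(config=config, audio=audio)
    transcript = " ".join(r.alternatives[0].transcript for r in response.results)
    print(transcript)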
APA, Harvard, Vancouver, ISO, and other styles
40

Tiwari, Rishin, Saloni Birthare, and Mr Mayank Lovanshi. "Audio to Sign Language Converter." International Journal for Research in Applied Science and Engineering Technology 10, no. 11 (2022): 206–11. http://dx.doi.org/10.22214/ijraset.2022.47271.

Full text
Abstract:
The hearing and speech disabled face a communication problem with other people. It is hard for such individuals to express themselves, since not everyone is familiar with sign language. The aim of this paper is to design a system that helps people with hearing/speech disabilities by converting voice into Indian Sign Language (ISL). Learning a sign language can be cumbersome, so this paper proposes a solution using speech recognition and image processing. Sign languages developed as a means of easy communication primarily for deaf and hard-of-hearing people. In this work we propose a real-time system that recognizes voice input through PyAudio, Sphinx, and the Google speech recognition API, converts it into text, and then displays a sign-language rendering of the text on the machine's screen as a series of images or motion video, with the help of various Python libraries.
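The abstract names both Sphinx and the Google speech recognition API; the SpeechRecognition package wraps both, so an online-first, offline-fallback arrangement is one plausible reading. A sketch under that assumption, not the authors' code:

    import speech_recognition as sr

    r = sr.Recognizer()
    with sr.Microphone() as source:          # live capture via PyAudio
        r.adjust_for_ambient_noise(source)
        audio = r.listen(source)

    try:
        text = r.recognize_google(audio)     # online, generally more accurate
    except sr.RequestError:
        text = r.recognize_sphinx(audio)     # offline fallback (pocketsphinx)

    # `text` would then drive the sign-image display stage.
    print(text)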
APA, Harvard, Vancouver, ISO, and other styles
41

Rajesh, Rajesh. "AI-Based PDF Translator." International Scientific Journal of Engineering and Management 04, no. 05 (2025): 1–7. https://doi.org/10.55041/isjem03470.

Full text
Abstract:
Automated document translation plays a critical role in overcoming language barriers and facilitating seamless communication across global industries. This project harnesses the power of Natural Language Processing (NLP) and Optical Character Recognition (OCR) to extract, translate, and reconstruct text from PDF documents while preserving their original layout and formatting. By utilizing Transformer-based models such as GPT and the Google Translate API, alongside robust text extraction tools, the system delivers accurate and efficient multilingual translations. The methodology incorporates Python libraries including PyMuPDF, pdfplumber, Tesseract-OCR, and the OpenAI API to manage text recognition, translation, and reformatting processes. This AI-driven solution aims to enhance accessibility, foster global collaboration, and streamline multilingual document workflows across diverse sectors. Keywords: PDF Translation, Natural Language Processing (NLP), Optical Character Recognition (OCR), Transformer Models, Multilingual Document Processing.
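A skeletal version of the extract-then-translate pipeline, using pdfplumber for text extraction; the translation backend is deliberately left as a stub, since the paper combines GPT-style models with the Google Translate API, and the file name is a placeholder:

    import pdfplumber

    def extract_pages(path: str) -> list[str]:
        """Pull raw text per page; scanned pages would instead go through Tesseract."""
        with pdfplumber.open(path) as pdf:
            return [page.extract_text() or "" for page in pdf.pages]

    def translate(text: str, target: str = "es") -> str:
        # Placeholder: plug in the Google Translate API or a GPT call here.
        return text

    pages = extract_pages("document.pdf")  # hypothetical input
    translated = [translate(p) for p in pages if p.strip()]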
APA, Harvard, Vancouver, ISO, and other styles
42

Choi, Jungyoon, Haeyoung Gill, Soobin Ou, and Jongwoo Lee. "CCVoice: Voice to Text Conversion and Management Program Implementation of Google Cloud Speech API." KIISE Transactions on Computing Practices 25, no. 3 (2019): 191–97. http://dx.doi.org/10.5626/ktcp.2019.25.3.191.

Full text
APA, Harvard, Vancouver, ISO, and other styles
43

Shi, Guo You, Shuang Liu, and Peng Chen. "Design and Implementation of E-Learning System for Code-Switching Bilingual Teaching." Applied Mechanics and Materials 241-244 (December 2012): 2886–90. http://dx.doi.org/10.4028/www.scientific.net/amm.241-244.2886.

Full text
Abstract:
To support code-switching bilingual teaching in computer major courses, an online e-Learning system based on the model-view-controller (MVC) architecture is designed and implemented using object-oriented analysis and design methods. Struts, Spring, and Hibernate are chosen as the development frameworks implementing the MVC model. The Google Translate API, a tool that automatically translates text from one language to another, is adopted to support web pages in multiple languages. Based on varied teaching materials such as video, audio, and Flash, it is easy for students to study on their own and for teachers to provide supplementary teaching. With this e-Learning system, it is convenient to arouse students' interest in computer major courses, improve their spoken and written English, and enhance their international employment competitiveness.
APA, Harvard, Vancouver, ISO, and other styles
44

Jaiswal, A. A., Abhinav Nema, Akshay Mokalwar, Kalyani Paraye, and Prajwal Shivarkar. "Vision Pro– A Lens Application." International Journal of Computer Science and Mobile Computing 11, no. 1 (2022): 209–13. http://dx.doi.org/10.47760/ijcsmc.2022.v11i01.028.

Full text
Abstract:
Object and text detection are very common in day-to-day life. Real-world applications such as Google Lens, Bixby Vision, and many other apps are available on our mobile phones. A single app that can translate, search, and detect text in real time has grown popular in recent years. Our Vision Pro is similar: it is essentially image recognition software that can translate text, detect handwritten text, search for that text, and search for objects. The technology uses the device's camera and is built on a neural network. The idea of Vision Pro mainly comes from Google Lens. It uses a convolutional neural network along with major Python libraries for translation, TensorFlow or Keras for model training, and OpenCV for reading and processing images. A vision API can also be used to build this type of program.
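The OCR core of such a lens application reduces to a few lines with OpenCV and pytesseract. The preprocessing shown (Otsu thresholding) is a common choice rather than anything the paper specifies, and the image path is a placeholder:

    import cv2
    import pytesseract

    image = cv2.imread("snapshot.jpg")              # hypothetical camera frame
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    # Binarization usually improves Tesseract's accuracy on camera images.
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    text = pytesseract.image_to_string(binary)
    print(text)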
APA, Harvard, Vancouver, ISO, and other styles
45

Ms Shiwani Gupta, Mohd Haider, and Md Shabbir. "Development of an AI-Powered Voice Assistant: Enhancing Speech Recognition and User Interaction." International Journal of Scientific Research in Science, Engineering and Technology 12, no. 3 (2025): 717–27. https://doi.org/10.32628/ijsrset251296.

Full text
Abstract:
Artificial Intelligence (AI) and Natural Language Processing (NLP) have significantly transformed human-computer interaction, enabling intelligent systems to process voice commands efficiently. Voice assistants, such as Google Assistant, Amazon Alexa, and Apple Siri, have set industry benchmarks, but they still face challenges in real-time response accuracy, handling ambient noise, and offline functionality. This paper presents the development of a custom AI-powered voice assistant, focusing on improving listening abilities, noise filtration, and command execution efficiency. The proposed system integrates the Google Speech Recognition API for real-time speech-to-text conversion, pyttsx3 for text-to-speech synthesis, and natural language processing techniques to interpret and execute commands. Unlike cloud-dependent voice assistants, this system provides offline capabilities for essential commands, ensuring flexibility and usability even in low-connectivity environments. Experimental results demonstrate that the assistant achieves over 91% speech recognition accuracy in controlled environments, with an average command execution time of less than one second. Future enhancements include deep learning-based NLP models, real-time wake-word detection, and a graphical user interface for better interaction. The proposed system serves as a foundation for customizable, intelligent, and efficient AI-powered voice assistants.
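The listen-recognize-respond loop the abstract describes maps naturally onto SpeechRecognition plus pyttsx3. The sketch below shows that shape under simplifying assumptions (a single turn, with only ambient-noise calibration standing in for noise filtration):

    import speech_recognition as sr
    import pyttsx3

    engine = pyttsx3.init()

    def say(text: str) -> None:
        engine.say(text)
        engine.runAndWait()

    r = sr.Recognizer()
    with sr.Microphone() as mic:
        r.adjust_for_ambient_noise(mic, duration=0.5)  # basic noise calibration
        say("Listening")
        audio = r.listen(mic)

    try:
        command = r.recognize_google(audio)
        say(f"You said {command}")
    except sr.UnknownValueError:
        say("Sorry, I did not catch that")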
APA, Harvard, Vancouver, ISO, and other styles
46

Mamatha, Dr S. K. "Sign-Verse – Sign Language Recognition System Using Convolutional Neural Networks." Advancement in Image Processing and Pattern Recognition 8, no. 2 (2025): 8–16. https://doi.org/10.5281/zenodo.15173169.

Full text
Abstract:
Sign-Verse presents a cutting-edge Sign Language Detection System (SLDS) driven by Convolutional Neural Networks (CNNs) with the goal of assisting the deaf and hard-of-hearing community in overcoming communication obstacles. Using the Google Text-to-Speech API and state-of-the-art technology, the system converts hand gestures into text and speech formats with ease. It also has an emergency services capability that sends distress signals via email to pre-designated emergency contacts in response to certain motions. Sign-Verse turns standard communication methods on their head, making it a revolutionary tool that improves safety and reach for individuals using sign language. Real-time gesture interpretation, multi-modal output display, and quick emergency reaction mechanisms are some of its key aspects, which highlight how important it is to promote inclusion and wellbeing.
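Two of the components this abstract names — speaking a recognized gesture via the Google Text-to-Speech API and emailing an emergency contact — can be sketched as follows. The gTTS wrapper stands in for the TTS call, and the SMTP host and addresses are placeholders, not the authors' configuration:

    import smtplib
    from email.message import EmailMessage
    from gtts import gTTS  # wrapper around Google Text-to-Speech

    def speak_gesture(label: str, out_path: str = "gesture.mp3") -> str:
        """Turn a recognized gesture label (e.g. a CNN output class) into audio."""
        gTTS(text=label, lang="en").save(out_path)
        return out_path

    def send_alert(to_addr: str) -> None:
        """Email a pre-designated contact when a distress gesture is detected."""
        msg = EmailMessage()
        msg["Subject"] = "Emergency gesture detected"
        msg["From"] = "alerts@example.com"            # placeholder sender
        msg["To"] = to_addr
        msg.set_content("A distress gesture was recognized.")
        with smtplib.SMTP("smtp.example.com") as s:   # placeholder SMTP host
            s.send_message(msg)

    speak_gesture("help")  # hypothetical recognized class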
APA, Harvard, Vancouver, ISO, and other styles
47

Rani, Dr Yatu, Ms Gurminder Kaur, Harsh Rana, Sagar ., and Nikhil . "JARVIS: A Virtual Assistant." International Journal for Research in Applied Science and Engineering Technology 11, no. 2 (2023): 751–53. http://dx.doi.org/10.22214/ijraset.2023.49111.

Full text
Abstract:
As we know, Python is an emerging language, so it becomes easy to write a script for a voice assistant in Python. The assistant's instructions can be tailored to the user's requirements. Speech recognition is the process of converting speech into text and is commonly used in voice assistants like Alexa and Siri. In Python there is an API called SpeechRecognition that allows us to convert speech into text. It was an interesting task to build my own assistant. It became easier to send emails without typing a word, search on Google without opening the browser, and perform many other daily tasks like playing music or opening a favorite IDE with a single voice command. In the current scenario, advancements in technology are such that systems can perform many tasks as effectively as us, or arguably more effectively. I realized that the concept of AI in every field is decreasing human effort and saving time.
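One of the tasks this abstract mentions — searching Google without opening the browser manually — is a one-liner with the standard library; a hypothetical helper:

    import webbrowser
    from urllib.parse import quote_plus

    def google_search(query: str) -> None:
        """Open a Google search for a voice-recognized query string."""
        webbrowser.open(f"https://www.google.com/search?q={quote_plus(query)}")

    google_search("python speech recognition")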
APA, Harvard, Vancouver, ISO, and other styles
48

Abdulkareem, Ademola, Tobiloba E. Somefun, Oji K. Chinedum, and Felix Agbetuyi. "Design and implementation of speech recognition system integrated with internet of things." International Journal of Electrical and Computer Engineering (IJECE) 11, no. 2 (2021): 1796. http://dx.doi.org/10.11591/ijece.v11i2.pp1796-1803.

Full text
Abstract:
The process of speech recognition is such that a speech signal from a client or user is received by the system through a microphone; the system then analyses this signal and extracts useful information, which is converted to text. This study focuses on the design and implementation of a speech recognition system integrated with the internet of things (IoT) to control electrical appliances and a door, with a Raspberry Pi as the core element. To design the speech recognition system, digital signal processing (DSP) techniques and hidden Markov models were fully considered for processing, extraction, and high predictive accuracy. The Google application programming interface (API) was used as a cloud server to store commands and give the system access to the internet. With 150 speech samples, the system achieved an accuracy of over 80%.
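On the actuation side, driving an appliance relay from a recognized command is a few lines of RPi.GPIO on the Raspberry Pi; the pin number and trigger phrases below are assumptions for illustration, not the authors' wiring:

    import RPi.GPIO as GPIO  # available on Raspberry Pi OS

    RELAY_PIN = 17  # assumed wiring; BCM numbering

    GPIO.setmode(GPIO.BCM)
    GPIO.setup(RELAY_PIN, GPIO.OUT)

    def on_command(text: str) -> None:
        """Toggle the appliance relay from a recognized voice command."""
        if "light on" in text.lower():
            GPIO.output(RELAY_PIN, GPIO.HIGH)
        elif "light off" in text.lower():
            GPIO.output(RELAY_PIN, GPIO.LOW)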
APA, Harvard, Vancouver, ISO, and other styles
49

Ademola, Abdulkareem, E. Somefun Tobiloba, K. Chinedum Oji, and Agbetuyi Felix. "Design and implementation of speech recognition system integrated with internet of things." International Journal of Electrical and Computer Engineering (IJECE) 11, no. 2 (2021): 1796–803. https://doi.org/10.11591/ijece.v11i2.pp1796-1803.

Full text
Abstract:
The process of speech recognition is such that a speech signal from a client or user is received by the system through a microphone; the system then analyses this signal and extracts useful information, which is converted to text. This study focuses on the design and implementation of a speech recognition system integrated with the internet of things (IoT) to control electrical appliances and a door, with a Raspberry Pi as the core element. To design the speech recognition system, digital signal processing (DSP) techniques and hidden Markov models were fully considered for processing, extraction, and high predictive accuracy. The Google application programming interface (API) was used as a cloud server to store commands and give the system access to the internet. With 150 speech samples, the system achieved an accuracy of over 80%.
APA, Harvard, Vancouver, ISO, and other styles
50

Ahmed, Hanzala. "Video Summarizer." International Journal for Research in Applied Science and Engineering Technology 13, no. 3 (2025): 224–27. https://doi.org/10.22214/ijraset.2025.67234.

Full text
Abstract:
This project focuses on creating a "Video Summarizer" that leverages advanced technologies to generate concise summaries and visual previews of videos. The process begins by converting the video input into audio using Python libraries such as MoviePy and SpeechRecognition. The audio is then transcribed into text, which is summarized using the Google Gemini API to produce clear and succinct summaries. Additionally, the system generates visual previews of the video by extracting text using Pytesseract and applying structural similarity metrics to identify key frames, offering a visual snapshot of the content. The project includes a Flask-based backend and a user-friendly interface built with HTML, CSS, and JavaScript, with Node.js and Express handling API requests. This integrated approach enables users to quickly understand the essence of video content while preserving its visual context, making it an invaluable tool for viewers with limited time and for content analysts.
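The first stage of the pipeline — stripping the audio track with MoviePy so it can be transcribed — looks roughly like this; the file names are placeholders, and the import path matches MoviePy 1.x:

    from moviepy.editor import VideoFileClip

    def video_to_audio(video_path: str, audio_path: str = "audio.wav") -> str:
        """Extract the audio track of a video for downstream transcription."""
        with VideoFileClip(video_path) as clip:
            clip.audio.write_audiofile(audio_path)
        return audio_path

    video_to_audio("lecture.mp4")  # hypothetical input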
APA, Harvard, Vancouver, ISO, and other styles