Journal articles on the topic 'Automatic speech recognition system (ASR)'

Consult the top 50 journal articles for your research on the topic 'Automatic speech recognition system (ASR).'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse journal articles in a wide variety of disciplines and organise your bibliography correctly.

1

Galatang, Danny Henry, and Suyanto Suyanto. "Syllable-Based Indonesian Automatic Speech Recognition." International Journal on Electrical Engineering and Informatics 12, no. 4 (2020): 720–28. http://dx.doi.org/10.15676/ijeei.2020.12.4.2.

Full text
Abstract:
Syllable-based automatic speech recognition (ASR) systems commonly perform better than phoneme-based ones. This paper focuses on developing an Indonesian monosyllable-based ASR (MSASR) system using an ASR engine called SPRAAK and comparing it to a phoneme-based one. The Mozilla DeepSpeech-based end-to-end ASR (MDSE2EASR), one of the state-of-the-art models based on characters (similar to the phoneme-based model), is also investigated to confirm the result. In addition, a novel Kaituoxu SpeechTransformer (KST) E2EASR is also examined. Testing on the Indonesian speech corpus of 5,439 words sh
APA, Harvard, Vancouver, ISO, and other styles
2

Janai, Siddhanna, Shreekanth T., Chandan M., and Ajish K. Abraham. "Speech-to-Speech Conversion." International Journal of Ambient Computing and Intelligence 12, no. 1 (2021): 184–206. http://dx.doi.org/10.4018/ijaci.2021010108.

Full text
Abstract:
A novel approach to building a speech-to-speech conversion (STSC) system for individuals with dysarthria, a speech impairment, is described. The STSC system takes impaired speech with inherent disturbances as input and produces synthesized output speech with good pronunciation and noise-free utterances. The STSC system involves two stages, namely automatic speech recognition (ASR) and automatic speech synthesis. ASR transforms speech into text, while automatic speech synthesis (or text-to-speech [TTS]) performs the reverse task. At present, the recognition system is developed for a small vocabulary of
APA, Harvard, Vancouver, ISO, and other styles
3

Singh, Moirangthem Tiken. "Automatic Speech Recognition System: A Survey Report." Science & Technology Journal 4, no. 2 (2016): 152–55. http://dx.doi.org/10.22232/stj.2016.04.02.10.

Full text
Abstract:
This paper presents a report on automatic speech recognition systems (ASR) for different Indian languages under different accents. The paper is a comparative study of the performance of the systems developed, which use Hidden Markov Models (HMM) as the classifier and Mel-Frequency Cepstral Coefficients (MFCC) as speech features.
APA, Harvard, Vancouver, ISO, and other styles
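The MFCC-plus-HMM pipeline named in the abstract above is the classical ASR recipe: frame-level cepstral features scored against one HMM per word or phone class. Below is a minimal sketch of that idea using librosa and hmmlearn; the file paths and model sizes are placeholder assumptions, not details from the paper.

```python
# Minimal MFCC + HMM sketch (illustrative only; not the paper's code).
# Assumes librosa and hmmlearn are installed; the WAV paths are placeholders.
import librosa
import numpy as np
from hmmlearn import hmm

def extract_mfcc(path, n_mfcc=13):
    """Load audio and return an (n_frames, n_mfcc) MFCC matrix."""
    y, sr = librosa.load(path, sr=16000)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T

# One HMM per word/class, trained on that class's MFCC frames.
features = extract_mfcc("word1_sample.wav")          # placeholder file
model = hmm.GaussianHMM(n_components=5, covariance_type="diag", n_iter=100)
model.fit(features)

# At test time, score an utterance against every word model and pick the best.
test = extract_mfcc("unknown.wav")                   # placeholder file
print("log-likelihood under word1 model:", model.score(test))
```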
4

Hamidi, Mohamed, Hassan Satori, Ouissam Zealouk, and Naouar Laaidi. "Estimation of ASR Parameterization for Interactive System." International Journal of Natural Computing Research 10, no. 1 (2021): 28–40. http://dx.doi.org/10.4018/ijncr.2021010103.

Full text
Abstract:
In this study, the authors explore the integration of speaker-independent automatic Amazigh speech recognition technology into interactive applications to extract data remotely from a distant database. Based on the combined interactive voice response (IVR) and automatic speech recognition (ASR) technologies, the authors built an interactive speech system that allows users to interact with it through voice commands. Hidden Markov models (HMMs), Gaussian mixture models (GMMs), and Mel-frequency cepstral coefficients (MFCCs) are used to develop a speech system based on the t
APA, Harvard, Vancouver, ISO, and other styles
5

Tiwari, Sonal Anilkumar. "A Fundamental of Automatic Speech Recognition and Speech Database." International Journal for Research in Applied Science and Engineering Technology 9, no. 9 (2021): 1020–27. http://dx.doi.org/10.22214/ijraset.2021.38094.

Full text
Abstract:
It can be quite interesting to think of commanding inanimate objects, and ASR systems make this possible. A speech recognition system is a system that lets humans talk with machines. Nowadays speech recognition is a technique many people can hardly work without and have grown accustomed to. It has become a habit: when using a mobile phone, instead of typing something we can simply issue voice commands, which reduces both our effort and our time. Keywords
APA, Harvard, Vancouver, ISO, and other styles
6

Salaja, Rosemary T., Ronan Flynn, and Michael Russell. "A Life-Based Classifier for Automatic Speech Recognition." Applied Mechanics and Materials 679 (October 2014): 189–93. http://dx.doi.org/10.4028/www.scientific.net/amm.679.189.

Full text
Abstract:
Research in speech recognition has produced different approaches that have been used for the classification of speech utterances in the back-end of an automatic speech recognition (ASR) system. As speech recognition is a pattern recognition problem, classification is an important part of any speech recognition system. This paper proposes a new back-end classifier that is based on artificial life (ALife) and describes how the proposed classifier can be used in a speech recognition system.
APA, Harvard, Vancouver, ISO, and other styles
7

Bhardwaj, Vivek, Vinay Kukreja, and Amitoj Singh. "Usage of Prosody Modification and Acoustic Adaptation for Robust Automatic Speech Recognition (ASR) System." Revue d'Intelligence Artificielle 35, no. 3 (2021): 235–42. http://dx.doi.org/10.18280/ria.350307.

Full text
Abstract:
Most automatic speech recognition (ASR) systems are trained using adult speech due to the limited availability of children's speech datasets. The speech recognition rate of such systems is much lower when they are tested on children's speech, due to the inter-speaker acoustic variability between adult and children's speech. This variability arises mainly from the higher pitch and lower speaking rate of children. Thus, the main objective of this research work is to increase the speech recognition rate of the Punjabi-ASR system by reduci
APA, Harvard, Vancouver, ISO, and other styles
8

Ali, Mohammed Hasan, Mustafa Musa Jaber, Sura Khalil Abd, et al. "Harris Hawks Sparse Auto-Encoder Networks for Automatic Speech Recognition System." Applied Sciences 12, no. 3 (2022): 1091. http://dx.doi.org/10.3390/app12031091.

Full text
Abstract:
Automatic speech recognition (ASR) is an effective technique that can convert human speech into text format or computer actions. ASR systems are widely used in smart appliances, smart homes, and biometric systems. Signal processing and machine learning techniques are incorporated to recognize speech. However, traditional systems perform poorly in noisy environments. In addition, accents and local differences negatively affect an ASR system's performance when analyzing speech signals. A precise speech recognition system was developed to improve the system performance to ove
APA, Harvard, Vancouver, ISO, and other styles
9

Jamil, Bushra, Saba Sultan, and Humaira Ijaz. "Design and Development of an Acoustic-Based Recognition System Using DNN." Sukkur IBA Journal of Computing and Mathematical Sciences 8, no. 1 (2024): 32–42. http://dx.doi.org/10.30537/sjcms.v8i1.1400.

Full text
Abstract:
Automatic speech recognition is the process of using computers to convert voice signals produced by human speech into a readable format, i.e., text or commands, that conveys the same meaning the speaker intended. Many researchers are working on various languages, including English and other European languages such as Spanish, German, and French, to develop automated systems for speech recognition (ASR). However, very little effort has been put into developing ASR for the Urdu language. We have developed an Urdu speech recognition system using a Deep Neural Network (DNN) on our
APA, Harvard, Vancouver, ISO, and other styles
10

Singh, Satyanand. "High level speaker specific features modeling in automatic speaker recognition system." International Journal of Electrical and Computer Engineering (IJECE) 10, no. 2 (2020): 1859. http://dx.doi.org/10.11591/ijece.v10i2.pp1859-1867.

Full text
Abstract:
Spoken words convey several levels of information. At the primary level, speech conveys words or spoken messages, but at the secondary level, it also reveals information about the speakers. This work is based on high-level speaker-specific features in statistical speaker modeling techniques that express the characteristic sound of the human voice. Hidden Markov models (HMM), Gaussian mixture models (GMM), and linear discriminant analysis (LDA) are used to build automatic speaker recognition (ASR) systems that are computationally inexpensive and can recognize speakers regardless of what
APA, Harvard, Vancouver, ISO, and other styles
11

Satyanand, Singh, and Singh Pragya. "High level speaker specific features modeling in automatic speaker recognition system." International Journal of Electrical and Computer Engineering (IJECE) 10, no. 2 (2020): 1859–67. https://doi.org/10.11591/ijece.v10i2.pp1859-1867.

Full text
Abstract:
Spoken words convey several levels of information. At the primary level, speech conveys words or spoken messages, but at the secondary level, it also reveals information about the speakers. This work is based on high-level speaker-specific features in statistical speaker modeling techniques that express the characteristic sound of the human voice. Hidden Markov models (HMM), Gaussian mixture models (GMM), and linear discriminant analysis (LDA) are used to build automatic speaker recognition (ASR) systems that are computationally inexpensive and can recognize speakers regardless of what
APA, Harvard, Vancouver, ISO, and other styles
12

Dua, Mohit, Rajesh Kumar Aggarwal, and Mantosh Biswas. "Optimizing Integrated Features for Hindi Automatic Speech Recognition System." Journal of Intelligent Systems 29, no. 1 (2018): 959–76. http://dx.doi.org/10.1515/jisys-2018-0057.

Full text
Abstract:
An automatic speech recognition (ASR) system translates spoken words or utterances (isolated, connected, continuous, and spontaneous) into text format. State-of-the-art ASR systems mainly use Mel frequency (MF) cepstral coefficient (MFCC), perceptual linear prediction (PLP), and Gammatone frequency (GF) cepstral coefficient (GFCC) for extracting features in the training phase of the ASR system. Initially, the paper proposes a sequential combination of all three feature extraction methods, taking two at a time. Six combinations, MF-PLP, PLP-MFCC, MF-GFCC, GF-MFCC, GF-PLP, and PLP-GFCC,
APA, Harvard, Vancouver, ISO, and other styles
13

Tong, Fuchuan, Tao Li, Dexin Liao, et al. "The XMUSPEECH System for Accented English Automatic Speech Recognition." Applied Sciences 12, no. 3 (2022): 1478. http://dx.doi.org/10.3390/app12031478.

Full text
Abstract:
In this paper, we present the XMUSPEECH systems for Track 2 of the Interspeech 2020 Accented English Speech Recognition Challenge (AESRC2020). Track 2 is an Automatic Speech Recognition (ASR) task where the non-native English speakers have various accents, which reduces the accuracy of the ASR system. To solve this problem, we experimented with acoustic models and input features. Furthermore, we trained a TDNN-LSTM language model for lattice rescoring to obtain better results. Compared with our baseline system, we achieved relative word error rate (WER) improvements of 40.7% and 35.7% on the d
APA, Harvard, Vancouver, ISO, and other styles
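Word error rate (WER), the metric quoted in this and several other abstracts, is the word-level edit distance between the hypothesis and the reference divided by the reference length. A small self-contained implementation (not tied to any particular paper here) might look like this:

```python
# Word error rate via word-level Levenshtein distance (illustrative sketch).
def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)

print(wer("clear the runway now", "clear runway now"))  # 0.25 (one deletion)
```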
14

Qian, Zhaopeng, and Kejing Xiao. "A Survey of Automatic Speech Recognition for Dysarthric Speech." Electronics 12, no. 20 (2023): 4278. http://dx.doi.org/10.3390/electronics12204278.

Full text
Abstract:
Dysarthric speech has several pathological characteristics, such as discontinuous pronunciation, uncontrolled volume, slow speech, explosive pronunciation, improper pauses, excessive nasal sounds, and air-flow noise during pronunciation, which differ from healthy speech. Automatic speech recognition (ASR) can be very helpful for speakers with dysarthria. Our research aims to provide a scoping review of ASR for dysarthric speech, covering papers in this field from 1990 to 2022. Our survey found that the development of research studies about the acoustic features and acoustic models of dysarthri
APA, Harvard, Vancouver, ISO, and other styles
15

Raval, Deepang, Vyom Pathak, Muktan Patel, and Brijesh Bhatt. "Improving Deep Learning based Automatic Speech Recognition for Gujarati." ACM Transactions on Asian and Low-Resource Language Information Processing 21, no. 3 (2022): 1–18. http://dx.doi.org/10.1145/3483446.

Full text
Abstract:
We present a novel approach for improving the performance of an End-to-End speech recognition system for the Gujarati language. We follow a deep learning-based approach that includes Convolutional Neural Network, Bi-directional Long Short Term Memory layers, Dense layers, and Connectionist Temporal Classification as a loss function. To improve the performance of the system with the limited size of the dataset, we present a combined language model (Word-level language Model and Character-level language model)-based prefix decoding technique and Bidirectional Encoder Representations from Transfo
APA, Harvard, Vancouver, ISO, and other styles
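The architecture described in the abstract above (convolutional front end, bidirectional LSTM layers, dense output, CTC loss) can be sketched in PyTorch as below. Layer sizes and the vocabulary size are illustrative assumptions, not the authors' configuration.

```python
# Illustrative CNN + BiLSTM + CTC acoustic model (sizes are assumptions).
import torch
import torch.nn as nn

class SmallCTCModel(nn.Module):
    def __init__(self, n_feats=80, n_classes=30):      # n_classes includes the CTC blank
        super().__init__()
        self.conv = nn.Conv1d(n_feats, 128, kernel_size=3, padding=1)
        self.lstm = nn.LSTM(128, 256, num_layers=2, bidirectional=True, batch_first=True)
        self.fc = nn.Linear(2 * 256, n_classes)

    def forward(self, x):                               # x: (batch, time, n_feats)
        x = torch.relu(self.conv(x.transpose(1, 2))).transpose(1, 2)
        x, _ = self.lstm(x)
        return self.fc(x).log_softmax(dim=-1)           # (batch, time, n_classes)

model = SmallCTCModel()
ctc = nn.CTCLoss(blank=0, zero_infinity=True)
feats = torch.randn(4, 200, 80)                         # dummy batch of filterbank features
logp = model(feats).transpose(0, 1)                      # CTCLoss expects (time, batch, classes)
targets = torch.randint(1, 30, (4, 20))
loss = ctc(logp, targets, torch.full((4,), 200), torch.full((4,), 20))
print(loss.item())
```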
16

Bhattacharjee, Mrinmoy, Petr Motlicek, Srikanth Madikeri, et al. "Minimum effort adaptation of automatic speech recognition system in air traffic management." European Journal of Transport and Infrastructure Research 24, no. 4 (2025): 133–53. https://doi.org/10.59490/ejtir.2024.24.4.7531.

Full text
Abstract:
Advancements in Automatic Speech Recognition (ASR) technology are exemplified by ubiquitous voice assistants such as Siri and Alexa. Researchers have been exploring the application of ASR to Air Traffic Management (ATM) systems. Initial prototypes utilized ASR to pre-fill aircraft radar labels and achieved a technological readiness level before industrialization (TRL6). However, accurately recognizing infrequently used but highly informative domain-specific vocabulary is still an issue. This includes waypoint names specific to each airspace region and unique airline designators, e.g., “dexon”
APA, Harvard, Vancouver, ISO, and other styles
17

R.V, Shalini, Sangamithra G, Shamna A.S, Priyadharshini B, and Raguram M. "Digital Prescription for Hospital Database Management using ASR." International Journal of Computer Communication and Informatics 6, no. 1 (2024): 58–69. http://dx.doi.org/10.34256/ijcci2414.

Full text
Abstract:
According to the American Medical Association (AMA), handwritten prescriptions are associated with a larger risk of pharmaceutical errors than electronic prescriptions. The solution to this problem is to create a digital prescription. This application leverages automated speech recognition (ASR) technology within digital prescriptions to produce flawless and legible prescriptions. Automatic speech recognition reduces transcription errors, speeds up prescription processing, and ensures a smooth interface with hospital database management by translating spoken instructions int
APA, Harvard, Vancouver, ISO, and other styles
18

Kawahara, Tatsuya. "Transcription System Using Automatic Speech Recognition for the Japanese Parliament (Diet)." Proceedings of the AAAI Conference on Artificial Intelligence 26, no. 2 (2012): 2224–28. http://dx.doi.org/10.1609/aaai.v26i2.18962.

Full text
Abstract:
This article describes a new automatic transcription system in the Japanese Parliament which deploys our automatic speech recognition (ASR) technology. To achieve high recognition performance in spontaneous meeting speech, we have investigated an efficient training scheme with minimal supervision which can exploit a huge amount of real data. Specifically, we have proposed a lightly-supervised training scheme based on statistical language model transformation, which fills the gap between faithful transcripts of spoken utterances and final texts for documentation. Once this mapping is trained, w
APA, Harvard, Vancouver, ISO, and other styles
19

Liang, Qiyu. "Automatic speech recognition technology: History, applications and improvements." Applied and Computational Engineering 65, no. 1 (2024): 180–84. http://dx.doi.org/10.54254/2755-2721/65/20240493.

Full text
Abstract:
In today's world, automatic speech recognition (ASR) has become an important part of artificial intelligence. It has been recognized as an extremely challenging high-tech topic. It mainly converts the vocabulary content of human speech into computer-readable input, generally understandable text content, but possibly also binary encoding or character sequences. Since the 1950s, ASR has been continuously developing, from simple systems for recognizing the pronunciation of ten English digits to the rise of multiple frameworks and different neural networks. The process of ASR is constantly beco
APA, Harvard, Vancouver, ISO, and other styles
20

K, Pavan Raju, Sri Krishna A, and Murali M. "AUTOMATIC SPEECH RECOGNITION SYSTEM USING MFCC-BASED LPC APPROACH WITH BACK PROPAGATED ARTIFICIAL NEURAL NETWORKS." ICTACT Journal on Soft Computing 10, no. 4 (2020): 2153–59. https://doi.org/10.21917/ijsc.2020.0306.

Full text
Abstract:
Over the previous years, a tremendous amount of research has been performed on artificial-intelligence-based deep learning approaches for speech recognition applications. Automatic speech recognition (ASR) faces problems mostly in the preprocessing, feature extraction, and classification stages, so solving these problems is essential to improve the classification accuracy of speech processing. To address these issues, an advanced speech recognition methodology has been developed utilizing the Spectral Subtraction (SS) method of denoising in combination with Mel-frequency Ce
APA, Harvard, Vancouver, ISO, and other styles
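Spectral subtraction, the denoising front end named above, estimates a noise magnitude spectrum from noise-only frames and subtracts it from every frame before features are computed. A bare-bones NumPy/librosa version (assuming the first ~0.2 s of the recording is noise only; file path is a placeholder) could look like this:

```python
# Bare-bones spectral subtraction sketch (assumes the opening frames are noise only).
import numpy as np
import librosa

def spectral_subtraction(y, sr, noise_frames=25, floor=0.01):
    stft = librosa.stft(y, n_fft=512, hop_length=128)
    mag, phase = np.abs(stft), np.angle(stft)
    noise_mag = mag[:, :noise_frames].mean(axis=1, keepdims=True)   # noise estimate
    clean_mag = np.maximum(mag - noise_mag, floor * mag)             # subtract with a floor
    return librosa.istft(clean_mag * np.exp(1j * phase), hop_length=128)

y, sr = librosa.load("noisy_utterance.wav", sr=16000)      # placeholder file
denoised = spectral_subtraction(y, sr)
mfcc = librosa.feature.mfcc(y=denoised, sr=sr, n_mfcc=13)  # features for the recognizer
```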
21

Sirora, Leslie Wellington, and Mainford Mutandavari. "A Deep Learning Automatic Speech Recognition Model for Shona Language." International Journal of Innovative Research in Computer and Communication Engineering 12, no. 09 (2024): 1–14. http://dx.doi.org/10.15680/ijircce.2024.1209019.

Full text
Abstract:
This study presented the development of a deep learning-based Automatic Speech Recognition (ASR) system for Shona, a low-resource language characterized by unique tonal and grammatical complexities. The research aimed to address the challenges posed by limited training data, a lack of labelled data, and the intricate tonal nuances present in Shona speech, with the objective of achieving significant improvements in recognition accuracy compared to traditional statistical models. Motivated by the limitations of existing approaches, the research addressed three key questions. Firstly, it explored
APA, Harvard, Vancouver, ISO, and other styles
22

Qian, Zhaopeng, Li Wang, Shaochuan Zhang, Chan Liu, and Haijun Niu. "Mandarin Electrolaryngeal Speech Recognition Based on WaveNet-CTC." Journal of Speech, Language, and Hearing Research 62, no. 7 (2019): 2203–12. http://dx.doi.org/10.1044/2019_jslhr-s-18-0313.

Full text
Abstract:
Purpose The application of Chinese Mandarin electrolaryngeal (EL) speech for laryngectomees has been limited by its drawbacks such as single fundamental frequency, mechanical sound, and large radiation noise. To improve the intelligibility of Chinese Mandarin EL speech, a new perspective using the automatic speech recognition (ASR) system was proposed, which can convert EL speech into healthy speech, if combined with text-to-speech. Method An ASR system was designed to recognize EL speech based on a deep learning model WaveNet and the connectionist temporal classification (WaveNet-CTC). This s
APA, Harvard, Vancouver, ISO, and other styles
23

Hasija, Taniya, Virender Kadyan, Kalpna Guleria, Abdullah Alharbi, Hashem Alyami, and Nitin Goyal. "Prosodic Feature-Based Discriminatively Trained Low Resource Speech Recognition System." Sustainability 14, no. 2 (2022): 614. http://dx.doi.org/10.3390/su14020614.

Full text
Abstract:
Speech recognition has been an active field of research in the last few decades since it facilitates better human–computer interaction. Native language automatic speech recognition (ASR) systems are still underdeveloped. Punjabi ASR systems are in their infancy stage because most research has been conducted only on adult speech systems; however, less work has been performed on Punjabi children’s ASR systems. This research aimed to build a prosodic feature-based automatic children speech recognition system using discriminative modeling techniques. The corpus of Punjabi children’s speech has var
APA, Harvard, Vancouver, ISO, and other styles
24

Hai, Xinhe, Kaviya Aranganadin, Cheng-Cheng Yeh, et al. "A Self-Evaluated Bilingual Automatic Speech Recognition System for Mandarin–English Mixed Conversations." Applied Sciences 15, no. 14 (2025): 7691. https://doi.org/10.3390/app15147691.

Full text
Abstract:
Bilingual communication is increasingly prevalent in this globally connected world, where cultural exchanges and international interactions are unavoidable. Existing automatic speech recognition (ASR) systems are often limited to single languages. However, the growing demand for bilingual ASR in human–computer interactions, particularly in medical services, has become indispensable. This article addresses this need by creating an application programming interface (API)-based platform using VOSK, a popular open-source single-language ASR toolkit, to efficiently deploy a self-evaluated bilingual
APA, Harvard, Vancouver, ISO, and other styles
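VOSK, the open-source toolkit named in the abstract above, exposes a compact streaming API. A minimal single-language usage sketch follows (model directory and WAV file are placeholders); running two such recognizers with different models is one simple way to serve bilingual input, though it is not necessarily the platform the authors built.

```python
# Minimal VOSK streaming recognition sketch (paths are placeholders).
import wave
import json
from vosk import Model, KaldiRecognizer

model = Model("model-en-us")                 # placeholder: an unpacked VOSK model directory
wf = wave.open("utterance.wav", "rb")        # placeholder: 16 kHz mono 16-bit PCM WAV
rec = KaldiRecognizer(model, wf.getframerate())

while True:
    data = wf.readframes(4000)
    if len(data) == 0:
        break
    if rec.AcceptWaveform(data):
        print(json.loads(rec.Result())["text"])      # finalized segment
print(json.loads(rec.FinalResult())["text"])         # remaining tail
```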
25

Gellatly, Andrew W., and Thomas A. Dingus. "Speech Recognition and Automotive Applications: Using Speech to Perform in-Vehicle Tasks." Proceedings of the Human Factors and Ergonomics Society Annual Meeting 42, no. 17 (1998): 1247–51. http://dx.doi.org/10.1177/154193129804201715.

Full text
Abstract:
An experiment was conducted to investigate the effects of automatic speech recognition (ASR) system design, driver input-modality, and driver age on driving performance during in-vehicle task execution and in-vehicle task usability. Results showed that ASR system design (i.e., recognition accuracy and recognition error type) and driver input-modality (i.e., manual or speech) significantly affected certain dependent measures. However, the differences found were small, suggesting that less than ideal ASR system design/performance can be considered for use in automobiles without substantially imp
APA, Harvard, Vancouver, ISO, and other styles
26

Wu, Xianxian, Yan Zhang, and Wenyan Zhu. "Study on an English Speaking Practice System based on Automatic Speech Recognition Technology." Journal of Education and Educational Research 4, no. 1 (2023): 143–46. http://dx.doi.org/10.54097/jeer.v4i1.10273.

Full text
Abstract:
This research paper presents a study on an English speaking practice system that utilizes automatic speech recognition (ASR) technology. The system aims to assess pronunciation accuracy and provide real-time feedback to learners, ultimately enhancing their spoken English skills. The system employs a web-based platform where users can record their speech, which is then uploaded to the server for recognition using a pre-trained ASR model. The recognized speech is compared with a reference text, allowing for the calculation of pronunciation accuracy and the generation of feedback highlighting cor
APA, Harvard, Vancouver, ISO, and other styles
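The comparison step described above, aligning the recognized words against the reference text to flag pronunciation errors, can be prototyped with the standard library alone. The sketch below uses difflib for the alignment and is only a simplified stand-in for the system's actual scoring logic.

```python
# Simplified reference-vs-recognized comparison for pronunciation feedback.
from difflib import SequenceMatcher

def pronunciation_feedback(reference: str, recognized: str):
    ref, hyp = reference.lower().split(), recognized.lower().split()
    matcher = SequenceMatcher(None, ref, hyp)
    accuracy = matcher.ratio()                      # crude similarity score in [0, 1]
    issues = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op != "equal":
            issues.append((op, ref[i1:i2], hyp[j1:j2]))
    return accuracy, issues

acc, issues = pronunciation_feedback("the quick brown fox", "the quick brow fox")
print(f"accuracy={acc:.2f}", issues)   # flags 'brown' -> 'brow' as a replace
```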
27

Riadchenko, M. P., and O. E. Piatykop. "Recognizing speech in voice messages." Reporter of the Priazovskyi State Technical University. Section: Technical sciences, no. 45 (December 29, 2022): 28–34. http://dx.doi.org/10.31498/2225-6733.45.2022.276225.

Full text
Abstract:
The level of development of information technology makes it possible to use speech recognition technologies in a wide range of human life and activities. It is very convenient to use the voice interface: voice search for the necessary documents, dialing a phone number, managing IOT devices, voice navigation, simple text dictation. Since the natural language interface provides an additional convenience for a person when typing, sending voice messages has become common among users. In this case, voice messages are audio files. But it is not always available and convenient for the recipient to li
APA, Harvard, Vancouver, ISO, and other styles
28

Giachos, Ioannis, Vasileios-Stylianos Lefkelis, Evangelos C. Papakitsos, Petros Savvidis, and Nikolaos Laskaris. "Applying Automatic Speech Recognition on Intelligent Human-Robot Interfaces for Operational Usage." WSEAS TRANSACTIONS ON COMPUTERS 24 (February 10, 2025): 20–28. https://doi.org/10.37394/23205.2025.24.3.

Full text
Abstract:
This paper deals with the implementation of a readily available automatic speech recognition (ASR) system in a human-robot interface (HRI), intended for operational uses. Automatic speech recognition is a very important process that has occupied artificial intelligence for over 70 years. The aim is to build the prerequisites with a basic code for the full integration of a modern advanced automatic speech recognition system into an intelligent human-robot interface, designed by the authors, and which is part of a developing robotic system. At the beginning of this paper, a brief discussion of t
APA, Harvard, Vancouver, ISO, and other styles
29

Fontan, Lionel, Isabelle Ferrané, Jérôme Farinas, et al. "Automatic Speech Recognition Predicts Speech Intelligibility and Comprehension for Listeners With Simulated Age-Related Hearing Loss." Journal of Speech, Language, and Hearing Research 60, no. 9 (2017): 2394–405. http://dx.doi.org/10.1044/2017_jslhr-s-16-0269.

Full text
Abstract:
Purpose The purpose of this article is to assess speech processing for listeners with simulated age-related hearing loss (ARHL) and to investigate whether the observed performance can be replicated using an automatic speech recognition (ASR) system. The long-term goal of this research is to develop a system that will assist audiologists/hearing-aid dispensers in the fine-tuning of hearing aids. Method Sixty young participants with normal hearing listened to speech materials mimicking the perceptual consequences of ARHL at different levels of severity. Two intelligibility tests (repetition of w
APA, Harvard, Vancouver, ISO, and other styles
30

G, Thimmaraja Yadava, G. Nagaraja B, Yogesh Kumaran S, C. Ramachandra A, and M. Arun Kumar N. "Development of Small Vocabulary Continuous Speech-to-Text System for Kannada Language/Dialects." Indian Journal of Science and Technology 15, no. 45 (2022): 2476–81. https://doi.org/10.17485/IJST/v15i45.1884.

Full text
Abstract:
Objectives: To develop a speech-to-text (STT) system using the Kaldi speech recognition toolkit for continuous Kannada language/dialects. Methods: Continuous Kannada speech data were collected in the field from 100 speakers/farmers of Karnataka state. The lexicon/dictionary and set of phonemes for the Kannada language/dialects were created, and the collected speech data were transcribed using the Transcriber tool. The ASR models are developed at different phoneme levels using Kaldi. Findings: In this work, an effort is made to devel
APA, Harvard, Vancouver, ISO, and other styles
31

Nagajyothi, D., and P. Siddaiah. "Speech Recognition Using Convolutional Neural Networks." International Journal of Engineering & Technology 7, no. 4.6 (2018): 133. http://dx.doi.org/10.14419/ijet.v7i4.6.20449.

Full text
Abstract:
Automatic speech recognition (ASR) is the process of converting vocal speech signals into text using transcripts. In the present era of the computer revolution, ASR plays a major role in enhancing the user experience, in a natural way, while communicating with machines. It rules out the use of traditional devices like the keyboard and mouse, and the user can perform an endless array of applications such as controlling devices and interacting with customer care. In this paper, an ASR-based airport enquiry system is presented. The system has been developed natively for the Telugu language. The d
APA, Harvard, Vancouver, ISO, and other styles
32

Auti, Dr Nisha, Atharva Pujari, Anagha Desai, Shreya Patil, Sanika Kshirsagar, and Rutika Rindhe. "Advanced Audio Signal Processing for Speaker Recognition and Sentiment Analysis." International Journal for Research in Applied Science and Engineering Technology 11, no. 5 (2023): 1717–24. http://dx.doi.org/10.22214/ijraset.2023.51825.

Full text
Abstract:
Automatic Speech Recognition (ASR) technology has revolutionized human-computer interaction by allowing users to communicate with computer interfaces using their voice in a natural way. Speaker recognition is a biometric recognition method that identifies individuals based on their unique speech signal, with potential applications in security, communication, and personalization. Sentiment analysis is a statistical method that analyzes unique acoustic properties of the speaker's voice to identify emotions or sentiments in speech. This allows for automated speech recognition systems to
APA, Harvard, Vancouver, ISO, and other styles
33

Arisaputra, Panji, and Amalia Zahra. "Indonesian Automatic Speech Recognition with XLSR-53." Ingénierie des systèmes d information 27, no. 6 (2022): 973–82. http://dx.doi.org/10.18280/isi.270614.

Full text
Abstract:
This study focuses on the development of Indonesian automatic speech recognition (ASR) using the XLSR-53 pre-trained model, where XLSR stands for cross-lingual speech representations. This pre-trained model is used to significantly reduce the amount of training data in non-English languages required to achieve a competitive Word Error Rate (WER). The total amount of data used in this study is 24 hours, 18 minutes, and 1 second: (1) TITML-IDN, 14 hours and 31 minutes; (2) Magic Data, 3 hours and 33 minutes; and (3) Common Voice, 6 hours, 14 minutes, and 1 second. With a WER of 20%, the
APA, Harvard, Vancouver, ISO, and other styles
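Fine-tuned XLSR-53 checkpoints are typically used through the Hugging Face wav2vec 2.0 classes. The sketch below shows generic CTC inference with such a model; the checkpoint name and audio file are placeholders for whatever Indonesian fine-tuned model one has, not the exact model produced in this study.

```python
# Generic wav2vec2/XLSR CTC inference sketch (checkpoint name is a placeholder).
import torch
import librosa
from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC

ckpt = "your-org/wav2vec2-xlsr-53-indonesian"            # hypothetical fine-tuned checkpoint
processor = Wav2Vec2Processor.from_pretrained(ckpt)
model = Wav2Vec2ForCTC.from_pretrained(ckpt)

speech, sr = librosa.load("indonesian_utterance.wav", sr=16000)   # placeholder file
inputs = processor(speech, sampling_rate=16000, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits                       # (batch, time, vocab)
pred_ids = torch.argmax(logits, dim=-1)
print(processor.batch_decode(pred_ids)[0])                # greedy CTC transcription
```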
34

Popović, Branislav, Edvin Pakoci, and Darko Pekar. "Transfer learning for domain and environment adaptation in Serbian ASR." Telfor Journal 12, no. 2 (2020): 110–15. http://dx.doi.org/10.5937/telfor2002110p.

Full text
Abstract:
In automatic speech recognition systems, the training data used for system development and the data actually obtained from the users of the system sometimes significantly differ in practice. However, other, more similar data may be available. Transfer learning can help to exploit such similar data for training in order to boost the automatic speech recognizer's performance for a certain domain. This paper presents a few applications of transfer learning in the context of speech recognition, specifically for the Serbian language. Several methods are proposed, with the goal of optimizing system
APA, Harvard, Vancouver, ISO, and other styles
35

Pipiras, Laurynas, Rytis Maskeliūnas, and Robertas Damaševičius. "Lithuanian Speech Recognition Using Purely Phonetic Deep Learning." Computers 8, no. 4 (2019): 76. http://dx.doi.org/10.3390/computers8040076.

Full text
Abstract:
Automatic speech recognition (ASR) has been one of the biggest and hardest challenges in the field. A large majority of research in this area focuses on widely spoken languages such as English. The problems of automatic Lithuanian speech recognition have attracted little attention so far. Due to complicated language structure and scarcity of data, models proposed for other languages such as English cannot be directly adopted for Lithuanian. In this paper we propose an ASR system for the Lithuanian language, which is based on deep learning methods and can identify spoken words purely from their
APA, Harvard, Vancouver, ISO, and other styles
36

Kim, Ji Youn. "Research Trends in the Evaluation and Treatment of Speech Disorders Using Automatic Speech Recognition (ASR)." Audiology and Speech Research 20, no. 4 (2024): 253–62. http://dx.doi.org/10.21848/asr.240159.

Full text
Abstract:
Purpose: The purpose of this study is to examine recent research trends regarding automatic speech recognition (ASR) as used in the evaluation of and intervention for speech disorders. Methods: Articles published in domestic journals were searched through a search engine. A total of 27 papers were selected from the retrieved documents and analyzed according to year, research subject, speech task, and ASR system. Results: Most of the research was conducted in 2019–2021. The subjects who most frequently underwent speech evaluation and treatment using an ASR system were those with dysart
APA, Harvard, Vancouver, ISO, and other styles
37

Imad, Qasim Habeeb, Z. Fadhil Tamara, Naser Jurn Yaseen, Qasim Habeeb Zeyad, and Najm Abdulkhudhur Hanan. "An ensemble technique for speech recognition in noisy environments." Indonesian Journal of Electrical Engineering and Computer Science (IJEECS) 18, no. 2 (2020): 835–42. https://doi.org/10.11591/ijeecs.v18.i2.pp835-842.

Full text
Abstract:
Automatic speech recognition (ASR) is a technology that allows computers and mobile devices to recognize and translate spoken language into text. ASR systems often produce poor accuracy on noisy speech signals. Therefore, this research proposed an ensemble technique that does not rely on a single filter for perfect noise reduction but incorporates information from multiple noise reduction filters to improve the final ASR accuracy. The main factor of this technique is the generation of K copies of the speech signal using three noise reduction filters. The speech features of these copies dif
APA, Harvard, Vancouver, ISO, and other styles
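The ensemble idea above, running several noise-reduction filters, recognizing each filtered copy, and then combining the results, can be outlined in a few lines. The recognize function below is a hypothetical stand-in for any ASR backend, and the SciPy filters are only examples of the kind of noise reduction the paper describes, not its exact choices.

```python
# Sketch of filter-ensemble recognition; recognize() is a hypothetical ASR call.
from collections import Counter
import numpy as np
from scipy.signal import wiener, medfilt

def recognize(signal, sr):
    """Placeholder for any ASR engine that maps a waveform to a transcript."""
    raise NotImplementedError

def ensemble_transcribe(y, sr):
    copies = [
        wiener(y),                  # Wiener-filtered copy
        medfilt(y, kernel_size=5),  # median-filtered copy
        y - np.mean(y),             # trivially "filtered" copy (DC removal)
    ]
    hypotheses = [recognize(c, sr) for c in copies]
    # Simple combination rule: majority vote over whole-utterance hypotheses.
    return Counter(hypotheses).most_common(1)[0][0]
```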
38

Stoyanchev, Svetlana, and Amanda J. Stent. "Concept Type Prediction and Responsive Adaptation in a Dialogue System." Dialogue & Discourse 3, no. 1 (2012): 1–31. http://dx.doi.org/10.5087/dad.2012.101.

Full text
Abstract:
Responsive adaptation in spoken dialog systems involves a change in dialog system behavior in response to a user or a dialog situation. In this paper we address responsive adaptation in the automatic speech recognition (ASR) module of a spoken dialog system. We hypothesize that information about the content of a user utterance may help improve speech recognition for the utterance. We use a two-step process to test this hypothesis: first, we automatically predict the task-relevant concept types likely to be present in a user utterance using features from the dialog context and from the output o
APA, Harvard, Vancouver, ISO, and other styles
39

Behre, Piyush, Sharman Tan, Padma Varadharajan, and Shuangyu Chang. "Streaming Punctuation: A Novel Punctuation Technique Leveraging Bidirectional Context for Continuous Speech Recognition." International Journal on Natural Language Computing 11, no. 6 (2022): 01–13. http://dx.doi.org/10.5121/ijnlc.2022.11601.

Full text
Abstract:
While speech recognition Word Error Rate (WER) has reached human parity for English, continuous speech recognition scenarios such as voice typing and meeting transcriptions still suffer from segmentation and punctuation problems, resulting from irregular pausing patterns or slow speakers. Transformer sequence tagging models are effective at capturing long bi-directional context, which is crucial for automatic punctuation. Automatic Speech Recognition (ASR) production systems, however, are constrained by real-time requirements, making it hard to incorporate the right context when making punctua
APA, Harvard, Vancouver, ISO, and other styles
40

Piyush, Behre, Tan Sharman, Varadharajan Padma, and Chang Shuangyu. "Streaming Punctuation: A Novel Punctuation Technique Leveraging Bidirectional Context for Continuous Speech Recognition." International Journal on Natural Language Computing (IJNLC) 11, no. 6 (2023): 13. https://doi.org/10.5281/zenodo.7546332.

Full text
Abstract:
While speech recognition Word Error Rate (WER) has reached human parity for English, continuous speech recognition scenarios such as voice typing and meeting transcriptions still suffer from segmentation and punctuation problems, resulting from irregular pausing patterns or slow speakers. Transformer sequence tagging models are effective at capturing long bi-directional context, which is crucial for automatic punctuation. Automatic Speech Recognition (ASR) production systems, however, are constrained by real-time requirements, making it hard to incorporate the right context when making punctua
APA, Harvard, Vancouver, ISO, and other styles
41

Wang, Bohan. "The application and challenges of artificial intelligence in speech recognition." Applied and Computational Engineering 17, no. 1 (2023): 36–40. http://dx.doi.org/10.54254/2755-2721/17/20230907.

Full text
Abstract:
This paper provides an overview of artificial intelligence (AI) and speech recognition technology, including its history, applications, challenges, and future prospects. AI-powered speech recognition technology has significantly improved over the years, and it is used in various applications, such as virtual assistants, voice-activated devices, and dictation software. The technology leverages machine learning algorithms that are trained on vast amounts of speech data to recognize and interpret human speech with accuracy levels that are comparable to those of humans. However, the technology sti
APA, Harvard, Vancouver, ISO, and other styles
42

Liao, Lyuchao, Francis Afedzie Kwofie, Zhifeng Chen, et al. "A Bidirectional Context Embedding Transformer for Automatic Speech Recognition." Information 13, no. 2 (2022): 69. http://dx.doi.org/10.3390/info13020069.

Full text
Abstract:
Transformers have become popular in building end-to-end automatic speech recognition (ASR) systems. However, transformer ASR systems are usually trained to give output sequences in the left-to-right order, disregarding the right-to-left context. Currently, the existing transformer-based ASR systems that employ two decoders for bidirectional decoding are complex in terms of computation and optimization. The existing ASR transformer with a single decoder for bidirectional decoding requires extra methods (such as a self-mask) to resolve the problem of information leakage in the attention mechanis
APA, Harvard, Vancouver, ISO, and other styles
43

Phaladi, Amanda, and Thipe Modipa. "The Evaluation of a Code-Switched Sepedi-English Automatic Speech Recognition System." International Journal on Cybernetics & Informatics 13, no. 2 (2024): 33–44. http://dx.doi.org/10.5121/ijci.2024.130203.

Full text
Abstract:
Speech technology is a field that encompasses various techniques and tools used to enable machines to interact with speech, such as automatic speech recognition (ASR), spoken dialog systems, and others, allowing a device to capture spoken words through a microphone from a human speaker. End-to-end approaches such as Connectionist Temporal Classification (CTC) and attention-based methods are the most used for the development of ASR systems. However, these techniques were commonly used for research and development for many high-resourced languages with large amounts of speech data for training a
APA, Harvard, Vancouver, ISO, and other styles
44

Jeong, Jiho, S. I. M. M. Raton Mondol, Yeon Wook Kim, and Sangmin Lee. "An Effective Learning Method for Automatic Speech Recognition in Korean CI Patients’ Speech." Electronics 10, no. 7 (2021): 807. http://dx.doi.org/10.3390/electronics10070807.

Full text
Abstract:
The automatic speech recognition (ASR) model usually requires a large amount of training data to provide better results compared with the ASR models trained with a small amount of training data. It is difficult to apply the ASR model to non-standard speech such as that of cochlear implant (CI) patients, owing to privacy concerns or difficulty of access. In this paper, an effective finetuning and augmentation ASR model is proposed. Experiments compare the character error rate (CER) after training the ASR model with the basic and the proposed method. The proposed method achieved a CER of 36.03%
APA, Harvard, Vancouver, ISO, and other styles
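Fine-tuning on a small non-standard corpus, as in the study above, usually leans on simple waveform augmentation. The snippet below shows two common augmentations (additive noise and speed perturbation) in that spirit; the abstract excerpt does not specify which augmentations the authors actually used, and the file path is a placeholder.

```python
# Two simple waveform augmentations often used when training data is scarce.
import numpy as np
import librosa

def add_noise(y, snr_db=20.0):
    """Add white noise at a target signal-to-noise ratio."""
    signal_power = np.mean(y ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    return y + np.random.normal(0.0, np.sqrt(noise_power), size=y.shape)

def speed_perturb(y, rate=1.1):
    """Time-stretch the waveform without changing pitch."""
    return librosa.effects.time_stretch(y, rate=rate)

y, sr = librosa.load("ci_patient_utterance.wav", sr=16000)   # placeholder file
augmented = [add_noise(y, 15.0), add_noise(y, 25.0), speed_perturb(y, 0.9), speed_perturb(y, 1.1)]
```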
45

Ding, Ing-Jr, and Yen-Ming Hsu. "An HMM-Like Dynamic Time Warping Scheme for Automatic Speech Recognition." Mathematical Problems in Engineering 2014 (2014): 1–8. http://dx.doi.org/10.1155/2014/898729.

Full text
Abstract:
In the past, the kernel of automatic speech recognition (ASR) was dynamic time warping (DTW), a feature-based template-matching technique that belongs to the category of dynamic programming (DP). Although DTW is an early ASR technique, it has remained popular in many applications and now plays an important role in the well-known Kinect-based gesture recognition application. This paper proposes an intelligent speech recognition system using an improved DTW approach for multimedia and home automation services. The improved DTW presented in this work, called HMM-like DTW, is esse
APA, Harvard, Vancouver, ISO, and other styles
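Plain DTW itself is short enough to write out: it aligns two feature sequences by filling a cost matrix of cumulative frame-to-frame distances. A NumPy version is sketched below; it illustrates ordinary template-matching DTW, not the paper's HMM-like variant, and the feature matrices are dummy data.

```python
# Plain dynamic time warping over two feature sequences (illustrative sketch).
import numpy as np

def dtw_distance(a: np.ndarray, b: np.ndarray) -> float:
    """a: (n, d) and b: (m, d) feature matrices; returns the cumulative alignment cost."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])           # local frame distance
            cost[i, j] = d + min(cost[i - 1, j],               # insertion
                                 cost[i, j - 1],               # deletion
                                 cost[i - 1, j - 1])           # match
    return float(cost[n, m])

# Template matching: compare an utterance against a stored word template.
template = np.random.randn(40, 13)   # e.g., MFCCs of a reference word (dummy data)
query = np.random.randn(50, 13)
print(dtw_distance(template, query))
```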
46

Phuengrod, Siriluk, Panita Wannapiroon, and Prachyanun Nilsook. "Distributed Communicative Language Training Platform Using Automatic Speech Recognition Technology for Smart University." International Journal of Emerging Technologies in Learning (iJET) 18, no. 24 (2023): 96–111. http://dx.doi.org/10.3991/ijet.v18i24.40619.

Full text
Abstract:
The purpose of this research is to achieve the following objectives: 1) Synthesize documents and international research on the characteristics of a smart university. 2) Synthesize the processes of distributed communicative language training (DCLT). 3) Design the system architecture of a DCLT platform that utilizes automatic speech recognition (ASR) technology for a smart university. 4) Evaluate the appropriateness of a DCLT platform that utilizes ASR technology for a smart university. Nine experts were selected for this research. They were required to have more than five years of relevant expe
APA, Harvard, Vancouver, ISO, and other styles
47

Truong, Do Quoc, Pham Ngoc Phuong, Tran Hoang Tung, and Luong Chi Mai. "DEVELOPMENT OF HIGH-PERFORMANCE AND LARGE-SCALE VIETNAMESE AUTOMATIC SPEECH RECOGNITION SYSTEMS." Journal of Computer Science and Cybernetics 34, no. 4 (2019): 335–48. http://dx.doi.org/10.15625/1813-9663/34/4/13165.

Full text
Abstract:
Automatic Speech Recognition (ASR) systems convert human speech into the corresponding transcription automatically. They have a wide range of applications, such as controlling robots, call center analytics, and voice chatbots. Recent studies on ASR for English have achieved performance that surpasses human ability. The systems were trained on a large amount of training data and performed well in many environments. With regard to Vietnamese, there have been many studies on improving the performance of existing ASR systems; however, many of them were conducted on small-scale data, which does
APA, Harvard, Vancouver, ISO, and other styles
48

Ghai, Wiqas, and Navdeep Singh. "Phone based acoustic modeling for automatic speech recognition for Punjabi language." Journal of Speech Sciences 3, no. 1 (2021): 68–83. http://dx.doi.org/10.20396/joss.v3i1.15040.

Full text
Abstract:
Punjabi is a tonal language belonging to the Indo-Aryan language family and has a large number of speakers all around the world. The Punjabi language has gained acceptability in media & communication and therefore deserves a place in the growing field of automatic speech recognition, which has already been explored successfully for a number of other Indian and foreign languages. Some work has been done in the field of isolated-word speech recognition for the Punjabi language, but only using whole-word-based acoustic models. A phone-based approach has yet to be applied for Punjabi lang
APA, Harvard, Vancouver, ISO, and other styles
49

Marini, Marco, Nicola Vanello, and Luca Fanucci. "Optimising Speaker-Dependent Feature Extraction Parameters to Improve Automatic Speech Recognition Performance for People with Dysarthria." Sensors 21, no. 19 (2021): 6460. http://dx.doi.org/10.3390/s21196460.

Full text
Abstract:
Within the field of Automatic Speech Recognition (ASR) systems, facing impaired speech is a big challenge because standard approaches are ineffective in the presence of dysarthria. The first aim of our work is to confirm the effectiveness of a new speech analysis technique for speakers with dysarthria. This new approach exploits the fine-tuning of the size and shift parameters of the spectral analysis window used to compute the initial short-time Fourier transform, to improve the performance of a speaker-dependent ASR system. The second aim is to define if there exists a correlation among the
APA, Harvard, Vancouver, ISO, and other styles
50

Lin, Yu-Yi, Wei-Zhong Zheng, Wei Chung Chu, et al. "A Speech Command Control-Based Recognition System for Dysarthric Patients Based on Deep Learning Technology." Applied Sciences 11, no. 6 (2021): 2477. http://dx.doi.org/10.3390/app11062477.

Full text
Abstract:
Voice control is an important way of controlling mobile devices; however, using it remains a challenge for dysarthric patients. Currently, there are many approaches, such as automatic speech recognition (ASR) systems, being used to help dysarthric patients control mobile devices. However, the large computation power requirement for the ASR system increases implementation costs. To alleviate this problem, this study proposed a convolution neural network (CNN) with a phonetic posteriorgram (PPG) speech feature system to recognize speech commands, called CNN–PPG; meanwhile, the CNN model with Mel
APA, Harvard, Vancouver, ISO, and other styles