Journal articles on the topic 'Speech recognition (SR)'

Consult the top 50 journal articles for your research on the topic 'Speech recognition (SR).'

1. Fager, Susan. "Speech Recognition as a Practice Tool for Dysarthria." Seminars in Speech and Language 38, no. 3 (2017): 220–28. http://dx.doi.org/10.1055/s-0037-1602841.

Abstract:
Recovery of speech in dysarthria requires an extensive amount of time and practice. Speech recognition (SR) technology may support long-term practice and speech recovery efforts for individuals with dysarthria. However, SR technology development has focused on typical (neurologically intact) speakers to support writing. This article describes the history and development of SR technology and how it has been used by individuals with dysarthria, and includes a case study illustration of a novel SR technology as a speech practice tool. Case study participants included two individuals with differing onsets of dysarthria due to traumatic brain injury. Results indicated that both were able to make acoustic/perceptual changes during speech practice sessions, and one participant demonstrated generalization of changes to habitual speech. Limitations and future directions of current SR technology as a speech practice tool are discussed.

2. Gao, Bo, Zi Ming Kou, and Hong Wei Yan. "Research on Speaker Recognition Based on Wavelet Analysis and Search Tree." Advanced Materials Research 159 (December 2010): 68–71. http://dx.doi.org/10.4028/www.scientific.net/amr.159.68.

Abstract:
Speaker recognition (SR) is an important branch of speech recognition. Current speech signal processing in SR uses short-time processing techniques, which assume that speech signals are short-time stationary. In fact, the speech signal is non-stationary. Wavelet analysis is a relatively new analysis tool well suited to non-stationary signals, and it has achieved impressive results in the field of signal coding. On this basis, wavelet analysis theory was introduced into SR research to improve traditional speech segmentation methods and characteristic parameters. To speed up recognition, an SR model based on a search tree is also proposed.
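To make the abstract's contrast with short-time processing concrete, here is a minimal sketch of a discrete wavelet decomposition of a signal, assuming the PyWavelets package; the db4 mother wavelet, the decomposition level, and the synthetic test signal are illustrative choices, not those of the paper.

```python
import numpy as np
import pywt

fs = 16000                                    # sample rate (Hz)
t = np.arange(fs) / fs
signal = np.sin(2 * np.pi * 440 * t)          # synthetic stand-in for speech

# Multilevel DWT: one coarse approximation band plus detail bands, a
# representation better suited to non-stationary signals than fixed frames.
coeffs = pywt.wavedec(signal, wavelet="db4", level=4)
for i, c in enumerate(coeffs):
    name = "approximation" if i == 0 else f"detail {i}"
    print(f"{name}: {len(c)} coefficients, energy {np.sum(c ** 2):.2f}")
```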

3. Venkatagiri, H. S. "Speech Recognition Technology Applications in Communication Disorders." American Journal of Speech-Language Pathology 11, no. 4 (2002): 323–32. http://dx.doi.org/10.1044/1058-0360(2002/037).

Abstract:
Speech recognition (SR) is the process whereby a microprocessor-based system, typically a computer with sound processing hardware and speech recognition software, responds in predictable ways to spoken commands and/or converts speech into text. This tutorial describes the types and the general uses of SR and provides an explanation of the technology behind it. The emerging applications of SR technology for dictation, articulation training, language and literacy development, environmental control, and communication augmentation are discussed.

4. Shreya, Krishna, G. Shalini, U. P. Sannidhi, and Y. N. Swathi. "Robust Speech Recognition Techniques for Noisy Surroundings." Journal of Research and Review in Quantum Computing 1, no. 1 (2025): 1–7. https://doi.org/10.5281/zenodo.14870245.

Abstract:
In recent years, the accuracy and effectiveness of speech recognition (SR) has become one of the most active research areas, and SR plays an increasingly important role in many real-world applications. The technology has been used in medicine for several years, notably in hearing aids, where performance has improved dramatically. This study provides a detailed analysis of diverse speech recognition tests involving two test contents and two auditory environments: quiet and noise. The paper focuses on analyzing and improving SR systems, particularly in noisy environments. It reviews traditional methods such as noise reduction and adaptive filtering, alongside deep learning techniques such as convolutional and recurrent neural networks. The study examines the performance of SR systems in various scenarios, including quiet and noisy settings, and explores real-world applications in fields such as transportation, offices, and hearing aids. Key challenges, such as handling noise and optimizing signal modelling, are highlighted, with experiments showing a 30% error reduction compared to state-of-the-art systems.

5. Dhanvijay, Nikita, and P. R. Badadapure. "Hindi Speech Recognition Technique Using HTK." International Journal of Engineering Sciences & Research Technology 5, no. 12 (2016): 530–36. https://doi.org/10.5281/zenodo.203964.

Abstract:
This paper describes an SR system. In speech recognition, the computer takes a voice signal recorded with a microphone and converts it into words in real time. The system was developed using MFCC for feature extraction and an HMM as the classifier. Automatic speech recognition (ASR) is a program, or machine, with the ability to recognize a voice signal (speech or voice commands) or take dictation, which involves matching a voice pattern against a given vocabulary. The Hidden Markov Model Toolkit (HTK) was used to develop the system; the HMMs consist of acoustic word models used to recognize isolated words. A Hindi database with a somewhat extended vocabulary was collected, and the HMMs were implemented using the HTK Toolkit.
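For readers unfamiliar with the HTK-style pipeline described above, the following sketch shows an isolated-word recognizer built from MFCC features and one Gaussian HMM per word, scored by log-likelihood. It uses librosa and hmmlearn as stand-ins for HTK, and the file paths, state count, and model settings are illustrative assumptions rather than the paper's configuration.

```python
import librosa
import numpy as np
from hmmlearn.hmm import GaussianHMM

def mfcc_features(path):
    y, sr = librosa.load(path, sr=16000)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).T   # (frames, 13)

def train_word_model(wav_paths, n_states=5):
    """Fit one HMM on all recordings of a single word."""
    feats = [mfcc_features(p) for p in wav_paths]
    model = GaussianHMM(n_components=n_states, covariance_type="diag", n_iter=20)
    model.fit(np.vstack(feats), lengths=[len(f) for f in feats])
    return model

def recognize(path, models):
    """Pick the word whose HMM assigns the utterance the highest likelihood."""
    x = mfcc_features(path)
    return max(models, key=lambda word: models[word].score(x))

# Hypothetical usage: models = {"aam": train_word_model([...]), ...}
# print(recognize("test_utterance.wav", models))
```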

6. Srivastava, Sumit, Khushboo Jha, and Aruna Jain. "Analysis of Human Voice for Speaker Recognition: Concepts and Advancement." Journal of Electrical Systems 20, no. 1s (2024): 582–99. http://dx.doi.org/10.52783/jes.806.

Abstract:
Human voice or speech is a contactless, non-invasive biometric trait for human recognition, easy to use with minimal computational complexity and inexpensive to implement. Speaker recognition (SR) has been a prominent approach using speech as its central premise for decades. Its broad range of uses, from forensic speech verification to identify culprits by law enforcement authorities to access control for mobile banking and shopping, has made it a lucrative area of research. Also, the ease of use and dependability of SR will significantly assist people with disabilities in securely accessing and reaping the benefits of digital-era services. Additionally, the emergence of numerous deep learning methods for feature extraction and classification has helped SR achieve tremendous progress. This paper presents a comprehensive study of the progression of SR over the decades until the present, including integration with Blockchain and open challenges. It covers most of the factors that influence SR performance, such as the fundamentals and structure of SR, different speech pre-processing techniques, various speech features, feature extraction techniques, traditional and neural network-based classification techniques, and deep learning-based SR toolkits. Consequently, in this digital Blockchain era, it will help in designing robust and reliable recognition-based services.

7. K, Khadar Nawas. "A Review on Multimodal Speaker Recognition." Asian Journal of Pharmaceutical and Clinical Research 10, no. 13 (2017): 382. http://dx.doi.org/10.22159/ajpcr.2017.v10s1.19761.

Abstract:
A review of multimodal speaker recognition (SR) is presented. Speaker recognition has been studied for many decades and still attracts the interest of many researchers. It comprises two stages: system training and system testing. The robustness of a speaker recognition system depends on the training and testing environments as well as the quality of the speech. Air-conducted (AC) speech is the source from which the speaker is recognized by extracting features, and system performance depends on it. To further improve the robustness and accuracy of SR systems, various other sources (modalities) such as throat microphones, bone-conduction microphones, microphone arrays, non-audible murmur, and non-auditory information such as video are used to complement the standard AC microphone. This paper is purely a review of SR and these complementary modalities.

8. Takao, Toshitatsu, Ryo Masumura, Sumitaka Sakauchi, et al. "New report preparation system for endoscopic procedures using speech recognition technology." Endoscopy International Open 6, no. 6 (2018): E676–E687. http://dx.doi.org/10.1055/a-0579-6494.

Abstract:
Background and study aims We developed a new reporting system based on structured data entry, which selectively extracts only endoscopic findings from endoscopists' oral statements and automatically inputs them into appropriate columns in real time during endoscopic procedures. Methods We compared the time for endoscopic procedures and report preparation (ER time) by using an esophagogastroduodenoscopy simulator in three groups: one preparing reports using a mouse after endoscopic procedures (CE group); a second group preparing reports by using voice alone during endoscopic procedures (SR group); and the final group preparing reports by operating the system with a foot switch and inputting findings using voice during endoscopic procedures (SR + FS group). For the SR and SR + FS groups, we identified the recognition rates of the speech recognition system. Results Mean ER times for cases with three findings each were 162, 130 and 119 seconds in the CE, SR and SR + FS groups, respectively. The mean ER times for cases with six findings each were 220, 144 and 128 seconds, respectively. The times in the SR and SR + FS groups were significantly shorter than that in the CE group (P < 0.017). The recognition rate of the SR group for cases with three findings each was 98.4 %, and 97.6 % in the same group for cases with six findings each. The rates in the SR + FS group were 95.2 % and 98.4 %, respectively. Conclusion Our reporting system was demonstrated to allow an endoscopist to efficiently complete the report in real time during endoscopic procedures.

9. Hodgson, Tobias, and Enrico Coiera. "Risks and benefits of speech recognition for clinical documentation: a systematic review." Journal of the American Medical Informatics Association 23, e1 (2015): e169–e179. http://dx.doi.org/10.1093/jamia/ocv152.

Abstract:
Objective To review literature assessing the impact of speech recognition (SR) on clinical documentation. Methods Studies published prior to December 2014 reporting clinical documentation using SR were identified by searching Scopus, Compendex and Inspec, PubMed, and Google Scholar. Outcome variables analyzed included dictation and editing time, document turnaround time (TAT), SR accuracy, error rates per document, and economic benefit. Twenty-three articles met inclusion criteria from a pool of 441. Results Most studies compared SR to dictation and transcription (DT) in radiology, and heterogeneity across studies was high. Document editing time increased using SR compared to DT in four of six studies (+1876.47% to –16.50%). Dictation time similarly increased in three of five studies (+91.60% to –25.00%). TAT consistently improved using SR compared to DT (16.41% to 82.34%); across all studies the improvement was 0.90% per year. SR accuracy was reported in ten studies (88.90% to 96.00%) and appears to improve 0.03% per year as the technology matures. Mean number of errors per report increased using SR (0.05 to 6.66) compared to DT (0.02 to 0.40). Economic benefits were poorly reported. Conclusions SR is steadily maturing and offers some advantages for clinical documentation. However, evidence supporting the use of SR is weak, and further investigation is required to assess the impact of SR on documentation error types, rates, and clinical outcomes.

10. Hodgson, Tobias, Farah Magrabi, and Enrico Coiera. "Efficiency and safety of speech recognition for documentation in the electronic health record." Journal of the American Medical Informatics Association 24, no. 6 (2017): 1127–33. http://dx.doi.org/10.1093/jamia/ocx073.

Abstract:
Objective To compare the efficiency and safety of using speech recognition (SR) assisted clinical documentation within an electronic health record (EHR) system with use of keyboard and mouse (KBM). Methods Thirty-five emergency department clinicians undertook randomly allocated clinical documentation tasks using KBM or SR on a commercial EHR system. Tasks were simple or complex, and with or without interruption. Outcome measures included task completion times and observed errors. Errors were classed by their potential for patient harm. Error causes were classified as due to IT system/system integration, user interaction, comprehension, or as typographical. User-related errors could be by either omission or commission. Results Mean task completion times were 18.11% slower overall when using SR compared to KBM (P = .001), 16.95% slower for simple tasks (P = .050), and 18.40% slower for complex tasks (P = .009). Increased errors were observed with use of SR (KBM 32, SR 138) for both simple (KBM 9, SR 75; P < 0.001) and complex (KBM 23, SR 63; P < 0.001) tasks. Interruptions did not significantly affect task completion times or error rates for either modality. Discussion For clinical documentation, SR was slower and increased the risk of documentation errors, including errors with the potential to cause clinical harm, compared to KBM. Some of the observed increase in errors may be due to suboptimal SR to EHR integration and workflow. Conclusion Use of SR to drive interactive clinical documentation in the EHR requires careful evaluation. Current generation implementations may require significant development before they are safe and effective. Improving system integration and workflow, as well as SR accuracy and user-focused error correction strategies, may improve SR performance.

11. Karbasi, Zahra, Kambiz Bahaadinbeigy, Leila Ahmadian, Reza Khajouei, and Moghaddameh Mirzaee. "Accuracy of Speech Recognition System's Medical Report and Physicians' Experience in Hospitals." Frontiers in Health Informatics 8, no. 1 (2019): 19. http://dx.doi.org/10.30699/fhi.v8i1.199.

Abstract:
Introduction: Speech recognition (SR) technology has existed for more than two decades, but it has rarely been used in health care institutions and is not applied uniformly across clinical domains. The aim of this study was to investigate the accuracy of a speech recognition system in four different situations in a real health-services environment, and to report physicians' experience of using the technology. Method: NEVISA SR software professional v.3 was installed on the computers of expert physicians. A pre-designated medical report was tested by the physicians in four modes: slow expression in a silent environment, slow expression in a crowded environment, rapid expression in a silent environment, and rapid expression in a busy environment. After 15 physicians had used the software in hospitals, a questionnaire was distributed among them. Results: The highest average accuracy was obtained with slow expression in the silent environment and the lowest with rapid expression in the busy environment. Of all participants, 53.3% of the physicians believed that the speech recognition system improved their workflow. Conclusion: Software accuracy was generally higher than expected, though its use required upgrading of the system and its operation. To achieve the highest recognition rate and reduce errors, influential factors such as environmental noise, type of software and hardware, and the training and experience of participants should be considered.

12. Blackley, Suzanne V., Jessica Huynh, Liqin Wang, Zfania Korach, and Li Zhou. "Speech recognition for clinical documentation from 1990 to 2018: a systematic review." Journal of the American Medical Informatics Association 26, no. 4 (2019): 324–38. http://dx.doi.org/10.1093/jamia/ocy179.

Abstract:
Objective The study sought to review recent literature regarding use of speech recognition (SR) technology for clinical documentation and to understand the impact of SR on document accuracy, provider efficiency, institutional cost, and more. Materials and Methods We searched 10 scientific and medical literature databases to find articles about clinician use of SR for documentation published between January 1, 1990, and October 15, 2018. We annotated included articles with their research topic(s), medical domain(s), and SR system(s) evaluated and analyzed the results. Results One hundred twenty-two articles were included. Forty-eight (39.3%) involved the radiology department exclusively and 10 (8.2%) involved emergency medicine; 10 (8.2%) mentioned multiple departments. Forty-eight (39.3%) articles studied productivity; 20 (16.4%) studied the effect of SR on documentation time, with mixed findings. Decreased turnaround time was reported in all 19 (15.6%) studies in which it was evaluated. Twenty-nine (23.8%) studies conducted error analyses, though various evaluation metrics were used. Reported percentage of documents with errors ranged from 4.8% to 71%; reported word error rates ranged from 7.4% to 38.7%. Seven (5.7%) studies assessed documentation-associated costs; 5 reported decreases and 2 reported increases. Many studies (44.3%) used products by Nuance Communications. Other vendors included IBM (9.0%) and Philips (6.6%); 7 (5.7%) used self-developed systems. Conclusion Despite widespread use of SR for clinical documentation, research on this topic remains largely heterogeneous, often using different evaluation metrics with mixed findings. Further, that SR-assisted documentation has become increasingly common in clinical settings beyond radiology warrants further investigation of its use and effectiveness in these settings.

13. Tran, Van-An, Dinh-Son Le, Ha Huy Hung, and Dinh Quan Nguyen. "Improving the Accuracy of Speech Recognition Models for Non-Native English Speakers using Bag-of-Words and Deep Neural Networks." Scientific Review, no. 92 (May 13, 2023): 10–14. http://dx.doi.org/10.32861/sr.91.10.14.

Abstract:
This letter presents a novel error correction module using a Bag-of-Words model and deep neural networks to improve the accuracy of cloud-based speech-to-text services on recognition tasks of non-native speakers with foreign accents. The Bag-of-Words model transforms text into input vectors for the deep neural network, which is trained using typical sentences in the curriculum for elementary schools in Vietnam and the Google Speech-to-Text data for those sentences. The trained network is then used for real-time error correction on a humanoid robot and yields 18% better accuracy than Google Speech-to-Text.
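As a rough illustration of the error-correction idea, the sketch below maps a noisy transcript onto the closest canonical sentence using Bag-of-Words vectors and a small neural network; scikit-learn stands in for the paper's deep network, and the toy sentences and labels are invented for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.neural_network import MLPClassifier

canonical = ["open the book", "close the door"]   # target curriculum sentences
noisy = ["open the buck", "opened the book",      # simulated ASR misrecognitions
         "clothes the door", "close a door"]
labels = [0, 0, 1, 1]                             # index of the intended sentence

vec = CountVectorizer().fit(canonical + noisy)    # bag-of-words vocabulary
clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)
clf.fit(vec.transform(noisy), labels)

asr_output = "close the dour"
print(canonical[clf.predict(vec.transform([asr_output]))[0]])  # "close the door"
```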

14. Hodgson, Tobias, Farah Magrabi, and Enrico Coiera. "Evaluating the Efficiency and Safety of Speech Recognition within a Commercial Electronic Health Record System: A Replication Study." Applied Clinical Informatics 9, no. 2 (2018): 326–35. http://dx.doi.org/10.1055/s-0038-1649509.

Abstract:
Objective To conduct a replication study to validate previously identified significant risks and inefficiencies associated with the use of speech recognition (SR) for documentation within an electronic health record (EHR) system. Methods Thirty-five emergency department clinicians undertook randomly allocated clinical documentation tasks using keyboard and mouse (KBM) or SR using a commercial EHR system. The experiment design, setting, and tasks (E2) replicated an earlier study (E1), while technical integration issues that may have led to poorer SR performance were addressed. Results Complex tasks were significantly slower to complete using SR (16.94%) than KBM (KBM: 191.9 s, SR: 224.4 s; p = 0.009; CI, 11.9–48.3), replicating task completion times observed in the earlier experiment. Errors (non-typographical) were significantly higher with SR compared with KBM for both simple (KBM: 3, SR: 84; p < 0.001; CI, 1.5–2.5) and complex tasks (KBM: 23, SR: 53; p = 0.001; CI, 0.5–1.0), again replicating earlier results (E1: 170, E2: 163; p = 0.660; CI, 0.0–0.0). Typographical errors were reduced significantly in the new study (E1: 465, E2: 150; p < 0.001; CI, 2.0–3.0). Discussion The results of this study replicate those reported earlier. The use of SR for clinical documentation within an EHR system appears to be consistently associated with decreased time efficiencies and increased errors. Modifications implemented to optimize SR integration in the EHR seem to have resulted in minor improvements that did not fundamentally change overall results. Conclusion This replication study adds further evidence for the poor performance of SR-assisted clinical documentation within an EHR. Replication studies remain rare in informatics literature, especially where study results are unexpected or have significant implications; such studies are clearly needed to avoid overdependence on the results of a single study.

15. Viccaro, Elizabeth, Elaine Sands, and Carolyn Springer. "Spaced Retrieval Using Static and Dynamic Images to Improve Face–Name Recognition: Alzheimer's Dementia and Vascular Dementia." American Journal of Speech-Language Pathology 28, no. 3 (2019): 1184–97. http://dx.doi.org/10.1044/2019_ajslp-18-0131.

Abstract:
Purpose The primary objective of this study examined whether spaced retrieval (SR) using dynamic images (video clips without audio) is more effective than SR using static images to improve face–name recognition in persons with dementia. A secondary objective examined the length of time associations were retained after participants reached criterion. A final objective sought to determine if there is a relationship between SR training and dementia diagnosis. Method A repeated-measures design analyzed whether SR using dynamic images was more effective than SR using static images for face–name recognition. Twelve participants diagnosed with Alzheimer's dementia or vascular dementia were randomly assigned to 2 experimental conditions in which the presentation of images was counterbalanced. Results All participants demonstrated improvement in face–name recognition; there was no significant difference between the dynamic and static images. Eleven of 12 participants retained the information from 1 to 4 weeks post training. Additional analysis revealed a significant interaction effect when diagnoses and images were examined together. Participants with vascular dementia demonstrated improved performance using SR with static images, whereas participants with Alzheimer's dementia displayed improved performance using SR with dynamic images. Conclusions SR using static and/or dynamic images improved face–name recognition in persons with dementia. Further research is warranted to continue exploration of the relationship between dementia diagnosis and SR performance using static and dynamic images.

16. Rudramurthy, M. S., Nilabh Kumar Pathak, V. Kamakshi Prasad, and R. Kumaraswamy. "Speaker Identification Using Empirical Mode Decomposition-Based Voice Activity Detection Algorithm under Realistic Conditions." Journal of Intelligent Systems 23, no. 4 (2014): 405–21. http://dx.doi.org/10.1515/jisys-2013-0089.

Abstract:
Speaker recognition (SR) under mismatched conditions is a challenging task. Speech signal is nonlinear and nonstationary, and therefore, difficult to analyze under realistic conditions. Also, in real conditions, the nature of the noise present in speech data is not known a priori. In such cases, the performance of speaker identification (SI) or speaker verification (SV) degrades considerably under realistic conditions. Any SR system uses a voice activity detector (VAD) as the front-end subsystem of the whole system. The performance of most VADs deteriorates at the front end of the SR task or system under degraded conditions or in realistic conditions where noise plays a major role. Recently, speech data analysis and processing using Norden E. Huang's empirical mode decomposition (EMD) combined with Hilbert transform, commonly referred to as Hilbert–Huang transform (HHT), has become an emerging trend. EMD is an a posteriori, adaptive, data analysis tool used in time domain that is widely accepted by the research community. Recently, speech data analysis and speech data processing for speech recognition and SR tasks using EMD have been increasing. EMD-based VAD has become an important adaptive subsystem of the SR system that mostly mitigates the effect of mismatch between the training and the testing phase. Recently, we have developed a VAD algorithm using a zero-frequency filter-assisted peaking resonator (ZFFPR) and EMD. In this article, the efficacy of an EMD-based VAD algorithm is studied at the front end of a text-independent language-independent SI task for the speaker's data collected in three languages at five different places, such as home, street, laboratory, college campus, and restaurant, under realistic conditions using EDIROL-R09 HR, a 24-bit wav/MP3 recorder. The performance of this proposed SI task is compared against the traditional energy-based VAD in terms of percentage identification rate. In both cases, widely accepted Mel frequency cepstral coefficients are computed by employing frame processing (20-ms frame size and 10-ms frame shift) from the extracted voiced speech regions using the respective VAD techniques from the realistic speech utterances, and are used as a feature vector for speaker modeling using popular Gaussian mixture models. The experimental results showed that the proposed SI task with the VAD algorithm using ZFFPR and EMD at its front end performs better than the SI task with short-term energy-based VAD when used at its front end, and is somewhat encouraging.
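For contrast with the EMD-based front end studied here, below is a minimal sketch of the traditional short-term-energy VAD used as the baseline, following the 20 ms frame size and 10 ms frame shift mentioned in the abstract; the thresholding rule is an illustrative assumption.

```python
import numpy as np

def energy_vad(signal, fs=16000, frame_ms=20, shift_ms=10, margin_db=10.0):
    """Mark frames whose log energy exceeds an adaptive floor as speech."""
    flen, shift = int(fs * frame_ms / 1000), int(fs * shift_ms / 1000)
    n = 1 + max(0, (len(signal) - flen) // shift)
    frames = np.stack([signal[i * shift:i * shift + flen] for i in range(n)])
    log_e = 10 * np.log10(np.sum(frames ** 2, axis=1) + 1e-12)
    threshold = log_e.min() + margin_db       # noise floor plus a fixed margin
    return log_e > threshold                  # boolean speech mask per frame
```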

17. Minerbi, Amir, Markus Besemann, Tom Kari, Christina Gentile, and Gaurav Gupta. "Comparing the Efficiency of Software-Based Speech Recognition Versus Traditional Telephone Transcription in an Outpatient Physical Medicine and Rehabilitation Practice." Military Medicine 185, no. 7-8 (2020): e1183–e1186. http://dx.doi.org/10.1093/milmed/usz374.

Abstract:
Introduction Speech recognition (SR) uses computerized word recognition software that automatically transcribes spoken words to written text. Some studies indicate that SR may improve the efficiency of electronic charting as well as associated cost and turnaround time [1,2], but it remains unclear in the literature whether SR is superior to traditional transcription (TT). This study compared the report generation efficiency of SR and TT at the Canadian Armed Forces Health Services Centre. Materials and Methods Dragon Medical Dictation™ SR software and traditional telephone dictation (TT) were used for two prespecified clinical days per week. In order to adjust for note length, total transcription efficacy was calculated as word count/[dictation time + correction time]. The means and standard deviations were then separately calculated for TT visits and for SR visits. Differences in transcription efficacy and in visit measures, including patient demographics, visit duration, number of issues raised during the visit, and interventions performed, were compared using ANOVA, with the significance level set to 0.05. Results A total of 340 consecutive visits were analyzed; 198 were dictated over the phone using TT and 142 were transcribed using SR software. Dictation efficacy was significantly higher (p < 0.0001) for TT as compared to SR, while turnaround times were shorter for SR (0.12 versus 4.75 days). Conclusions In light of these results, the Canadian Forces Health Services Centre in Ottawa has returned to use of TT, because the relative inefficiency of report generation was deemed to have a greater impact on clinical care than slower dictation turnaround time.
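The efficacy metric is simple enough to show as a worked example; the numbers below are invented for illustration and are not taken from the study.

```python
def transcription_efficacy(word_count, dictation_s, correction_s):
    # efficacy = word count / (dictation time + correction time), in words/second
    return word_count / (dictation_s + correction_s)

print(transcription_efficacy(250, 300, 60))    # hypothetical TT visit: ~0.69
print(transcription_efficacy(250, 280, 150))   # hypothetical SR visit: ~0.58
```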

18. Chaudhary, Gopal, Smriti Srivastava, and Saurabh Bhardwaj. "Feature Extraction Methods for Speaker Recognition: A Review." International Journal of Pattern Recognition and Artificial Intelligence 31, no. 12 (2017): 1750041. http://dx.doi.org/10.1142/s0218001417500410.

Abstract:
This paper presents the main paradigms of research on feature extraction methods to further augment the state of the art in speaker recognition (SR), which has been used extensively in person identification for security and protection applications. The speaker recognition system (SRS) has been a widely researched topic for many decades. The basic concept of feature extraction methods is derived from the biological model of the human auditory/vocal tract system. This work provides a classification-oriented review of feature extraction methods for SR over the last 55 years that have proven successful and have become stepping stones for further research. Broadly, the review is dichotomized into feature extraction methods with and without noise compensation techniques. Feature extraction methods without noise compensation are divided into the following categories: high/low level of feature extraction; type of transform; speech production/auditory system; type of feature extraction technique; time variability; and speech processing technique. Feature extraction methods with noise compensation are classified into noise-screened features, feature normalization methods, and feature compensation methods. This classification-oriented review should give readers a clear vision for choosing among different techniques and will be helpful for future research in this field.

19. Isaac, Samson, Khalid Haruna, Muhammad Aminu Ahmad, and Rabi Mustapha. "Deep Reinforcement Learning with Hidden Markov Model for Speech Recognition." Journal of Technology & Innovation 3, no. 1 (2023): 1–5. http://dx.doi.org/10.26480/jtin.01.2023.01.05.

Abstract:
Nowadays, many applications use speech recognition, especially in computer science and electronics. Speech recognition (SR) is the interpretation of spoken words into text; it is also known as speech-to-text (STT), automatic speech recognition (ASR), or simply word recognition (WR). The Hidden Markov Model (HMM) is a type of Markov model, meaning that the future state of the model depends only on the current state rather than the entire history of the system; the goal of an HMM is to learn a sequence of hidden states from a set of observed states. The Long Short-Term Memory (LSTM) network is a type of recurrent neural network (RNN) that can learn long-term dependencies between time steps of sequence data, and it is trained to predict the values of subsequent time steps in sequence-to-sequence regression. Deep neural network (DNN) models are better classifiers than Gaussian mixture models (GMMs): they generalize much better with fewer parameters over complex distributions, and they model the distributions of different classes jointly, called "distributed" or, more properly, "tied" learning. This work aims to develop a speech recognition model that predicts isolated speech for selected fruit names in the Hausa, Igbo, and Yoruba languages using the predictive power of Mel-frequency cepstral coefficients (MFCC) and the LSTM and HMM algorithms. The findings of the study would improve the development of better automatic speech application systems and benefit the academic and research community in the field of natural language processing.
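As a hedged sketch of the LSTM component mentioned above, the PyTorch snippet below classifies an MFCC frame sequence into one of a small set of word labels; the dimensions and vocabulary size are illustrative assumptions, and the HMM stage of the hybrid is omitted.

```python
import torch
import torch.nn as nn

class WordLSTM(nn.Module):
    def __init__(self, n_mfcc=13, hidden=64, n_words=15):
        super().__init__()
        self.lstm = nn.LSTM(n_mfcc, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_words)

    def forward(self, x):               # x: (batch, frames, n_mfcc)
        _, (h, _) = self.lstm(x)        # h: final hidden state, (1, batch, hidden)
        return self.head(h[-1])         # logits over the word vocabulary

model = WordLSTM()
logits = model(torch.randn(8, 100, 13))   # 8 utterances of 100 MFCC frames
print(logits.shape)                        # torch.Size([8, 15])
```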

20. Rista, Amarildo, and Arbana Kadriu. "A Model for Albanian Speech Recognition Using End-to-End Deep Learning Techniques." Interdisciplinary Journal of Research and Development 9, no. 3 (2022): 1. http://dx.doi.org/10.56345/ijrdv9n301.

Abstract:
An end-to-end automatic speech recognition (ASR) system folds the acoustic model (AM), language model (LM), and pronunciation model (PM) into a single neural network, and the joint optimization of all these components improves the performance of the model. In this paper, we introduce a model for Albanian speech recognition (SR) using end-to-end deep learning techniques. The two main modules that build this model are residual convolutional neural networks (ResCNN), which learn the relevant features, and bidirectional recurrent neural networks (BiRNN), which leverage the learned ResCNN audio features. To train and evaluate the model, we built a corpus for Albanian speech recognition (CASR) containing 100 hours of audio data along with transcripts. During the design of the corpus we took into account speaker attributes such as age, gender, accent, speed of utterance, and dialect, so that it is as heterogeneous as possible. The model is evaluated with the word error rate (WER) and character error rate (CER) metrics, achieving 5% WER and 1% CER.
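The WER and CER metrics used for this evaluation reduce to a Levenshtein edit distance normalized by the reference length, over words and characters respectively; a minimal implementation:

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences, single-row DP."""
    d = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, d[0] = d[0], i
        for j, h in enumerate(hyp, 1):
            prev, d[j] = d[j], min(d[j] + 1, d[j - 1] + 1, prev + (r != h))
    return d[-1]

def wer(ref, hyp):
    return edit_distance(ref.split(), hyp.split()) / len(ref.split())

def cer(ref, hyp):
    return edit_distance(list(ref), list(hyp)) / len(ref)

print(wer("one two three", "one three"))   # 0.333... (one deletion, toy example)
```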

21. Chen, Qingcai, Xiaolong Wang, Pengfei Su, and Yi Yao. "Auto Adapted English Pronunciation Evaluation: A Fuzzy Integral Approach." International Journal of Pattern Recognition and Artificial Intelligence 22, no. 1 (2008): 153–68. http://dx.doi.org/10.1142/s0218001408006090.

Abstract:
Evaluating the pronunciation skills of spoken English is one of the key tasks in computer-aided spoken language learning (CALL). While most researchers focus on improving speech recognition techniques to build a reliable evaluation system, another important aspect of this task has been ignored: a pronunciation evaluation model that integrates both the reliabilities of existing speech processing systems and the learner's pronunciation personality. To take this aspect into consideration, a Sugeno integral-based evaluation model is introduced in this paper. First, the English phonemes that are hard to distinguish (HDP) for Chinese language learners are grouped into different HDP sets. Then, the system reliabilities for distinguishing the phonemes within an HDP set are computed from a standard speech corpus and are integrated with the phoneme recognition results under the Sugeno integral framework. Fuzzy measures are given for each subset of speech segments that contains n occurrences of phonemes within an HDP set. Rather than providing numeric scores, the model gives linguistic descriptions of the evaluation results, which is more helpful for users seeking to improve their spoken language skills. To get better performance, genetic algorithm (GA)-based parameter optimization is also applied to the model parameters. Experiments were conducted on the Sphinx-4 speech recognition platform. They show that, with an 84.7% average recognition rate of the SR system on the standard speech corpus, the pronunciation evaluation model obtains reasonable and reliable results for three kinds of test corpora.

22. Billones, Robert Kerwin C., Elmer P. Dadios, and Edwin Sybingco. "Design and Development of an Artificial Intelligent System for Audio-Visual Cancer Breast Self-Examination." Journal of Advanced Computational Intelligence and Intelligent Informatics 20, no. 1 (2016): 124–31. http://dx.doi.org/10.20965/jaciii.2016.p0124.

Abstract:
This paper presents the development of a computer system for breast cancer awareness and education, particularly, in proper breast self-examination (BSE) performance. It includes the design and development of an artificial intelligent system (AIS) for audio-visual BSE which is capable of computer vision (CV), speech recognition (SR), speech synthesis (SS), and audio-visual (AV) feedback response. The AIS is named BEA, an acronym for Breast Examination Assistant, which acts like a virtual health care assistant that can assist a female user in performing proper BSE. BEA is composed of four interdependent modules: perception, memory, intelligence, and execution. Collectively, these modules are part of an intelligent operating architecture (IOA) that runs the BEA system. The methods of development of the individual subsystems (CV, SR, SS, and AV feedback) together with the intelligent integration of these components are discussed in the methodology section. Finally, the authors presented the results of the tests performed in the system.

23. Cvietusa, P. J., D. J. Magid, G. Goodrich, et al. "A Speech Recognition (SR) Reminder System Improves Adherence to ICS Among Pediatric Asthma Patients." Journal of Allergy and Clinical Immunology 129, no. 2 (2012): AB142. http://dx.doi.org/10.1016/j.jaci.2011.12.476.

24. Mees, Inger M., Barbara Dragsted, Inge Gorm Hansen, and Arnt Lykke Jakobsen. "Sound effects in translation." Target. International Journal of Translation Studies 25, no. 1 (2013): 140–54. http://dx.doi.org/10.1075/target.25.1.11mee.

Abstract:
On the basis of a pilot study using speech recognition (SR) software, this paper attempts to illustrate the benefits of adopting an interdisciplinary approach in translator training. It shows how collaboration between phoneticians, translators, and interpreters can (1) advance research, (2) have implications for the curriculum, (3) be pedagogically motivating, and (4) prepare students for employing translation technology in their future practice as translators. In a two-phase study in which 14 MA students translated texts in three modalities (sight, written, and oral translation using an SR program), Translog was employed to measure task times. The quality of the products was assessed by three experienced translators, and the number and types of misrecognitions were identified by a phonetician. Results indicate that SR translation provides a potentially useful supplement to written translation, or indeed an alternative to it.

25. Dhanvijay, Nikita, and P. R. Badadapure. "Hindi Speech Recognition System Using MFCC and HTK Toolkit." International Journal of Engineering Sciences & Research Technology 5, no. 12 (2016): 690–95. https://doi.org/10.5281/zenodo.212079.

Abstract:
This paper presents an approach for a Hindi fruit-name recognizer system. Every person's speech is unique, so the database speech samples were collected from 20 different speakers with two iterations each. These recordings were used to train the acoustic model, which covers a vocabulary of 45 words. The HTK toolkit was used to train on the input data and evaluate the results. The proposed system achieves a recognition rate of 94.28% at the sentence level and 98.09% at the word level.

26. Kurbanazarova, Nargis, Dilnavoz Shavkidinova, Murodilla Khaydarov, et al. "Development of Speech Recognition in Wireless Mobile Networks for an Intelligent Learning System in Language Education." Journal of Wireless Mobile Networks, Ubiquitous Computing, and Dependable Applications 15, no. 3 (2024): 298–311. http://dx.doi.org/10.58346/jowua.2024.i3.020.

Abstract:
Communication has been crucial to human existence, society, and globalization for millennia. Speech recognition (SR) technologies serve biometric evaluation, security, safety, medical care, and smart cities. Most research has focused primarily on English, leaving lower-resource languages such as Uzbek largely unaddressed. This study examines the efficacy of peer and ASR feedback in wireless mobile network-assisted pronunciation training. It proposes a Deep Neural Network (DNN) and Hidden Markov Model (HMM) based ASR model, building a voice recognition system for Uzbek words and their variants using a combination of Connectionist Temporal Classification (CTC)-attention networks. The suggested method reduces training duration and enhances SR precision by efficiently employing the CTC objective function in attention modeling. The research assessed the results of both linguistic experts and native speakers on the Uzbek database compiled for this research; the data were gathered through a pronunciation assessment and a discussion, and the participants were further instructed in the classroom. Test outcomes indicate that the suggested method attained a word error rate of 13.1%, using 210 hours of recordings as the training dataset for Uzbek. The proposed technique can significantly enhance students' pronunciation quality and may inspire pupils to participate in pronunciation learning.
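A hedged sketch of the CTC objective at the core of the proposed model, using PyTorch's nn.CTCLoss; the tensor shapes and label-set size are illustrative, and the attention branch of the CTC-attention combination is omitted.

```python
import torch
import torch.nn as nn

T, N, C = 50, 4, 30   # encoder frames, batch size, label set size (index 0 = blank)
S = 10                # target transcript length

log_probs = torch.randn(T, N, C).log_softmax(dim=2)    # stand-in encoder outputs
targets = torch.randint(1, C, (N, S))                  # dummy label sequences
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.full((N,), S, dtype=torch.long)

# CTC marginalizes over all alignments of the S labels to the T frames.
ctc = nn.CTCLoss(blank=0)
print(ctc(log_probs, targets, input_lengths, target_lengths).item())
```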

27. Mahdi, Mohamed G., Ahmed Sleem, and Ibrahim Elhenawy. "Deep Learning Algorithms for Arabic Optical Character Recognition: A Survey." Multicriteria Algorithms with Applications 2 (January 26, 2024): 65–79. http://dx.doi.org/10.61356/j.mawa.2024.26861.

Abstract:
In recent years, deep learning has begun to supplant traditional machine learning algorithms in a variety of fields, including machine translation (MT), pattern recognition (PR), natural language processing (NLP), speech recognition (SR), and computer vision. Systems for optical character recognition (OCR) have recently been developed using deep learning techniques with great success. Within the area of pattern recognition and computer vision, the procedure of handwritten character recognition is still considered to be one of the most challenging. The height, orientation, and width of the handwritten characters do not always correspond with one another because different people use different writing instruments and have their own unique writing styles. This makes the job of handwritten recognition challenging and difficult. The regional languages of Arabic and Urdu have received less research. In this article, a summary and comparison of the most significant techniques of deep learning that are used in the recognition of Arabic-adapted scripts like Arabic and Urdu have been provided.

28. Meng, Fanfei, and Yuxin Wang. "Transformers: Statistical interpretation, architectures and applications." Applied and Computational Engineering 43, no. 1 (2024): 193–210. http://dx.doi.org/10.54254/2755-2721/43/20230832.

Abstract:
Transformers have been widely recognized as powerful tools for analyzing multiple tasks thanks to their state-of-the-art multi-head attention mechanism, with applications in natural language processing (NLP), computer vision (CV), and speech recognition (SR). Inspired by their rich designs and strong capability for analyzing input data, we start from the various architectures, proceed to an investigation of their statistical mechanisms and inference, and then introduce their applications to dominant tasks. The underlying statistical mechanisms arouse our interest, and this survey focuses on their mathematical foundations, using these principles to analyze the reasons for the Transformer's excellent performance in many recognition scenarios.
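The multi-head attention the abstract refers to is built from scaled dot-product attention, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V; a single head in NumPy is shown below (stacking h such heads, each with its own learned projections, gives multi-head attention).

```python
import numpy as np

def attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                    # pairwise similarities
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)                 # row-wise softmax
    return w @ V                                       # weighted sum of values

Q = np.random.randn(5, 8)   # 5 query positions, d_k = 8
K = np.random.randn(7, 8)   # 7 key/value positions
V = np.random.randn(7, 8)
print(attention(Q, K, V).shape)   # (5, 8)
```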

29. Zhao, Xiaoda, and Xiaoyan Jin. "Standardized Evaluation Method of Pronunciation Teaching Based on Deep Learning." Security and Communication Networks 2022 (March 7, 2022): 1–11. http://dx.doi.org/10.1155/2022/8961836.

Abstract:
With the advancement of globalization, an increasing number of people are learning and using a common language as a tool for international communication. However, there are clear distinctions between the native language and the target language, especially in pronunciation, and the domestic learning environment for the target language is far from ideal, with few competent teachers. The efficient combination of computer technology with language teaching and learning methods, computer-assisted language learning (CALL), provides a new solution to this problem. The core of CALL is speech recognition (SR) technology and speech evaluation technology, and the development of deep learning (DL) has greatly promoted the development of speech recognition. The pronunciation of Chinese college students who major in language education or plan to improve their pronunciation is the research object of this paper. The study applies deep learning to the standardized evaluation of target-language pronunciation and builds a standardized evaluation model for pronunciation teaching based on the deep belief network (DBN). On this basis, this work improves the traditional pronunciation quality evaluation method, comprehensively considering intonation, speaking speed, rhythm, and other multi-parameter indicators and their weights, and establishes a reasonable and efficient pronunciation model. The systematic research results show that this work has theoretical and practical value in the field of phonetics education.

30. Niu, Mengqi, Liang He, Zhihua Fang, Baowei Zhao, and Kai Wang. "Pseudo-Phoneme Label Loss for Text-Independent Speaker Verification." Applied Sciences 12, no. 15 (2022): 7463. http://dx.doi.org/10.3390/app12157463.

Abstract:
Compared with text-independent speaker verification (TI-SV) systems, text-dependent speaker verification (TD-SV) counterparts often perform better because they efficiently use speech content information. On this account, some TI-SV methods have tried to boost performance by incorporating an extra automatic speech recognition (ASR) component to exploit content information, as in the c-vector approach. However, the introduced ASR component requires a large amount of annotated data and consumes substantial computational resources. In this paper, we propose a pseudo-phoneme label (PPL) loss for the TI-SV task that integrates a content cluster loss at the frame level and a speaker recognition loss at the segment level in a unified network via multitask learning, without additional data requirements or exhaustive computation. By referring to HuBERT, we generate pseudo-phoneme labels to adjust the frame-level feature distribution by deep clustering, ensuring that each cluster corresponds to an implicit pronunciation unit in the feature space. We compare the proposed loss with the softmax loss, center loss, triplet loss, log-likelihood-ratio cost loss, additive margin softmax loss, and additive angular margin loss on the VoxCeleb database. Experimental results demonstrate the effectiveness of the proposed method.

31. Al-Karawi, Khamis A. "Robustness Speaker Recognition Based on Feature Space in Clean and Noisy Condition." International Journal of Sensors, Wireless Communications and Control 9, no. 4 (2019): 497–506. http://dx.doi.org/10.2174/2210327909666181219143918.

Abstract:
Background & Objective: Speaker recognition (SR) techniques have developed to a relatively mature status over the past few decades. Existing methods typically use robust features extracted from clean speech signals and can therefore achieve very high recognition accuracy under idealized conditions. For critical applications, such as security and forensics, robustness and reliability of the system are crucial. Methods: Background noise and reverberation, as often occur in real-world applications, are known to compromise recognition performance. To improve the performance of speaker verification systems, an effective and robust feature extraction technique for speech processing is proposed, capable of operating in clean and noisy conditions. Mel frequency cepstral coefficients (MFCCs) and Gammatone frequency cepstral coefficients (GFCCs) are mature techniques and the most common features used for speaker recognition. MFCCs are calculated from the log energies in frequency bands distributed over a mel scale, while GFCCs are obtained from a bank of Gammatone filters, originally suggested to model human cochlear filtering. This paper investigates the performance of GFCC and conventional MFCC features in clean and noisy conditions; the effects of the signal-to-noise ratio (SNR) and language mismatch on system performance are also taken into account. Conclusion: Experimental results show significant improvement in system performance in terms of reduced equal error rate and detection error trade-off. Performance in terms of recognition rates under various types of noise and various SNRs was quantified via simulation, and the results of the study are presented and discussed.

32. Banks, Russell, Barry Greene, Isaiah Morrow, et al. "A Quick Novel Mobile Hearing Impairment Assessment for Digital Speech Hearing Screening." Innovation in Aging 8, Supplement 1 (2024): 1034. https://doi.org/10.1093/geroni/igae098.3329.

Abstract:
It is estimated that 1 in 4 people worldwide will be living with hearing impairment by 2050. We propose a digital Speech Hearing Screener (dSHS) using short nonsense-word recognition to measure speech-hearing ability, and we compare dSHS outcomes with standardized pure-tone averages (PTA) and speech-recognition thresholds (SRT). Fifty participants aged 55 or older underwent PTA and SRT measurement. One-way ANOVA was used to compare differences between hearing-impaired and non-impaired groups, as determined by the dSHS, with clinical thresholds of 35 dB for moderate and 50 dB for severe hearing impairment. dSHS results significantly correlated with PTAs/SRTs. ANOVA results revealed the dSHS differed significantly (F(1,47) = 38.1, p < 0.001) between hearing-impaired and unimpaired groups. Classification analysis using the 35 dB threshold yielded accuracy of 85.7% for PTA-based impairment and 81.6% for SRT-based impairment. At the 50 dB threshold, dSHS classification accuracy was 79.6% for PTA-based impairment (NPV 93%) and 83.7% (NPV 100%) for SRT-based impairment. The dSHS successfully differentiates between hearing-impaired and unimpaired individuals in under 3 minutes. This hearing screener offers a time-saving, in-clinic hearing screening to streamline the triage of those with likely hearing impairment to the appropriate follow-up assessment, thereby improving the quality of services. Additionally, this tool can help rule out hearing impairment as a cause or confounder of cognitive impairment.

33. Li, Wenjuan, and Fengkai Liu. "Exploration on College Ideological and Political Education Integrating Artificial Intelligence-Intellectualized Information Technology." Computational Intelligence and Neuroscience 2022 (May 18, 2022): 1–9. http://dx.doi.org/10.1155/2022/4844565.

Abstract:
In recent years, with the vigorous development and application of Artificial Intelligence (AI), the application of AI in education is becoming more and more extensive. This study makes a theoretical analysis of AI-Intellectualized Information Technology (IT). Discrete Cosine Transform (DCT)-Based Speech Recognition (SR) and Genetic Algorithm (GA)-Based Image Recognition (IR) are used to analyze the College Ideological and Political Education (IAPE). The research findings prove that the advantages of integrating AI-intellectualized IT on College IAPE outweigh the disadvantages. The improvement of technological development, which accounts for 71.17% of undergraduate gains, is the most significant, and the smallest gain is technology coverage, which is 36.80%. Overall, 57.21% are interested in new technology, and the students’ enthusiasm accounts for 30.77%. Most of the students focus on the innovation performance of technology, accounting for 75.92%. With an average influence of 89.04% on undergraduates, technology has the largest impact, followed by 85.78% on students with masters or higher degrees. The largest impact of diversified teaching methods for all students is 62.48%. This study provides some reference values for AI-intellectualized IT research and analysis, as well as students’ IAPE.

34. Qafmolla, Nejla. "Automatic Language Identification." European Journal of Language and Literature 7, no. 1 (2017): 140. http://dx.doi.org/10.26417/ejls.v7i1.p140-150.

Abstract:
Automatic Language Identification (LID) is the process of automatically identifying the language of spoken utterances or written material. LID has received much attention due to its application to major areas of research and long-aspired goals in computational sciences, namely Machine Translation (MT), Speech Recognition (SR), and Data Mining (DM). A considerable increase in the amount of and access to data, provided not only by experts but also by users all over the Internet, has resulted in both the development of different approaches in the area of LID, so as to generate more efficient systems, and major challenges that remain at the center of this field. Despite the fact that current approaches have achieved considerable success, future research on some issues remains on the table. The aim of this paper is not to describe the historical background of this field of study, but rather to provide an overview of the current state of LID systems and to classify the approaches developed to accomplish them. LID systems have advanced and are continuously evolving. Some of the issues that need special attention and improvement are semantics, the identification of various dialects and varieties of a language, identification of spelling errors, data retrieval, multilingual documents, MT, and speech-to-speech translation. Methods applied to date have been good from a technical point of view, but not from a semantic one.

35. Adebayo, Akinsanya Atchrimi, Toyin Kareem Fatai, and Collins Asemota Ekue. "Artificial Intelligence (AI) in Multilingual Education: Investigating Its Potential in Supporting the Preservation and Instruction of Indigenous Languages in Formal Education." Journal of College of Languages and Communication Arts Education 2, no. 1 (2024): 194–205. https://doi.org/10.5281/zenodo.14635760.

Abstract:
Linguistic diversity is a fundamental aspect of human civilization, but many indigenous languages are at risk of extinction due to inadequate formal education and preservation efforts. In multilingual societies, especially in African countries, the decline of indigenous languages threatens cultural heritage and identity. This study examines the role of artificial intelligence (AI) in supporting the preservation and teaching of indigenous languages within formal education systems. The research explores how AI technologies such as Language Learning Applications (LLA), Speech Recognition (SR), and Machine Translation (MT) can contribute to the revitalization of endangered languages, promote linguistic diversity, and ensure cultural sustainability in educational contexts. The study highlights the potential of AI in overcoming barriers like limited resources and access to instructional support, while also addressing challenges related to technological limitations, cultural sensitivity, and unequal access to technology. It emphasizes the need for collaboration among policymakers, educators, technologists, and indigenous communities to create AI solutions that are effective and culturally appropriate. The findings offer valuable insights into the intersection of AI and multilingual education, proposing actionable strategies to preserve and promote indigenous languages for future generations.
Keywords: Artificial Intelligence, Indigenous Languages, Language Preservation, Multilingual Education, Cultural Sustainability
APA, Harvard, Vancouver, ISO, and other styles
36

Rudramurthy, M. S., V. Kamakshi Prasad, and R. Kumaraswamy. "Speaker Verification Under Degraded Conditions Using Empirical Mode Decomposition Based Voice Activity Detection Algorithm." Journal of Intelligent Systems 23, no. 4 (2014): 359–78. http://dx.doi.org/10.1515/jisys-2013-0085.

Full text
Abstract:
The performance of most state-of-the-art speaker recognition (SR) systems deteriorates under degraded conditions, owing to mismatch between the training and testing sessions. This study focuses on the front end of the speaker verification (SV) system to reduce the mismatch between training and testing. An adaptive voice activity detection (VAD) algorithm using a zero-frequency filter assisted peaking resonator (ZFFPR) was integrated into the front end of the SV system. The performance of this proposed SV system was studied under degraded conditions with 50 selected speakers from the NIST 2003 database. The degraded condition was simulated by adding different types of noise to the original speech utterances. The noise types were chosen from the NOISEX-92 database to simulate degraded conditions at signal-to-noise ratio levels from 0 to 20 dB. In this study, the widely used 39-dimension Mel frequency cepstral coefficient (MFCC) features were used (13-dimension MFCCs augmented with 13-dimension velocity and 13-dimension acceleration coefficients), and a Gaussian mixture model–universal background model was used for speaker modeling. The performance of the proposed system was compared against an energy-based VAD used at the front end of the SV system. The proposed SV system showed encouraging results when the EMD-based VAD was used at its front end.
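The 39-dimension front end described above (13 static MFCCs plus velocity and acceleration coefficients) can be reproduced with standard tooling. A minimal sketch using librosa, assuming a 16 kHz mono file and default frame settings rather than the authors' exact configuration:

    # Sketch of a 39-dim MFCC front end (13 static + 13 velocity + 13 acceleration)
    # of the kind described in the abstract; file name and settings are assumptions.
    import numpy as np
    import librosa

    y, sr = librosa.load("utterance.wav", sr=16000)      # any speech file
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)   # 13 static coefficients
    delta = librosa.feature.delta(mfcc)                  # velocity
    delta2 = librosa.feature.delta(mfcc, order=2)        # acceleration
    features = np.vstack([mfcc, delta, delta2])          # shape: (39, n_frames)
    print(features.shape)

In a GMM–UBM system, these per-frame vectors would then be pooled over the voiced regions selected by the VAD.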
APA, Harvard, Vancouver, ISO, and other styles
37

Medina, Rosana, Ignacio Blanquer, Luis Martí-Bonmatí, and J. Damian Segrelles. "Increasing the Efficiency on Producing Radiology Reports for Breast Cancer Diagnosis by Means of Structured Reports." Methods of Information in Medicine 56, no. 03 (2017): 248–60. http://dx.doi.org/10.3414/me16-01-0091.

Full text
Abstract:
Background: Radiology reports are commonly written as free text using voice recognition devices. Structured reports (SR) have high potential, but they are usually considered more difficult to fill in, so their adoption in clinical practice leads to lower efficiency. However, some studies have demonstrated that, in some cases, producing SRs may require less time than plain-text reports. This work focuses on the definition and demonstration of a methodology to evaluate the productivity of software tools for producing radiology reports. A set of SRs for breast cancer diagnosis based on BI-RADS was developed using this method, and an analysis of their efficiency with respect to free-text reports was performed. Material and Methods: The proposed methodology compares the Elapsed Time (ET) on a set of radiological reports. Free-text reports were produced with the speech recognition devices used in clinical practice; structured reports were generated using a web application built with the TRENCADIS framework. A team of six radiologists with three different levels of experience in breast cancer diagnosis was recruited. These radiologists performed the evaluation, each introducing 50 reports for mammography, 50 for ultrasound scan and 50 for MRI using both approaches. The Relative Efficiency (REF) was also computed for each report by dividing the ETs of the two methods. We applied the T-Student (T-S) test to compare the ETs and the ANOVA test to compare the REFs; both tests were computed using the SPSS software. Results: The study produced three DICOM-SR templates for breast cancer diagnosis on mammography, ultrasound and MRI, using RadLex terms based on the BI-RADS 5th edition. The T-S test on radiologists with high or intermediate profiles showed that the difference in ET was statistically significant only for mammography and ultrasound. The ANOVA test grouping the REF by modality indicated no significant differences between mammograms and ultrasound scans, but both differed significantly from MRI. The ANOVA test of the REF for each modality indicated significant differences only in mammography (ANOVA p = 0.024) and ultrasound (ANOVA p = 0.008). The ANOVA test for each radiologist profile indicated significant differences for the high profile (ANOVA p = 0.028) and the medium profile (ANOVA p = 0.045). Conclusions: In this work, we have defined and demonstrated a methodology to evaluate the productivity of software tools for producing radiology reports in breast cancer. Adopting structured reporting in mammography and ultrasound studies for breast cancer diagnosis improves the performance of report production.
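The paper's core efficiency measures reduce to simple arithmetic and standard tests. A sketch of computing per-report REF and comparing ETs, with invented placeholder timings and a paired t-test as one reasonable reading of the T-S test used:

    # Sketch of the efficiency comparison: per-report Relative Efficiency
    # (here REF = ET_free_text / ET_structured; the abstract does not fix the
    # direction) and a paired t-test on elapsed times. Timings are placeholders.
    import numpy as np
    from scipy import stats

    et_free = np.array([120.0, 95.0, 110.0, 130.0, 105.0])   # seconds, free text
    et_sr   = np.array([100.0, 90.0, 115.0, 105.0, 95.0])    # seconds, structured

    ref = et_free / et_sr            # REF > 1 favours structured reporting
    t, p = stats.ttest_rel(et_free, et_sr)
    print(ref.mean(), t, p)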
APA, Harvard, Vancouver, ISO, and other styles
38

Holube, Inga, Stefan Taesler, Saskia Ibelings, Martin Hansen, and Jasper Ooster. "Automated Measurement of Speech Recognition, Reaction Time, and Speech Rate and Their Relation to Self-Reported Listening Effort for Normal-Hearing and Hearing-Impaired Listeners Using various Maskers." Trends in Hearing 28 (January 2024). http://dx.doi.org/10.1177/23312165241276435.

Full text
Abstract:
In speech audiometry, the speech-recognition threshold (SRT) is usually established by adjusting the signal-to-noise ratio (SNR) until 50% of the words or sentences are repeated correctly. However, these conditions are rarely encountered in everyday situations. Therefore, for a group of 15 young participants with normal hearing and a group of 12 older participants with hearing impairment, speech-recognition scores were determined at SRT and at four higher SNRs using several stationary and fluctuating maskers. Participants’ verbal responses were recorded, and participants were asked to self-report their listening effort on a categorical scale (self-reported listening effort, SR-LE). The responses were analyzed using an Automatic Speech Recognizer (ASR) and compared to the results of a human examiner. An intraclass correlation coefficient of r = .993 for the agreement between their corresponding speech-recognition scores was observed. As expected, speech-recognition scores increased with increasing SNR and decreased with increasing SR-LE. However, differences between speech-recognition scores for fluctuating and stationary maskers were observed as a function of SNR, but not as a function of SR-LE. The verbal response time (VRT) and the response speech rate (RSR) of the listeners’ responses were measured using an ASR. The participants with hearing impairment showed significantly lower RSRs and higher VRTs compared to the participants with normal hearing. These differences may be attributed to differences in age, hearing, or both. With increasing SR-LE, VRT increased and RSR decreased. The results show the possibility of deriving a behavioral measure, VRT, measured directly from participants’ verbal responses during speech audiometry, as a proxy for SR-LE.
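The SRT procedure summarized above, adjusting the SNR until 50% of items are repeated correctly, is conventionally implemented as an adaptive staircase. A schematic sketch with a simulated listener; the logistic psychometric function and its parameters are illustrative, not the study's method:

    # Schematic 1-up/1-down staircase converging on the SNR giving 50% correct
    # (the SRT). The listener is simulated; in practice each trial is scored
    # from the participant's (or the ASR's) word-level responses.
    import math, random

    def trial_correct(snr_db, true_srt=-6.0, slope=0.6):
        p = 1.0 / (1.0 + math.exp(-slope * (snr_db - true_srt)))
        return random.random() < p

    def measure_srt(start_snr=0.0, step_db=2.0, n_trials=30):
        snr, track = start_snr, []
        for _ in range(n_trials):
            snr += -step_db if trial_correct(snr) else step_db  # harder after a hit
            track.append(snr)
        return sum(track[-10:]) / 10.0   # average late trials as the SRT estimate

    print(round(measure_srt(), 1))       # hovers near the simulated SRT of -6 dB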
APA, Harvard, Vancouver, ISO, and other styles
39

Laarfi, Ahmed, and Veton Kepuska. "Implementation of a Verbal Compiler: The Need to Develop Audio Language to Keep Pace with Rapid Development becomes a Necessity." Global Journal of Human-Social Science, May 18, 2020, 1–11. http://dx.doi.org/10.34257/gjhssgvol20is4pg1.

Full text
Abstract:
This research paper aims to make essential developments in Speech Recognition (SR): the compiler gives the user a choice of output type, whether textual or conversational (audio). Many large companies have developed such Speech Recognition Systems (SRS), especially the companies producing smartphones, computers, and laptops; yet, taking translation as a model application, they have not yet developed perfect systems. The purpose of this paper is to add facilities to Speech Recognition (SR) software so that it can deal with spoken languages.
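The idea of letting the user choose between textual and spoken output can be illustrated with off-the-shelf packages. An independent sketch using the speech_recognition and pyttsx3 libraries, not the authors' compiler:

    # Independent sketch: recognize speech, then emit the result either as
    # text or as audio, per the user's choice. Requires a microphone and the
    # speech_recognition and pyttsx3 packages.
    import speech_recognition as sr
    import pyttsx3

    def transcribe_and_reply(output_mode="text"):
        recognizer = sr.Recognizer()
        with sr.Microphone() as source:
            audio = recognizer.listen(source)
        text = recognizer.recognize_google(audio)    # cloud recognizer
        if output_mode == "audio":
            engine = pyttsx3.init()
            engine.say(text)                         # speak the recognized text
            engine.runAndWait()
        else:
            print(text)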
APA, Harvard, Vancouver, ISO, and other styles
40

Devi, K. A. Sumithra, Swetraj Shrivastava, Pranav Ranjan, and Romit Dev. "Human Action Recognition (HAR) and Speech Recognition (SR) using Data Science." Indian Journal of Computer Science and Technology, May 23, 2025, 206–9. https://doi.org/10.59256/indjcst.20250402026.

Full text
Abstract:
Human Action Recognition (HAR) and Speech Recognition are rapidly evolving fields within Data Science, significantly impacting applications in healthcare, security, human-computer interaction, and automation. This paper explores the methodologies, challenges, and advancements in these domains. Machine learning and deep learning models such as Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), and Transformers play a crucial role in recognizing human activities and speech patterns.
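As an illustration of the model family the paper surveys, here is a minimal 1-D CNN for sensor-based action recognition in PyTorch; the channel count, window length, and class count are assumptions:

    # Minimal 1-D CNN over accelerometer windows, representative of the CNN
    # models discussed for HAR; all shapes are illustrative assumptions.
    import torch
    import torch.nn as nn

    class HARNet(nn.Module):
        def __init__(self, n_channels=3, n_classes=6):   # e.g., 3-axis accelerometer
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv1d(n_channels, 32, kernel_size=5), nn.ReLU(),
                nn.Conv1d(32, 64, kernel_size=5), nn.ReLU(),
                nn.AdaptiveAvgPool1d(1),
            )
            self.classifier = nn.Linear(64, n_classes)

        def forward(self, x):            # x: (batch, channels, time)
            return self.classifier(self.features(x).squeeze(-1))

    model = HARNet()
    logits = model(torch.randn(8, 3, 128))   # 8 windows of 128 samples
    print(logits.shape)                      # torch.Size([8, 6])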
APA, Harvard, Vancouver, ISO, and other styles
41

Jorg, Tobias, Benedikt Kämpgen, Dennis Feiler, et al. "Efficient structured reporting in radiology using an intelligent dialogue system based on speech recognition and natural language processing." Insights into Imaging 14, no. 1 (2023). http://dx.doi.org/10.1186/s13244-023-01392-y.

Full text
Abstract:
Background: Structured reporting (SR) is recommended in radiology, due to its advantages over free-text reporting (FTR). However, SR use is hindered by insufficient integration of speech recognition, which is well accepted among radiologists and commonly used for unstructured FTR. SR templates must be laboriously completed using a mouse and keyboard, which may explain why SR use remains limited in clinical routine, despite its advantages. Artificial intelligence and related fields, like natural language processing (NLP), offer enormous possibilities to facilitate the imaging workflow. Here, we aimed to use the potential of NLP to combine the advantages of SR and speech recognition. Results: We developed a reporting tool that uses NLP to automatically convert dictated free text into a structured report. The tool comprises a task-oriented dialogue system, which assists the radiologist by sending visual feedback if relevant findings are missed. The system was developed on top of several NLP components and speech recognition. It extracts structured content from dictated free text and uses it to complete an SR template in RadLex terms, which is displayed in its user interface. The tool was evaluated for reporting of urolithiasis CTs as a use case. It was tested using fictitious text samples about urolithiasis and 50 original reports of CTs from patients with urolithiasis. The NLP recognition worked well for both, with an F1 score of 0.98 (precision: 0.99; recall: 0.96) for the test with fictitious samples and an F1 score of 0.90 (precision: 0.96; recall: 0.83) for the test with original reports. Conclusion: Due to its unique ability to integrate speech into SR, this novel tool could represent a major contribution to the future of reporting.
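The reported F1 scores are the harmonic mean of the stated precision and recall, which can be checked directly; small deviations arise because the quoted precision and recall are themselves already rounded:

    # F1 as the harmonic mean of precision and recall, checked against the
    # numbers quoted in the abstract.
    def f1(precision, recall):
        return 2 * precision * recall / (precision + recall)

    print(round(f1(0.99, 0.96), 2))   # 0.97 with these rounded inputs; paper reports 0.98
    print(round(f1(0.96, 0.83), 2))   # 0.89 with these rounded inputs; paper reports 0.90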
APA, Harvard, Vancouver, ISO, and other styles
42

Maindargi, L. C., and D. B. Mantri. "Implementation of Speech Recognition System." JournalNX - A Multidisciplinary Peer Reviewed Journal VESCCOMM-2016 (February 12, 2016). https://doi.org/10.5281/zenodo.1472060.

Full text
Abstract:
The aim of this paper is to present the accuracy and time results of a speech recognition (SR) system based on Mel-Frequency Cepstral Coefficients (MFCC). A number of speech files were considered for the experimentation; MFCCs were extracted and the coefficients were statistically analyzed. An audio database is used for training and testing the algorithm, and Gaussian filters have been replaced with triangular filters to achieve a higher level of accuracy. Speech files of about 2 seconds' duration are given as input to the training and testing unit, and the results are checked for a number of speech files. The accuracy achieved by the proposed approach is expected to be higher than that of the previous systems under study, and the system can be implemented using Matlab as the programming tool. https://journalnx.com/journal-article/20150042
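The triangular mel filterbank that the paper substitutes for Gaussian filters can be sketched from first principles; the sample rate, FFT size, and filter count below are assumptions, not the paper's settings:

    # Triangular mel filterbank of the kind used in MFCC extraction.
    import numpy as np

    def mel(f):  return 2595.0 * np.log10(1.0 + f / 700.0)
    def imel(m): return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

    def triangular_filterbank(sr=16000, n_fft=512, n_filters=26):
        pts = imel(np.linspace(mel(0), mel(sr / 2), n_filters + 2))
        bins = np.floor((n_fft + 1) * pts / sr).astype(int)
        fb = np.zeros((n_filters, n_fft // 2 + 1))
        for i in range(1, n_filters + 1):
            l, c, r = bins[i - 1], bins[i], bins[i + 1]
            fb[i - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)   # rising edge
            fb[i - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)   # falling edge
        return fb

    print(triangular_filterbank().shape)   # (26, 257)

Each row is a triangle peaking at one mel-spaced center frequency; a log-energy and DCT step over these filter outputs yields the cepstral coefficients.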
APA, Harvard, Vancouver, ISO, and other styles
43

Ok, Min Wook, Kavita Rao, Jon Pennington, and Paula R. Ulloa. "Speech Recognition Technology for Writing: Usage Patterns and Perceptions of Students with High Incidence Disabilities." Journal of Special Education Technology, December 17, 2020, 016264342097992. http://dx.doi.org/10.1177/0162643420979929.

Full text
Abstract:
This exploratory study examined the usage of speech recognition (SR) technology by students with high incidence disabilities in grades 4–8 and student and teacher perceptions of using SR as part of the writing process. The study also examined factors contributing to students' use of SR and barriers to using this technology. Results indicated that students across all grades had positive perceptions about using SR, but younger students tended to use it more often. SR was especially helpful for students who struggled with spelling and supported some, but not all, students with drafting text. The study illustrated the importance of taking student variability into account in relation to affinity for SR usage. By integrating opportunities for using SR as part of writing instruction and guiding students to reflect on whether the technology is useful for their individual needs and preferences, teachers can help students with disabilities make choices to use SR in ways that are the most useful for their individual needs.
APA, Harvard, Vancouver, ISO, and other styles
44

Chen, Sijia, and Jan-Louis Kruger. "The effectiveness of computer-assisted interpreting." Translation and Interpreting Studies, December 5, 2022. http://dx.doi.org/10.1075/tis.21036.che.

Full text
Abstract:
Facing a new technological turn, the field of interpreting is in great need of evidence on the effectiveness of computer-assisted interpreting. This study proposes a computer-assisted consecutive interpreting (CACI) mode incorporating speech recognition (SR) and machine translation (MT). First, the interpreter listens to the source speech and respeaks it into an SR system, creating an SR text which is then processed by an MT system. Second, the interpreter produces a target speech with reference to the SR and MT texts. Six students participated in training on CACI, after which they performed consecutive interpreting in both the conventional and the new mode. The study finds that CACI featured fewer pauses and reduced cognitive load. Moreover, the overall interpreting quality, especially the accuracy, was increased. The effectiveness of the new mode is found to be modulated by the interpreting direction.
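The two-phase workflow can be summarized as a small pipeline. A schematic sketch with the SR and MT engines left as placeholders rather than any named system, and example language codes assumed:

    # Schematic of the CACI workflow described above: respeaking goes through
    # speech recognition, the SR text through machine translation, and the
    # interpreter consults both texts while producing the target speech.
    def speech_recognize(respoken_audio) -> str:
        raise NotImplementedError   # any SR engine

    def machine_translate(sr_text: str, src: str, tgt: str) -> str:
        raise NotImplementedError   # any MT engine

    def caci_phase_one(respoken_audio, src="zh", tgt="en"):   # example codes
        sr_text = speech_recognize(respoken_audio)      # phase I: respeak -> SR text
        mt_text = machine_translate(sr_text, src, tgt)  # SR text -> MT draft
        return sr_text, mt_text                         # shown to the interpreter
    # Phase II: the interpreter produces target speech with both texts on screen.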
APA, Harvard, Vancouver, ISO, and other styles
45

"ASR System for Isolated Words using ANN with Back Propagation and Fuzzy based DWT." International Journal of Engineering and Advanced Technology 8, no. 6 (2019): 4813–19. http://dx.doi.org/10.35940/ijeat.f9110.088619.

Full text
Abstract:
Speech is the primary means through which human beings interact. Speech has become a way for Man Machine Interaction (MMI). The Speech Recognition (SR) systems have been widely used in smart phones to initiate searches or to type certain text messages, and in control devices to perform switch on or off functions etc. This system comprises three blocks: Pre-processing, Feature Extraction and Classification. The input speech signal is pre-processed to remove the noise and to convert it into a digital form for feature extraction. The feature extraction is a significant process during SR systems design because the features extracted form the basis for accurate recognition of the speech. Only a few features of this signal may be selected for classification purposes. For final recognition of the spoken word or the input signal, various optimization algorithms as classifiers are used. This paper presents an extensive literature review on SR Systems. The authors have attempted to do a brief survey to identify the progress in this field. The survey provides the reader with well-known methods used by previous researchers. It also compares the performance metrics for two ASR techniques developed by the authors. The first technique uses Artificial Neural Network with Back Propagation while the second uses Fuzzy based Discrete Wavelet Transform. It was found that the fuzzy based DWT system provided better results in terms of the performance metrics like accuracy, sensitivity, specificity and word error rate. The paper concludes by providing the reader with a direction of future scope in this research area.
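The DWT front end of the second technique can be sketched with PyWavelets, using subband energies as a compact feature vector; the wavelet choice and decomposition depth are assumptions, and the fuzzy classification stage is omitted:

    # Sketch of DWT-based feature extraction: wavelet decomposition with
    # PyWavelets, then one energy value per subband as the feature vector.
    import numpy as np
    import pywt

    def dwt_features(signal, wavelet="db4", level=4):
        coeffs = pywt.wavedec(signal, wavelet, level=level)
        return np.array([np.sum(np.square(c)) for c in coeffs])

    print(dwt_features(np.random.randn(16000)).shape)   # (5,) for level=4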
APA, Harvard, Vancouver, ISO, and other styles
46

"ASR System for Isolated words using ANN with Back Propagation and Fuzzy based DWT." International Journal of Engineering and Advanced Technology 8, no. 6 (2019): 4878–84. http://dx.doi.org/10.35940/ijeat.f9130.088619.

Full text
Abstract:
Speech is the primary means through which human beings interact. Speech has become a way for Man Machine Interaction (MMI). The Speech Recognition (SR) systems have been widely used in smart phones to initiate searches or to type certain text messages, and in control devices to perform switch on or off functions etc. This system comprises three blocks: Pre-processing, Feature Extraction and Classification. The input speech signal is pre-processed to remove the noise and to convert it into a digital form for feature extraction. The feature extraction is a significant process during SR systems design because the features extracted form the basis for accurate recognition of the speech. Only a few features of this signal may be selected for classification purposes. For final recognition of the spoken word or the input signal, various optimization algorithms as classifiers are used. This paper presents an extensive literature review on SR Systems. The authors have attempted to do a brief survey to identify the progress in this field. The survey provides the reader with well-known methods used by previous researchers. It also compares the performance metrics for two ASR techniques developed by the authors. The first technique uses Artificial Neural Network with Back Propagation while the second uses Fuzzy based Discrete Wavelet Transform. It was found that the fuzzy based DWT system provided better results in terms of the performance metrics like accuracy, sensitivity, specificity and word error rate. The paper concludes by providing the reader with a direction of future scope in this research area.
APA, Harvard, Vancouver, ISO, and other styles
47

Kang, Soojin, Jihwan Woo, Kyung Myun Lee, Hye Yoon Seol, Sung Hwa Hong, and Il Joon Moon. "Feasibility of an Objective Approach Using Acoustic Change Complex for Evaluating Spectral Resolution in Individuals with Normal Hearing and Hearing Loss." Journal of Integrative Neuroscience 24, no. 3 (2025). https://doi.org/10.31083/jin25911.

Full text
Abstract:
Background: Identifying the temporal and spectral information in sound is important for understanding speech; indeed, a person who has good spectral resolution usually shows good speech recognition performance. The spectral ripple discrimination (SRD) test is often used to behaviorally determine spectral resolution capacity. However, although the SRD test is useful, it is difficult to apply to populations who cannot execute the behavioral task, such as younger children and people with disabilities. In this study, an objective approach using spectral ripple (SR) stimuli to evoke the acoustic change complex (ACC) response was investigated to determine whether it could objectively evaluate the spectral resolution ability of subjects with normal hearing (NH) and those with hearing loss (HL). Method: Ten subjects with NH and eight with HL were enrolled in this study. All subjects completed the behavioral SRD test and the objective SR-ACC test. Additionally, the HL subjects completed speech perception performance tests while wearing hearing aids. Results: In the SRD test, the average thresholds were 6.48 and 1.52 ripples per octave (RPO) for the NH and HL groups, respectively, while in the SR-ACC test, they were 4.90 and 1.35 RPO, respectively. There was a significant difference in the average thresholds between the two groups for both the SRD (p < 0.001) and the SR-ACC (p < 0.001) tests. A significant positive correlation was observed between the SRD and SR-ACC tests (ρ = 0.829, p < 0.001). In the HL group, there was a statistically significant relationship between speech recognition performance in noisy conditions and the SR-ACC threshold (ρ = 0.911, p < 0.001 for the sentence score of the Korean Speech Audiometry (KSA)). Conclusions: The results supported the feasibility of the SR-ACC test to objectively evaluate auditory spectral resolution in individuals with HL. This test has potential for use in individuals with HL who are unable to complete the behavioral task associated with the SRD test; therefore, it is proposed as a more inclusive alternative to the SRD test.
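A spectral-ripple stimulus of the kind used in SRD and SR-ACC testing is broadband noise whose spectral envelope is sinusoidal on a log-frequency axis, with density given in ripples per octave (RPO). An illustrative construction, not the study's calibrated stimuli:

    # Illustrative spectral-ripple stimulus: many log-spaced tones whose
    # levels follow a sinusoidal envelope over octaves. All parameters
    # (band, depth, tone count) are assumptions for demonstration.
    import numpy as np

    def ripple_stimulus(rpo=2.0, dur=0.5, sr=44100, f_lo=100.0, f_hi=5000.0,
                        depth_db=30.0, n_comp=200):
        t = np.arange(int(dur * sr)) / sr
        freqs = f_lo * (f_hi / f_lo) ** np.random.rand(n_comp)  # log-uniform tones
        octaves = np.log2(freqs / f_lo)
        env_db = (depth_db / 2) * np.sin(2 * np.pi * rpo * octaves)  # ripple envelope
        amps = 10.0 ** (env_db / 20.0)
        phases = 2 * np.pi * np.random.rand(n_comp)
        sig = (amps[:, None] * np.sin(2 * np.pi * freqs[:, None] * t
                                      + phases[:, None])).sum(axis=0)
        return sig / np.max(np.abs(sig))

    print(ripple_stimulus(rpo=2.0).shape)   # (22050,) at 44.1 kHz, 0.5 s

Discrimination tests then compare such a stimulus against one with the ripple phase inverted; higher discriminable RPO indicates finer spectral resolution.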
APA, Harvard, Vancouver, ISO, and other styles
48

Chen, Sijia, and Jan-Louis Kruger. "Visual processing during computer-assisted consecutive interpreting." Interpreting. International Journal of Research and Practice in Interpreting, July 5, 2024. http://dx.doi.org/10.1075/intp.00104.che.

Full text
Abstract:
This study investigates the visual processing patterns during computer-assisted consecutive interpreting (CACI). In phase I of the proposed CACI workflow, the interpreter listens to the source speech and respeaks it into speech recognition (SR) software. In phase II, the interpreter produces target speech supported by the SR text and its machine translation (MT) output. A group of students performed CACI with their eye movements tracked. In phase I, the participants devoted the majority of their attention to listening and respeaking, with very limited attention distributed to the SR text. However, a positive correlation was found between the percentage of dwell time on the SR text and the quality of respeaking, which suggests that active monitoring could be important. In phase II, the participants devoted more visual attention to the MT text than to the SR text and engaged in deeper and more effortful processing when reading the MT text. We identified a positive correlation between the percentage of dwell time on the MT text and interpreting quality in the L2–L1 direction but not in the L1–L2 direction. These results contribute to our understanding of computer-assisted interpreting and can provide insights for future research and training in this area.
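The study's attention measure, percentage dwell time on an area of interest correlated with quality scores, is straightforward to compute. A sketch with invented placeholder numbers, using Spearman's rank correlation as in the reported ρ values:

    # Percentage dwell time on an area of interest (e.g., the MT text),
    # correlated with quality scores via Spearman's rho. All numbers are
    # invented placeholders, not study data.
    import numpy as np
    from scipy import stats

    dwell_mt = np.array([3.1, 4.8, 2.2, 5.0, 3.9, 4.1])        # s on the MT text
    dwell_total = np.array([10.0, 12.0, 9.5, 11.0, 10.5, 12.5])
    pct_dwell = 100 * dwell_mt / dwell_total                    # % dwell time
    quality = np.array([61, 75, 55, 78, 70, 72])                # quality scores

    rho, p = stats.spearmanr(pct_dwell, quality)
    print(rho, p)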
APA, Harvard, Vancouver, ISO, and other styles
49

"Advancements in speech recognition technology to rejuvenate interactive voice response systems market." Sensor Review 23, no. 2 (2003). http://dx.doi.org/10.1108/sr.2003.08723bab.007.

Full text
APA, Harvard, Vancouver, ISO, and other styles
50

"Creation and Instigation of Triphone based Big-Lexicon Speaker-Independent Continuous Speech Recognition Framework for Kannada Language." International Journal of Innovative Technology and Exploring Engineering 9, no. 2S (2019): 152–58. http://dx.doi.org/10.35940/ijitee.b1090.1292s19.

Full text
Abstract:
This paper proposes a framework intended to perform comparably accurate speech recognition and, specifically, continuous speech recognition (CSR) based on triphone modelling for the Kannada language. To design the proposed framework, features are obtained from the speech data using the well-known Mel-frequency cepstral coefficients (MFCC) technique and its transformations, such as linear discriminant analysis (LDA) and maximum likelihood linear transforms (MLLT), applied to Kannada speech data files. The system is then trained to estimate the hidden Markov model (HMM) parameters for continuous speech (CS) data. The continuous Kannada speech data were gathered from 2600 speakers (1560 men and 1040 women) in the age range of 14 to 80 years. The speech data were acquired from various geographical regions of Karnataka (one of the 29 states situated in the southern part of India) under degraded conditions, and comprise 21,551 words covering 30 regions. The performance of both monophone and triphone models is evaluated with respect to word error rate (WER), and the results obtained are compared with standard databases such as TIMIT and Aurora4. A significant reduction in WER is obtained for the triphone models. The speech recognition (SR) rate is verified in both offline and online recognition modes for all speakers. The results reveal that the recognition rate (RR) for the Kannada speech corpus improves over the state-of-the-art existing databases.
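The WER metric used to compare the monophone and triphone models counts substitutions, deletions, and insertions against the reference length, via Levenshtein alignment over words. A self-contained sketch:

    # Word error rate: WER = (substitutions + deletions + insertions) / N,
    # computed via Levenshtein alignment on word sequences.
    def wer(reference, hypothesis):
        ref, hyp = reference.split(), hypothesis.split()
        d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
        for i in range(len(ref) + 1): d[i][0] = i
        for j in range(len(hyp) + 1): d[0][j] = j
        for i in range(1, len(ref) + 1):
            for j in range(1, len(hyp) + 1):
                cost = 0 if ref[i-1] == hyp[j-1] else 1
                d[i][j] = min(d[i-1][j] + 1,       # deletion
                              d[i][j-1] + 1,       # insertion
                              d[i-1][j-1] + cost)  # substitution or match
        return d[len(ref)][len(hyp)] / max(len(ref), 1)

    print(wer("ನಮಸ್ಕಾರ ಹೇಗಿದ್ದೀರಿ", "ನಮಸ್ಕಾರ"))   # 0.5: one deletion out of two words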
APA, Harvard, Vancouver, ISO, and other styles