Journal articles on the topic 'Whisper ASR'

Author: Grafiati

Published: 2 June 2025

Last updated: 31 July 2025

Consult the top 50 journal articles for your research on the topic 'Whisper ASR.'

1

Galić, Jovan, Branko Marković, Đorđe Grozdić, Branislav Popović, and Slavko Šajić. "Whispered Speech Recognition Based on Audio Data Augmentation and Inverse Filtering." Applied Sciences 14, no. 18 (2024): 8223. http://dx.doi.org/10.3390/app14188223.

Abstract:

Modern Automatic Speech Recognition (ASR) systems are primarily designed to recognize normal speech. Due to a considerable acoustic mismatch between normal speech and whisper, ASR systems suffer from a significant loss of performance in whisper recognition. Creating large databases of whispered speech is expensive and time-consuming, so research studies explore the synthetic generation using pre-existing normal or whispered speech databases. The impact of standard audio data augmentation techniques on the accuracy of isolated-word recognizers based on Hidden Markov Models (HMM) and Convolution

2

Attia, Ahmed Adel, Jing Liu, Wei Ai, Dorottya Demszky, and Carol Espy-Wilson. "Kid-Whisper: Towards Bridging the Performance Gap in Automatic Speech Recognition for Children VS. Adults." Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society 7 (October 16, 2024): 74–80. http://dx.doi.org/10.1609/aies.v7i1.31618.

Abstract:

Recent advancements in Automatic Speech Recognition (ASR) systems, exemplified by Whisper, have demonstrated the potential of these systems to approach human-level performance given sufficient data. However, this progress doesn’t readily extend to ASR for children due to the limited availability of suitable child-specific databases and the distinct characteristics of children’s speech. A recent study investigated leveraging the My Science Tutor (MyST) children’s speech corpus to enhance Whisper’s performance in recognizing children’s speech. They were able to demonstrate some improvement

3

Si, Mei, Omar Cobas, and Michael Fababeir. "Lexical Error Guard: Leveraging Large Language Models for Enhanced ASR Error Correction." Machine Learning and Knowledge Extraction 6, no. 4 (2024): 2435–46. http://dx.doi.org/10.3390/make6040120.

Abstract:

Error correction is a vital element in modern automatic speech recognition (ASR) systems. A significant portion of ASR error correction work is closely integrated within specific ASR systems, which creates challenges for adapting these solutions to different ASR frameworks. This research introduces Lexical Error Guard (LEG), which leverages the extensive pre-trained knowledge of large language models (LLMs) and employs instructional learning to create an adaptable error correction system compatible with various ASR platforms. Additionally, a parameter-efficient fine-tuning method is utilized u

4

Papala, Gowtham, Aniket Ransing, and Pooja Jain. "Sentiment Analysis and Speaker Diarization in Hindi and Marathi Using using Finetuned Whisper." Scalable Computing: Practice and Experience 24, no. 4 (2023): 835–46. http://dx.doi.org/10.12694/scpe.v24i4.2248.

Abstract:

Automatic Speech Recognition (ASR) is a crucial technology that enables machines to automatically recognize human voices based on audio signals. In recent years, there has been a rigorous growth in the development of ASR models with the emergence of new techniques and algorithms. One such model is the Whisper ASR model developed by OpenAI, which is based on a Transformer encoder-decoder architecture and can handle multiple tasks such as language identification, transcription, and translation. However, there are still limitations to the Whisper ASR model, such as speaker diarization, summarizat

5

Saraf, Aryan. "Multilingual Translation for Speech and Text using Whisper AI: A Deep Learning Approach." International Journal for Research in Applied Science and Engineering Technology 13, no. 7 (2025): 1895–901. https://doi.org/10.22214/ijraset.2025.73288.

Abstract:

In an increasingly interconnected world, the ability to accurately translate between multiple languages, both written and spoken, is essential for global communication. Traditional machine translation and speech recognition systems often operate as separate pipelines, leading to increased complexity and reduced efficiency, especially when dealing with low-resource languages or noisy audio environments. This research presents a comprehensive study of Whisper AI, a multilingual, multitask model developed by OpenAI for speech recognition and translation. Leveraging a transformer-based encoder-dec

6

Ghale, Akarsh, Janaki K, and Devaraj Verma C. "Instant Transcription and Translation Tool using OpenAI's Whisper ASR Model." International Journal of Science and Research (IJSR) 11, no. 12 (2022): 185–88. http://dx.doi.org/10.21275/sr221203164929.

7

Pratama, Riefkyanov Surya Adia, and Agit Amrullah. "ANALYSIS OF WHISPER AUTOMATIC SPEECH RECOGNITION PERFORMANCE ON LOW RESOURCE LANGUAGE." Jurnal Pilar Nusa Mandiri 20, no. 1 (2024): 1–8. http://dx.doi.org/10.33480/pilar.v20i1.4633.

Abstract:

Implementing Automatic Speech Recognition (ASR) technology in daily life can make things more convenient for its users. However, current ASR models accurately recognize speech mainly in high-resource languages such as English. In previous research, a few regional languages such as Javanese, Sundanese, Balinese, and Bataknese have been used in automatic speech recognition. This research aims to improve speech recognition with an ASR model on a low-resource language. The dataset used in this research is the Javanese dataset specifically because there is a high-quality Javanese speech datase

8

Polat, Hüseyin, Alp Kaan Turan, Cemal Koçak, and Hasan Basri Ulaş. "Implementation of a Whisper Architecture-Based Turkish Automatic Speech Recognition (ASR) System and Evaluation of the Effect of Fine-Tuning with a Low-Rank Adaptation (LoRA) Adapter on Its Performance." Electronics 13, no. 21 (2024): 4227. http://dx.doi.org/10.3390/electronics13214227.

Abstract:

This paper focuses on the implementation of the Whisper architecture to create an automatic speech recognition (ASR) system optimized for the Turkish language, which is considered a low-resource language in terms of speech recognition technologies. Whisper is a transformer-based model known for its high performance across numerous languages. However, its performance in Turkish, a language with unique linguistic features and limited labeled data, has yet to be fully explored. To address this, we conducted a series of experiments using five different Turkish speech datasets to assess the model’s

9

Maurya, Maruti, Mohd Zaheer, Nawab Mohammad, Sadaf Siddiqui, Mohd Zeeshan Khan, and Mohd Ayan Akram. "Speech Recognition Technologies: Design, Challenges, and Real-World Applications." International Journal of Innovative Research in Computer Science and Technology 13, no. 3 (2025): 55–61. https://doi.org/10.55524/ijircst.2025.13.3.9.

Abstract:

This paper presents an automated speech recognition (ASR) system that transcribes audio from YouTube videos into accurate text using OpenAI's Whisper model. Leveraging tools such as yt_dlp, FFmpeg, and PyTorch, the system creates a robust speech-to-text pipeline. On receiving a video URL, the system extracts and preprocesses audio, transcribes it using Whisper, and evaluates transcription quality through metrics like Word Error Rate (WER), Character Error Rate (CER), and Match Error Rate (MER). The pipeline supports offline use, making it suitable for accessible, cost-effective deployment in e
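
The three metrics named in this abstract (WER, CER, MER) can all be derived from one minimum-edit alignment between a reference and a hypothesis transcript. The following is a minimal sketch under stated assumptions, not the cited paper's implementation: the function names (`align_counts`, `wer`, `cer`, `mer`) are hypothetical, tokens are split on whitespace, and no text normalization (casing, punctuation) is applied. It uses the common definitions WER = (S + D + I) / N and MER = (S + D + I) / (H + S + D + I), where H, S, D, and I are hits, substitutions, deletions, and insertions and N is the reference length; CER is the same ratio as WER computed over characters.

```python
from typing import Sequence, Tuple


def align_counts(ref: Sequence[str], hyp: Sequence[str]) -> Tuple[int, int, int, int]:
    """Return (hits, substitutions, deletions, insertions) for a minimum-edit alignment."""
    n, m = len(ref), len(hyp)
    # dist[i][j] holds the edit distance between ref[:i] and hyp[:j].
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i
    for j in range(1, m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j - 1] + cost,  # hit or substitution
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
            )
    # Walk back from the bottom-right corner to count each operation type.
    hits = subs = dels = ins = 0
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            if dist[i][j] == dist[i - 1][j - 1] + cost:
                if cost == 0:
                    hits += 1
                else:
                    subs += 1
                i, j = i - 1, j - 1
                continue
        if i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            dels += 1
            i -= 1
        else:
            ins += 1
            j -= 1
    return hits, subs, dels, ins


def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: (S + D + I) / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    _, s, d, i = align_counts(ref_words, hypothesis.split())
    return (s + d + i) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate: the WER formula applied to characters (spaces included)."""
    if not reference:
        raise ValueError("reference must not be empty")
    _, s, d, i = align_counts(list(reference), list(hypothesis))
    return (s + d + i) / len(reference)


def mer(reference: str, hypothesis: str) -> float:
    """Match Error Rate: (S + D + I) / (H + S + D + I)."""
    h, s, d, i = align_counts(reference.split(), hypothesis.split())
    total = h + s + d + i
    if total == 0:
        raise ValueError("reference and hypothesis are both empty")
    return (s + d + i) / total
```

Because insertions count in the numerator but not in the WER denominator, WER can exceed 1.0, whereas MER always stays within [0, 1]. Established packages exist for these metrics; this sketch only illustrates the arithmetic.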

10

Lee, Sangmin, Woojin Chung, and Hong-Goo Kang. "LAMA-UT: Language Agnostic Multilingual ASR Through Orthography Unification and Language-Specific Transliteration." Proceedings of the AAAI Conference on Artificial Intelligence 39, no. 23 (2025): 24393–401. https://doi.org/10.1609/aaai.v39i23.34617.

Abstract:

Building a universal multilingual automatic speech recognition (ASR) model that performs equitably across languages has long been a challenge due to its inherent difficulties. To address this task we introduce a Language-Agnostic Multilingual ASR pipeline through orthography Unification and language-specific Transliteration (LAMA-UT). LAMA-UT operates without any language-specific modules while matching the performance of state-of-the-art models trained on a minimal amount of data. Our pipeline consists of two key steps. First, we utilize a universal transcription generator to unify orthograph

11

Singh, Viomesh Kumar. "Transcripto Fine-Tuning Multilingual ASR for Indian Grievance Feedback Calls." Journal of Information Systems Engineering and Management 10, no. 42s (2025): 249–61. https://doi.org/10.52783/jisem.v10i42s.7880.

Abstract:

This paper presents a comprehensive study on fine-tuning automatic speech recognition (ASR) models for Indian languages, particularly Marathi and Hindi, using the Common Voice 13.0 dataset. By leveraging OpenAI’s Whisper-small model architecture and implementing cutting-edge techniques such as sequence-to-sequence learning, multilingual support, and normalization, this research achieves state-of-the-art Word Error Rates (WER). The Marathi fine-tuned model exhibits a WER of 17.79%, while the Hindi fine-tuned model achieves 18.85%. The proposed system supports key functionalities such as transcrip

12

Mengke, Dalai, Yan Meng, and Péter Mihajlik. "Transliteration-Aided Transfer Learning for Low-Resource ASR: A Case Study on Khalkha Mongolian." Electronics 14, no. 6 (2025): 1137. https://doi.org/10.3390/electronics14061137.

Abstract:

Automatic Speech Recognition (ASR) systems have made consistent advancements, achieving notable improvements in state-of-the-art performance across various languages. However, their effectiveness often declines significantly in low-resource settings, where data and linguistic resources are limited. This paper addresses the challenges of ASR for a low-resource language, Khalkha Mongolian, by leveraging a transliteration-aided transfer learning approach. Specifically, it improves the ASR system for Khalkha Mongolian by transliterating text from a well-resourced Chakhar Mongolian (Uighur script)

13

Chen, Junrong, Jan Kwong, and Sarah C. Creel. "Accented sentence and word recognition: Humans versus whisper automatic speech recognition." Journal of the Acoustical Society of America 156, no. 4_Supplement (2024): A50. https://doi.org/10.1121/10.0035071.

Abstract:

Despite advancements in speech recognition technology, questions remain about model generalizability and how much models mirror human perception. These questions are addressed by comparing OpenAI's Whisper model and 75 human transcribers on 300 English sentences (20 speakers, half F, half M, half US-accented, half [Mexican-]Spanish-accented). Sentences ended in 100 target words, with ⅓ high-predictability sentences (The farmer milked the cows) and ⅔ varying degrees of low-predictability (The farmer/barmer milked the nose). Target-word error rate (WER) was examined for final words in sentences

14

Nacimiento-García, Eduardo, Holi Sunya Díaz-Kaas-Nielsen, and Carina S. González-González. "Gender and Accent Biases in AI-Based Tools for Spanish: A Comparative Study between Alexa and Whisper." Applied Sciences 14, no. 11 (2024): 4734. http://dx.doi.org/10.3390/app14114734.

Abstract:

Considering previous research indicating the presence of biases based on gender and accent in AI-based tools such as virtual assistants or automatic speech recognition (ASR) systems, this paper examines these potential biases in both Alexa and Whisper for the major Spanish accent groups. The Mozilla Common Voice dataset is employed for testing, and after evaluating tens of thousands of audio fragments, descriptive statistics are calculated. After analyzing the data disaggregated by gender and accent, it is observed that, for this dataset, in terms of means and medians, Alexa performs slightly

15

Kunisetty, Jaswanth, Pranav Ramachandrula, Sruthi S, Susmitha Vekkot, and Deepa Gupta. "Advancing ASR for Indian-Accented English: Dataset Creation and Whisper Fine-Tuning." Procedia Computer Science 258 (2025): 2510–19. https://doi.org/10.1016/j.procs.2025.04.513.

16

Xu, Yuanhang. "Revitalizing Cantonese Proficiency: An Interactive ASR-Driven Approach." Communications in Humanities Research 27, no. 1 (2024): 260–65. http://dx.doi.org/10.54254/2753-7064/27/20231778.

Abstract:

In recent years, an increasing number of children in the Guangdong region have found themselves unfamiliar with Cantonese, their native language. This trend poses a significant threat to the preservation of the rich cultural heritage and traditions associated with the Cantonese language; moreover, Cantonese differs greatly from Mandarin, which makes it difficult to learn. Recognizing this challenge, the "Assistant for Cantonese Learning" app has been developed to bridge this gap. The app uses an ASR model to provide exercises tailored to a student's proficiency, allowing the user's

17

Shah, Daaman, Rishit Saboo, Aiyush Dwivedi, and Megh Gajjar. "Integrating Faster Whisper with Deep Learning Speaker Recognition." International Journal of Computer Science and Mobile Computing 13, no. 9 (2024): 1–8. http://dx.doi.org/10.47760/ijcsmc.2024.v13i09.001.

Abstract:

Communicating and understanding effectively can be challenging for people who are deaf or hard of hearing; they often have to rely constantly on help to fit in, although assistive technologies can reduce their everyday problems. This paper advances one such technology by integrating Faster Whisper, a real-time Automatic Speech Recognition (ASR) model, with a deep learning-based speaker recognition system built on a ResNet Convolutional Neural Network (CNN) architecture. Noise Augmentation is employed to enhance the capabilit

18

Jelassi, Mariem, Oumaima Jemai, and Jacques Demongeot. "Revolutionizing Radiological Analysis: The Future of French Language Automatic Speech Recognition in Healthcare." Diagnostics 14, no. 9 (2024): 895. http://dx.doi.org/10.3390/diagnostics14090895.

Abstract:

This study introduces a specialized Automatic Speech Recognition (ASR) system, leveraging the Whisper Large-v2 model, specifically adapted for radiological applications in the French language. The methodology focused on adapting the model to accurately transcribe medical terminology and diverse accents within the French language context, achieving a notable Word Error Rate (WER) of 17.121%. This research involved extensive data collection and preprocessing, utilizing a wide range of French medical audio content. The results demonstrate the system’s effectiveness in transcribing complex radiolo

19

Mulfari, Davide, and Massimo Villari. "A Voice User Interface on the Edge for People with Speech Impairments." Electronics 13, no. 7 (2024): 1389. http://dx.doi.org/10.3390/electronics13071389.

Abstract:

Nowadays, fine-tuning has emerged as a powerful technique in machine learning, enabling models to adapt to a specific domain by leveraging pre-trained knowledge. One such application domain is automatic speech recognition (ASR), where fine-tuning plays a crucial role in addressing data scarcity, especially for languages with limited resources. In this study, we applied fine-tuning in the context of atypical speech recognition, focusing on Italian speakers with speech impairments, e.g., dysarthria. Our objective was to build a speaker-dependent voice user interface (VUI) tailored to their uniqu

20

Rai, Anand Kumar, Siddharth D. Jaiswal, and Animesh Mukherjee. "A Deep Dive into the Disparity of Word Error Rates across Thousands of NPTEL MOOC Videos." Proceedings of the International AAAI Conference on Web and Social Media 18 (May 28, 2024): 1302–14. http://dx.doi.org/10.1609/icwsm.v18i1.31390.

Abstract:

Automatic speech recognition (ASR) systems are designed to transcribe spoken language into written text and find utility in a variety of applications including voice assistants and transcription services. However, it has been observed that state-of-the-art ASR systems which deliver impressive benchmark results, struggle with speakers of certain regions or demographics due to variation in their speech properties. In this work, we describe the curation of a massive speech dataset of 8740 hours consisting of ~9.8K technical lectures in the English language along with their transcripts delivered b

21

Klimov, Roman Aleckseevich, and Azat Shavkatovich Yakupov. "Development of a System for Searching and Indexing the Content of Audio Recordings." Russian Digital Libraries Journal 26, no. 4 (2023): 483–97. https://doi.org/10.26907/1562-5419-2023-26-4-483-497.

Abstract:

The article is devoted to the development of a search and indexing system for audio files using Automatic Speech Recognition (ASR) and Elasticsearch. Current Russian-language audio file transcription systems have been analyzed, and Whisper has been chosen as the best one. An algorithm for optimizing transcription speed using parallelization of file processing processes has been developed, and its effectiveness has been demonstrated. A microservice architecture-based system has been built, capable of indexing audio file content and their metadata for search purposes. The research results show t

22

Bhargavi, A. D. "Video Transcripts Summarization using OpenAI Whisper and GPT Model." International Journal for Research in Applied Science and Engineering Technology 12, no. 3 (2024): 2319–27. http://dx.doi.org/10.22214/ijraset.2024.59365.

Abstract:

In today’s digital age, a vast amount of video content is generated and shared on the internet every minute. However, extracting relevant information from these videos can be time-consuming and challenging. This is where video transcript summarization comes in, providing a concise summary of video content without the need to watch the entire video. The video transcript summarization system aims to streamline the process of extracting key insights and information from video content by generating concise and informative summaries from their transcripts. In the dynamic landscape of vide

23

Tilkar, Swati. "Generating Meeting Transcription Using Natural Language Processing." INTERNATIONAL JOURNAL OF SCIENTIFIC RESEARCH IN ENGINEERING AND MANAGEMENT 09, no. 06 (2025): 1–9. https://doi.org/10.55041/ijsrem51091.

Abstract:

Natural Language Processing plays a pivotal role in automating the transcription of meetings. It enables machines to understand, interpret, and generate human language. In meeting transcription, NLP components such as Automatic Speech Recognition (ASR), speaker diarization, entity recognition, summarization, and sentiment analysis work together to produce accurate and readable transcripts. ASR converts spoken words into text, while NLP refines the raw output by correcting grammatical errors, identifying speakers, and structuring dialogue for readability and comprehension. Ethical consideration

24

Khairani, Dewi, Tabah Rosyadi, Arini Arini, Imam Luthfi Rahmatullah, and Fauzan Farhan Antoro. "Enhancing Speech-to-Text and Translation Capabilities for Developing Arabic Learning Games: Integration of Whisper OpenAI Model and Google API Translate." JURNAL TEKNIK INFORMATIKA 17, no. 2 (2024): 203–12. http://dx.doi.org/10.15408/jti.v17i2.41240.

Abstract:

This study tackles language barriers in computer-mediated communication by developing an application that integrates OpenAI’s Whisper ASR model and Google Translate machine translation to enable real-time, continuous speech transcription and translation and the processing of video and audio files. The application was developed using the Experimental method, incorporating standards for testing and evaluation. The integration expanded language coverage to 133 languages and improved translation accuracy. Efficiency was enhanced through the use of greedy parameters and the Faster Whisper model. Us

25

AboSarafa, Maryam, and Mohamed Arteimi. "DEVELOPMENT OF SMART VOICE AGENT With case study (Libyan Voice Assistant)." Academy Journal For Basic and Applied Sciences 7, no. 1 (2025): 1–11. https://doi.org/10.5281/zenodo.15505226.

Abstract:

The paper presents the creation of an end-to-end voice assistant system designed for a lesser-resourced dialect of Arabic, Libyan Tripolitanian, which does not receive local support in commercial ASR and NLP applications. To remediate this lack, we built a demographically balanced and phonemically rich corpus of speech data containing over 13,000 audio samples. It contains both natural and semi-structured utterances and is annotated using the CODA* orthography for dialectal Arabic. Using this dataset, we trained the OpenAI Whisper model with the Hugging Face Transfor

26

Dang, Duc Thinh, Nguyen Duc Vuong, Luong Dinh Ha, Nguyen Cong Thanh, Nguyen Chi Thanh, and Nhu Hai Phung. "Intent classification for voice-based military information search on digital maps using integrated BiGRU-CNN network and speech recognition technology." Journal of Military Science and Technology, CSCE8 (December 30, 2024): 87–97. https://doi.org/10.54939/1859-1043.j.mst.csce8.2024.87-97.

Abstract:

Searching for information is one of the most important functions of software that supports drafting operational documents on digital maps. To enhance usability and meet the demands of modern military operations, it is necessary to automate the information search function using voice commands. A universal voice search tool that supports searches for various types of information requires an initial step of search intent classification. This paper proposes the development of a search intent classification process using an integrated BiGRU-CNN network and automatic speech recognition technology (A

27

Małecki, Paweł, and Magdalena Piotrowska. "Нови тенденциї у розвою сучасней линґвистики у Сербї." Rocznik Ruskiej Bursy 20 (December 10, 2024): 189–204. https://doi.org/10.12797/rrb.20.2024.20.10.

Abstract:

ANALYSIS AND CLASSIFICATION OF THE RUSYN LANGUAGE USING THE OPENAI WHISPER ASR ARTIFICIAL NEURAL NETWORK MODEL. The article presents a linguistic analysis of the Rusyn language, focusing on its complex and changing aspects, such as pronunciation and individual, regional, and historical differences. The study used an artificial neural network based on the OpenAI Whisper model. This model, although trained on data from most official state languages, was not directly trained on sample databases of the Rusyn language because of its local and minority/ethnic

28

Lyu, Ke-Ming, Ren-yuan Lyu, and Hsien-Tsung Chang. "Real-time multilingual speech recognition and speaker diarization system based on Whisper segmentation." PeerJ Computer Science 10 (March 29, 2024): e1973. http://dx.doi.org/10.7717/peerj-cs.1973.

Abstract:

This research presents the development of a cutting-edge real-time multilingual speech recognition and speaker diarization system that leverages OpenAI’s Whisper model. The system specifically addresses the challenges of automatic speech recognition (ASR) and speaker diarization (SD) in dynamic, multispeaker environments, with a focus on accurately processing Mandarin speech with Taiwanese accents and managing frequent speaker switches. Traditional speech recognition systems often fall short in such complex multilingual and multispeaker contexts, particularly in SD. This study, therefore, inte

29

Shrivastava, Vishal, and Marisha Speights. "Overcoming biases in state-of-the-art automatic speech recognition for young children with speech disorders." Journal of the Acoustical Society of America 156, no. 4_Supplement (2024): A100. https://doi.org/10.1121/10.0035243.

Abstract:

State-of-the-art models like Whisper and GPT-4o face significant challenges in recognizing and processing child speech, particularly disordered speech, due to the limited availability of annotated child speech data and inherent demographic biases. Our research aims to bridge this gap by adapting these models for more accurate classification and recognition of disordered vs. non-disordered child speech. Using the SEED corpus, we leveraged advanced data augmentation, transfer learning, and parameter-efficient fine-tuning, achieving significant Word Error Rate (WER) reductions: from 57.5% to 10.3

30

Niu, Tong, Yaqi Chen, Dan Qu, and Hengbo Hu. "Enhancing Far-Field Speech Recognition with Mixer: A Novel Data Augmentation Approach." Applied Sciences 15, no. 7 (2025): 4073. https://doi.org/10.3390/app15074073.

Abstract:

Recent advancements in end-to-end (E2E) modeling have notably improved automatic speech recognition (ASR) systems; however, far-field speech recognition (FSR) remains challenging due to signal degradation from factors such as low signal-to-noise ratio, reverberation, and interfering sounds. This requires richer training data and multi-channel speech enhancement. To address this gap, we introduce Mixer, a novel data augmentation technique designed to further enhance the performance of large-scale pre-trained models for FSR. Mixer interpolates and mixes feature representations of speech samples

31

Ms.R.R. Owhal, Pauravi Vinchurkar, Harsh Raut, Abhijeet Ravatale, and Swapnil Pokale. "Talklingo: A Smart Solution for Multilingual Communication." International Research Journal on Advanced Engineering Hub (IRJAEH) 3, no. 03 (2025): 465–72. https://doi.org/10.47392/irjaeh.2025.0064.

Abstract:

In today’s interconnected world, language barriers hinder access to essential services like education, healthcare, and global collaboration, creating a pressing need for efficient multilingual communication tools. Traditional text-based translators, while useful, often fall short in supporting natural, spontaneous speech, making them inadequate for live conversations. To address this, TalkLingo introduces an innovative speech-to-speech translation system that seamlessly integrates Automatic Speech Recognition (ASR), Machine Translation (MT), and Text-to-Speech (TTS) technologies. By incorporat

32

Karne, Sravanthi. "Realistic Video Synthesis from Audio using GAN." International Journal for Research in Applied Science and Engineering Technology 13, no. 7 (2025): 757–61. https://doi.org/10.22214/ijraset.2025.73064.

Abstract:

Realistic video generation from audio input is a challenging and emerging domain in the intersection of natural language processing, computer vision, and generative modeling. The ability to automatically generate coherent and visually compelling video content from raw audio has promising applications in media creation, virtual education, assistive technologies, and entertainment. Manual video creation remains time-consuming and skill-intensive, while automated solutions often lack semantic alignment and visual realism. To address this gap, this project proposes an end-to-end intelligent pipeli

33

Jiang, Yu Zhi, Yan Bo Li, Yue Xin Han, and Wan Zhong Yin. "Study of Mechanical Properties of Magnesium Oxysulfate Whisker/ABS Composites." Advanced Materials Research 92 (January 2010): 241–46. http://dx.doi.org/10.4028/www.scientific.net/amr.92.241.

Abstract:

The magnesium oxysulfate whisker/ABS composites were prepared using magnesium oxysulfate whiskers as the dispersed phase. A scanning electron microscope (SEM) was used to examine the distribution of whiskers in the ABS matrix. The effects of surface modification and whisker content on the mechanical properties of the composites were studied. The results show that surface modification markedly improves the dispersity of the whiskers, both among themselves and in the ABS matrix, the interfacial force between the ABS matrix and the whiskers, and the mechanical properties of the composites. When the mass ratio of whisker to ABS is 40/1

34

Sachdev, Robert N. S., Takashi Sato, and Ford F. Ebner. "Divergent Movement of Adjacent Whiskers." Journal of Neurophysiology 87, no. 3 (2002): 1440–48. http://dx.doi.org/10.1152/jn.00539.2001.

Abstract:

The current view of whisker movement is that ∼25 whiskers on each side of the face move in synchrony. To determine whether whiskers are constrained to move together, we trained rats to use two whiskers on the same side of the face in simple behavioral tasks and videotaped the whiskers during the task. Here we report that the movement of adjacent whiskers is usually synchronous but can diverge: 1) the distance between whiskers can vary dramatically during movement; 2) one whisker can move while the second one remains stationary; 3) two whiskers can simultaneously move in opposite directions; an

35

Jiang, Zhong Li, Qiu Ju Sun, Yan Ran Zhang, and Lin Li. "Thermal Behavior, Mechanical Properties and Microstructure of Polypropylene Composites Filled by Calcium Carbonate Whiskers." Advanced Materials Research 503-504 (April 2012): 494–97. http://dx.doi.org/10.4028/www.scientific.net/amr.503-504.494.

Abstract:

CaCO3 whiskers were treated with stearic acid, and the surface properties of the treated whiskers were evaluated by the activation index. When the mass fraction of stearic acid was 4.0%, the activation index increased from almost zero to 80.0%. The composites were prepared by blending polypropylene with the treated whiskers. The performance of the composites, including thermal behavior, mechanical properties, and microstructure, was analyzed with differential scanning calorimetry, scanning electron microscopy, and tensile testing. The results showed that CaCO3 whisker played heterogeneous nucleation

36

Kuwano, Noriyuki, Sadanori Horikami, Masanori Maeda, and Harini Sosiati. "TEM and SEM Analysis for Formation Mechanism of Tin Whiskers." Advanced Materials Research 545 (July 2012): 16–20. http://dx.doi.org/10.4028/www.scientific.net/amr.545.16.

Abstract:

The growth process of tin (Sn) whiskers on lead (Pb)-free Sn plating was closely observed with a transmission electron microscope (TEM) and a scanning electron microscope (SEM). Whiskers were formed on a Sn layer plated on a Cu/polyimide flexible substrate. The whisker was found to be a single crystal with a characteristic "Y"-shaped grain boundary structure at its root. The growth process of a curling whisker was successfully observed continuously in SEM. TEM observation revealed that the curling whisker had a single crystallographic orientation irrespective with it

37

Jiang, Yu Zhi, Li Li Zhang, and Zhong Yang Zhang. "Study on the Flame Retardant Property of Magnesium Hydroxide Whiskers/PE Composites." Advanced Materials Research 454 (January 2012): 93–96. http://dx.doi.org/10.4028/www.scientific.net/amr.454.93.

Abstract:

The magnesium hydroxide (MH) whisker/PE composites were prepared by melt extrusion, with modified and unmodified whiskers as the respective fillers. The flame-retardant properties of the composites were tested with an oxygen index instrument. The magnesium hydroxide whiskers improved the flame-retardant properties of the composites. When the content of the modified and unmodified whiskers increased from 10% to 60%, the oxygen index of the composites increased from 24.4% to 43.4% and from 23.2% to 40.2%, respectively. The flame retardant of the modified whiskers/PE composites was better than the unmodifi

38

Shen, Rui, Gang Chu, and Xin Pu Shen. "Advances Research on Preparation of Magnesium Carbonate Whisker." Advanced Materials Research 699 (May 2013): 17–21. http://dx.doi.org/10.4028/www.scientific.net/amr.699.17.

Abstract:

Magnesium carbonate whisker is a single crystal of magnesium carbonate with complete crystal growth and few defects; it is also colorless, transparent, and high in strength. Its properties are unmatched by other conventional whiskers. It is also cost-effective, which has attracted considerable interest. Reports on magnesium carbonate whiskers have increased gradually since the beginning of this century. The preparation of magnesium carbonate whisker can be done by using chemicals as raw material, as well as by using natural res

39

Jiang, Yu Zhi, Yue Xin Han, Wan Zhong Yin, and Yan Bo Li. "Study on Process and Mecahnism for Preparation of Magnesium Hydroxide Whiskers." Advanced Materials Research 92 (January 2010): 247–54. http://dx.doi.org/10.4028/www.scientific.net/amr.92.247.

Abstract:

With alkaline magnesium sulfate whiskers and sodium hydroxide as raw materials, suitable process parameters for preparing magnesium hydroxide whiskers by hydrothermal synthesis were investigated experimentally. The experimental results show that a satisfactory magnesium hydroxide whisker product can be obtained under suitable synthesis conditions. The whiskers are characterized by a smooth surface, straight morphology, and small diameter. The magnesium hydroxide whisker product is less than 0.5 μm in diameter, with a slenderness ratio of around 100 and a purity of 99.69% under the cond

40

Sakakida, Tomomi, Tatsuo Kubouchi, Yasuyuki Miyano, Mamoru Takahashi, and Osamu Kamiya. "Effect of Environment on Sn Whisker Growth during Welding of Electronic Wires." Advanced Materials Research 1110 (June 2015): 235–40. http://dx.doi.org/10.4028/www.scientific.net/amr.1110.235.

Abstract:

In Pb-free Al-Sn welding of electrolytic parts, single-crystal Sn whiskers easily form and can cause problems such as short circuits. Here we report that the growth of Sn whiskers in the weld zone of Al electrolytic condenser leads was suppressed in a vacuum environment. We examined the effect of the environment and weld metal microstructure in order to understand how to control and prevent whisker growth. In vacuum, the weld zone did not form whiskers after more than 100 h, whereas in air, whiskers grew within several hours. This suggests that whiskers require oxygen to form. The growth can b

41

Yan, Ping Ke, Bin Wang, and Yu Juan Gao. "Study on Synthesis of the High Aspect Ratios Nesquehonite Whiskers." Advanced Materials Research 239-242 (May 2011): 1118–22. http://dx.doi.org/10.4028/www.scientific.net/amr.239-242.1118.

Abstract:

In this paper, nesquehonite whiskers were synthesized by a low-temperature aqueous solution method, and the effects of reaction temperature, reaction time, surfactant dosage and other factors on the maximum whisker length and aspect ratio of the nesquehonite whiskers were investigated. Results showed that high-aspect-ratio nesquehonite whiskers can be synthesized when the reaction temperature is 40–50 °C, the reaction time is 50–60 min and the surfactant dosage is 1% (by mass). On this basis, growth mechanism of the nesquehonite whiskers wa

42

Zhao, Ping, Shuai Zhao, Tai Rong Zhao, Xue Hua Ren, Feng Wang, and Xiu Na Chen. "Hydroxyapatite Whisker Effect on Strength of Calcium Phosphate Bone Cement." Advanced Materials Research 534 (June 2012): 30–33. http://dx.doi.org/10.4028/www.scientific.net/amr.534.30.

Abstract:

Hydroxyapatite whiskers, obtained by a homogeneous precipitation method with lengths of 2.5–15 μm and length-to-diameter ratios of 2–30, were used as the reinforcement phase to prepare whisker/calcium phosphate cement composites. Mechanical properties and microstructure of the composites were tested. As the amount of hydroxyapatite whisker added increases, the composite strength first rises and then falls. The composite reaches its maximum strength when 4 wt% hydroxyapatite whisker is added. SEM was used to observe the fracture microstructures of the composite materials. As a result, dispersion degree of

43

Zhao, Guo Long, Chuan Zhen Huang, Han Lian Liu, Bin Zou, Hong Tao Zhu, and Jun Wang. "Microstructure and Mechanical Properties of Al₂O₃-TaC_w Ceramic Cutting Tool Materials." Advanced Materials Research 797 (September 2013): 172–76. http://dx.doi.org/10.4028/www.scientific.net/amr.797.172.

Abstract:

Three kinds of Al2O3 matrix ceramic cutting tool materials toughened by in-situ grown TaC whiskers were prepared in two steps: in-situ synthesis of TaC whiskers in Al2O3 matrix powder by a carbothermal reduction process, followed by hot pressing of the composites. The preparation process, microstructure, mechanical properties and toughening mechanisms of the composites were investigated. The in-situ synthesized TaC whiskers had a diameter of 0.1-0.5 μm and an aspect ratio of 10-30. The composite containing 20 vol.% TaC had the optimal comprehensive mechanical properties with flexural

44

Li, Yan Bo, Yu Zhi Jiang, Jin Gui He, and Yu Lian Wang. "The Compatibility of Magnesium Hydroxide Whiskers in Organic Phase." Advanced Materials Research 412 (November 2011): 388–92. http://dx.doi.org/10.4028/www.scientific.net/amr.412.388.

Abstract:

The magnesium hydroxide (MH) whiskers were modified in a micro-emulsion by introducing a polymerizable monomer. SEM and FT-IR were used to analyze the structure of the whisker surface after polymerization. The results show that a flexible layer formed on the whisker surface and that the compatibility of the whiskers in the organic phase improved. MH/PE (polyethylene) composites were prepared by melt extrusion to study the compatibility of MH whiskers in the organic phase. The results show that the modified MH whiskers have a significant toughening effect in the composites.

45

Qiu, Hui Hui, Kang Bi Luo, and Hu Ping Li. "Progress on Preparation and Application of Calcium Carbonate Whisker." Advanced Materials Research 1094 (March 2015): 113–17. http://dx.doi.org/10.4028/www.scientific.net/amr.1094.113.

Abstract:

Calcium carbonate whisker is a new kind of environmentally friendly inorganic material. Its advantages, such as low manufacturing cost, simple preparation conditions and excellent properties, have drawn a lot of attention and make it potentially more competitive in the market. This review summarizes the research progress on calcium carbonate whiskers prepared by different methods, such as the carbonation method, metathesis reaction method, sol-gel method, urea hydrolysis method and gravity crystallization method, and describes the research status in diverse applications of ca

46

Sui, Xue Ye, Jie Xu, Han Li, et al. "Preparation of Aluminum Nitride Whiskers." Advanced Materials Research 1058 (November 2014): 7–10. http://dx.doi.org/10.4028/www.scientific.net/amr.1058.7.

Abstract:

Aluminum nitride whiskers have excellent characteristics: they can be used to prepare new composites with high thermal conductivity and also as a reinforcing agent to prepare new toughened composites. Using wet, melamine, and aluminum nitrate as raw material, aluminum nitride whisker precursors are prepared, and pure aluminum nitride whiskers can be obtained by nitrogen and carbon removal processes. These aluminum nitride whiskers possess a smooth surface, uniform length, straight shape, and a long cylindrical structure with a diameter of 4-6 μm

47

Sun, Qiu Ju, Gui Zhen Zhao, Shi Gang Xin, et al. "Performance of Polypropylene Composites Filled by Calcium Carbonate Whiskers." Advanced Materials Research 399-401 (November 2011): 415–18. http://dx.doi.org/10.4028/www.scientific.net/amr.399-401.415.

Abstract:

Calcium carbonate (CaCO3) whiskers were first treated with sodium stearate and then blended with polypropylene (PP) to prepare the composites in a closely intermeshing co-rotating twin-screw extruder at 200 °C. The performance of the composites, including thermal behavior, microstructure and mechanical properties, was analyzed with differential scanning calorimetry (DSC), thermogravimetric analysis (TGA), X-ray diffraction (XRD), scanning electron microscopy (SEM) and mechanical testing. The results showed that CaCO3 whisker influenced the crystalline behavior of PP phase in the blends because

48

Cui, Wen, and Shao Jun Qi. "The Effect of Surface Finish on Zinc Whisker Growth." Advanced Materials Research 472-475 (February 2012): 2756–59. http://dx.doi.org/10.4028/www.scientific.net/amr.472-475.2756.

Abstract:

To understand the relationship between surface finish and zinc whisker growth, this study investigated the growth of whiskers on two mild steel substrates with different surface finishes using a Field Emission Gun Scanning Electron Microscope (FEG SEM). Results show that, under the same experimental conditions, deposits on substrates with a mirror finish grew fewer whiskers and nodules than those on substrates with a rough surface finish.

49

Zhang, Shu Hua, Wen Jun Gan, Wu Xing Sun, Chen Jun Ling, Xie Wang, and Qing Feng Li. "Study on Structures and Properties of CaSO₄ Whiskers/PVC Composites." Advanced Materials Research 335-336 (September 2011): 234–39. http://dx.doi.org/10.4028/www.scientific.net/amr.335-336.234.

Abstract:

Structures of CaSO4 whisker(A) and CaSO4 whisker(B) were characterized by IR and XRD, and the morphologies were observed by optical microscope. The results showed that the modified CaSO4 whisker(B) has a perfect crystal structure, high crystallinity and fewer defects. The mechanical properties and static thermal stability time (190 °C) of two composites, prepared from CaSO4 whisker(A)/PVC and CaSO4 whisker(B)/PVC, were tested separately; the degradation processes were analyzed by TG from 50 °C to 400 °C under an N2 atmosphere, and the dispersing states of two whiskers in the composites were observed

50

Li, Shi-Bo, Guo-Ping Bei, Hong-Xiang Zhai, Zhi-Li Zhang, Yang Zhou, and Cui-Wei Li. "The origin of driving force for the formation of Sn whiskers at room temperature." Journal of Materials Research 22, no. 11 (2007): 3226–32. http://dx.doi.org/10.1557/jmr.2007.0402.

Abstract:

Sn whiskers can form at room temperature on the agglomerated flakes produced by mechanical alloying (MA) of Ti, Sn, and C powders, whether the flakes are stored in air or water. The Sn whiskers forming in air are tens of micrometers to several centimeters in length and 0.5 to ∼10 μm in diameter. Whereas a large amount of Sn polyhedra forms on the flakes that are stored in water, a small amount of Sn whiskers forms on the polyhedra. The driving force for Sn whisker formation is the compressive stress induced by mechanical alloying (MA) and oxidation. The mechanism about the spontaneous growth o
