To see the other types of publications on this topic, follow the link: OpenAI Whisper.

Journal articles on the topic 'OpenAI Whisper'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles

Select a source type:

Consult the top 31 journal articles for your research on the topic 'OpenAI Whisper.'

Next to every source in the list of references there is an 'Add to bibliography' button. Click it, and we will automatically generate a bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online, whenever these are available in the metadata.

Browse journal articles on a wide variety of disciplines and organise your bibliography correctly.

1

Ghale, Akarsh, Janaki K, and Devaraj Verma C. "Instant Transcription and Translation Tool using OpenAI's Whisper ASR Model." International Journal of Science and Research (IJSR) 11, no. 12 (2022): 185–88. http://dx.doi.org/10.21275/sr221203164929.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Bhargavi, A. D. "Video Transcripts Summarization using OpenAI Whisper and GPT Model." International Journal for Research in Applied Science and Engineering Technology 12, no. 3 (2024): 2319–27. http://dx.doi.org/10.22214/ijraset.2024.59365.

Full text
Abstract:
In today’s digital age, a vast amount of video content is generated and shared on the internet every minute. However, extracting relevant information from these videos can be time-consuming and challenging. This is where video transcript summarization comes in, providing a concise summary of video content without the need to watch the entire video. The video transcript summarization system aims to streamline the process of extracting key insights and information from video content by generating concise and informative summaries from their transcripts. In the dynamic landscape of vide…
APA, Harvard, Vancouver, ISO, and other styles
3

William, Ezra, and Amalia Zahra. "Speech Recognition Dengan Whisper Dalam Bahasa Indonesia." Action Research Literate 9, no. 2 (2025): 386–97. https://doi.org/10.46799/arl.v9i2.2573.

Full text
Abstract:
The development of artificial intelligence technology has driven advances in speech recognition, particularly in supporting more efficient digital communication. One of the most widely used recent models is Whisper, developed by OpenAI, which offers multilingual speech recognition claimed to have high accuracy. However, the main challenges in implementing this technology in Indonesia are the limited data resources in local languages and significant accent variation. This study was therefore conducted to evaluate the performance of the Whisper model in…
APA, Harvard, Vancouver, ISO, and other styles
4

Amudhiniyan, Amudhiniyan. "Enhancing Communication between Speech and Hearing Impaired People." INTERNATIONAL JOURNAL OF SCIENTIFIC RESEARCH IN ENGINEERING AND MANAGEMENT 09, no. 02 (2025): 1–9. https://doi.org/10.55041/ijsrem41922.

Full text
Abstract:
Mute Mate is a novel video conferencing system that uses Artificial Intelligence and real-time communication technologies to bridge the communication gap between sign language users and verbal communicators. The system uses YOLOv11 for sign language detection, OpenAI's Whisper model for speech-to-text translation, and WebRTC for real-time, lag-free video communication. It ensures seamless communication between users of different modes of communication. Large-scale testing demonstrates the system's remarkable accuracy, low latency, and effectiveness, underscoring its potential to revolutionize…
APA, Harvard, Vancouver, ISO, and other styles
5

Małecki, Paweł, and Magdalena Piotrowska. "Нови тенденциї у розвою сучасней линґвистики у Сербї." Rocznik Ruskiej Bursy 20 (December 10, 2024): 189–204. https://doi.org/10.12797/rrb.20.2024.20.10.

Full text
Abstract:
ANALYSIS AND CLASSIFICATION OF THE RUSYN LANGUAGE USING THE OPENAI WHISPER ASR ARTIFICIAL NEURAL NETWORK MODEL. The article presents a linguistic analysis of the Rusyn language, focusing on its complex and evolving aspects, such as pronunciation and individual, regional, and historical differences. The study was carried out using an artificial neural network based on the OpenAI Whisper model. Although trained on data from most official state languages, the model was not directly trained on corpora of Rusyn speech samples because of the language's local and minority/ethnic…
APA, Harvard, Vancouver, ISO, and other styles
6

Bhute, Harsha A. "MockMate: AI-Powered Online Mock Interview Assessment and Evaluation System." INTERNATIONAL JOURNAL OF SCIENTIFIC RESEARCH IN ENGINEERING AND MANAGEMENT 09, no. 04 (2025): 1–9. https://doi.org/10.55041/ijsrem45858.

Full text
Abstract:
In the current competitive job market, being well-prepared for interviews is essential to landing a job. Traditional mock interviews, however, are not scalable and can call for a large human resource commitment. By providing a tailored, automated, and interactive online platform driven by artificial intelligence, MockMate tackles this problem. The system mimics actual interview situations, assesses candidate responses in real time, and provides thorough, data-driven feedback by utilizing cutting-edge Natural Language Processing (NLP) and speech-to-text technologies. In addition to cu…
APA, Harvard, Vancouver, ISO, and other styles
7

Papala, Gowtham, Aniket Ransing, and Pooja Jain. "Sentiment Analysis and Speaker Diarization in Hindi and Marathi Using Finetuned Whisper." Scalable Computing: Practice and Experience 24, no. 4 (2023): 835–46. http://dx.doi.org/10.12694/scpe.v24i4.2248.

Full text
Abstract:
Automatic Speech Recognition (ASR) is a crucial technology that enables machines to automatically recognize human voices based on audio signals. In recent years, there has been rapid growth in the development of ASR models with the emergence of new techniques and algorithms. One such model is the Whisper ASR model developed by OpenAI, which is based on a Transformer encoder-decoder architecture and can handle multiple tasks such as language identification, transcription, and translation. However, there are still limitations to the Whisper ASR model, such as speaker diarization, summarizat…
APA, Harvard, Vancouver, ISO, and other styles
8

Ferdiansyah, Danny, and Christian Sri Kusuma Aditya. "Implementasi Automatic Speech Recognition Bacaan Al-Qur’an Menggunakan Metode Wav2Vec 2.0 dan OpenAI-Whisper." Jurnal Teknik Elektro dan Komputer TRIAC 11, no. 1 (2024): 11–16. http://dx.doi.org/10.21107/triac.v11i1.24332.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Polepaka, Sanjeeva, Varikuppala Prashanth Kumar, S. Umesh Chandra, Hema Nagendra Sri Krishna, and Gaurav Thakur. "Automated Caption Generation for Video Call with Language Translation." E3S Web of Conferences 430 (2023): 01025. http://dx.doi.org/10.1051/e3sconf/202343001025.

Full text
Abstract:
In the modern era, virtual communication between individuals is common. Many people’s lives have been made simpler in a number of circumstances by providing subtitles, generating automated captions for social media videos, and translating from a source language to a target language. Both are included here, offering face-to-face translated captions during video conversations. React is used for application development. To send the data, socket programming is utilized. Context is understood and translated using the Google Translate API and speech recognition modules. With OpenAI and Whisper…
APA, Harvard, Vancouver, ISO, and other styles
10

Khairani, Dewi, Tabah Rosyadi, Arini Arini, Imam Luthfi Rahmatullah, and Fauzan Farhan Antoro. "Enhancing Speech-to-Text and Translation Capabilities for Developing Arabic Learning Games: Integration of Whisper OpenAI Model and Google API Translate." JURNAL TEKNIK INFORMATIKA 17, no. 2 (2024): 203–12. http://dx.doi.org/10.15408/jti.v17i2.41240.

Full text
Abstract:
This study tackles language barriers in computer-mediated communication by developing an application that integrates OpenAI’s Whisper ASR model and Google Translate machine translation to enable real-time, continuous speech transcription and translation and the processing of video and audio files. The application was developed using the Experimental method, incorporating standards for testing and evaluation. The integration expanded language coverage to 133 languages and improved translation accuracy. Efficiency was enhanced through the use of greedy parameters and the Faster Whisper model. Us…
APA, Harvard, Vancouver, ISO, and other styles
11

Raghuvanshi, Deepansh. "Edu AI Summarizer." International Journal for Research in Applied Science and Engineering Technology 13, no. 4 (2025): 1373–76. https://doi.org/10.22214/ijraset.2025.68205.

Full text
Abstract:
Edu AI Summarizer is an innovative platform that enhances the educational video experience for students and teachers while helping researchers make the most of their study materials. The platform uses advanced speech-to-text technology along with natural language processing (NLP) to detect video topics and then produces tidy summaries. Users receive downloadable PDFs containing straightforward, well-prepared study materials derived from the original video summaries. The system enhances student revision efficiency by providing clear content direction to educators and enables researchers to…
APA, Harvard, Vancouver, ISO, and other styles
12

Mohamud, Osman Hamud, and Aydın Serpil. "Enhancing Conversational AI for Low-Resource Languages: A Case Study on Somali." International Journal of Innovative Science and Research Technology (IJISRT) 10, no. 2 (2025): 290–93. https://doi.org/10.5281/zenodo.14908879.

Full text
Abstract:
Conversational AI has made huge strides in understanding and generating human language. However, these advances have mostly benefited high-resource languages such as English and Spanish. In contrast, languages like Somali—spoken by an estimated 20 million people—lack the abundance of annotated data needed to develop robust language models. This study focuses on practical strategies to boost Somali text and speech processing capabilities. We explore three core approaches: (1) transfer learning, (2) synthetic data augmentation, and (3) fine-tuning multilingual models. Our experiment…
APA, Harvard, Vancouver, ISO, and other styles
13

Rai, Anand Kumar, Siddharth D. Jaiswal, and Animesh Mukherjee. "A Deep Dive into the Disparity of Word Error Rates across Thousands of NPTEL MOOC Videos." Proceedings of the International AAAI Conference on Web and Social Media 18 (May 28, 2024): 1302–14. http://dx.doi.org/10.1609/icwsm.v18i1.31390.

Full text
Abstract:
Automatic speech recognition (ASR) systems are designed to transcribe spoken language into written text and find utility in a variety of applications including voice assistants and transcription services. However, it has been observed that state-of-the-art ASR systems which deliver impressive benchmark results, struggle with speakers of certain regions or demographics due to variation in their speech properties. In this work, we describe the curation of a massive speech dataset of 8740 hours consisting of ~9.8K technical lectures in the English language along with their transcripts delivered b…
APA, Harvard, Vancouver, ISO, and other styles
14

Hannon, Brendan, Yulia Kumar, J. Jenny Li, and Patricia Morreale. "Chef Dalle: Transforming Cooking with Multi-Model Multimodal AI." Computers 13, no. 7 (2024): 156. http://dx.doi.org/10.3390/computers13070156.

Full text
Abstract:
In an era where dietary habits significantly impact health, technological interventions can offer personalized and accessible food choices. This paper introduces Chef Dalle, a recipe recommendation system that leverages multi-model and multimodal human-computer interaction (HCI) techniques to provide personalized cooking guidance. The application integrates voice-to-text conversion via Whisper and ingredient image recognition through GPT-Vision. It employs an advanced recipe filtering system that utilizes user-provided ingredients to fetch recipes, which are then evaluated through multi-model…
APA, Harvard, Vancouver, ISO, and other styles
15

Whitehill, Jacob, and Jennifer LoCasale-Crouch. "Automated Evaluation of Classroom Instructional Support with LLMs and BoWs: Connecting Global Predictions to Specific Feedback." Journal of Educational Data Mining 16, no. 1 (2024): 33–60. https://doi.org/10.5281/zenodo.10974824.

Full text
Abstract:
With the aim to provide teachers with more specific, frequent, and actionable feedback about their teaching, we explore how Large Language Models (LLMs) can be used to estimate “Instructional Support” domain scores of the CLassroom Assessment Scoring System (CLASS), a widely used observation protocol. We design a machine learning architecture that uses either zero-shot prompting of Meta’s Llama2, and/or a classic Bag of Words (BoW) model, to classify individual utterances of teachers’ speech (transcribed automatically using OpenAI’s Whisper) for the presence of Instruc…
APA, Harvard, Vancouver, ISO, and other styles
16

AboSarafa, Maryam, and Mohamed Arteimi. "DEVELOPMENT OF SMART VOICE AGENT With case study (Libyan Voice Assistant)." Academy Journal For Basic and Applied Sciences 7, no. 1 (2025): 1–11. https://doi.org/10.5281/zenodo.15505226.

Full text
Abstract:
The paper presents the creation of an end-to-end voice assistant system designed for a lesser-resourced dialect of Arabic, Libyan Tripolitanian, which lacks support in commercial ASR and NLP applications. To remedy this gap, we built a demographically balanced and phonemically rich corpus of speech data containing over 13,000 audio samples. It contains both natural and semi-structured utterances and is annotated using the CODA* orthography for dialectal Arabic. Using this dataset, we trained the OpenAI Whisper model with the Hugging Face Transfor…
APA, Harvard, Vancouver, ISO, and other styles
17

Bazán-Gil, Virginia. "Inteligencia artificial en la preservación y puesta en valor de los archivos audiovisuales en el contexto territorial." Tábula, no. 27 (November 19, 2024): 227–40. http://dx.doi.org/10.51598/tab.1019.

Full text
Abstract:
This article explores the integration of artificial intelligence (AI) in the RTVE Archive to automatically generate metadata and improve the accessibility of audiovisual content. AI has been deployed to optimize the cataloguing and retrieval of filmed collections, especially RTVE's oldest holdings. From the first trials in 2017 to the deployment of services in 2021 and 2023, 16,000 hours of content have been enhanced with advanced AI technologies such as OpenAI's Whisper and GPT-3.5. The article describes the system architecture, the file workflow, and the…
APA, Harvard, Vancouver, ISO, and other styles
18

Banjade, Shivraj, Hiran Patel, and Sangita Pokhrel. "Empowering Education by Developing and Evaluating Generative AI-Powered Tutoring System for Enhanced Student Learning." Journal of Artificial Intelligence and Capsule Networks 6, no. 3 (2024): 278–98. http://dx.doi.org/10.36548/jaicn.2024.3.003.

Full text
Abstract:
Personalized learning has always been a dream for schools, educators, and students but until recently, educators didn’t have the time or resources to implement it on a large scale. With the advancements in AI, Generative AI can automate many of a teacher’s core tasks, such as creating lesson resources, providing lesson structures and key talking points, designing infographics, creating slideshows, and converting text into videos and images. This study details the development and evaluation of an AI-powered tutoring system designed to enhance student learning experiences. Motivated by the trans…
APA, Harvard, Vancouver, ISO, and other styles
19

Arkhipova, Zoya, and Valery Staver. "Comparative Analysis of Speech Transcription Technologies for the Digitalization of Technical Support Services." System Analysis & Mathematical Modeling 7, no. 1 (2025): 5–16. https://doi.org/10.17150/2713-1734.2025.7(1).5-16.

Full text
Abstract:
This article is dedicated to the application of neural network technologies to enhance the efficiency and quality of technical support services. The use of speech transcription technologies is becoming increasingly relevant due to the rising demands for high-quality information processing in various fields. The study examines the main approaches to speech transcription, including classical methods, deep learning-based solutions, hybrid approaches, as well as commercial and open-source tools. The research aims to conduct a comparative analysis of modern transcription systems to select and subse…
APA, Harvard, Vancouver, ISO, and other styles
20

Dhamdhere, Rahul, Manthan Dhawale, Satyajeet Jagtap, Harsh Memane, Shashank Lahane, and Sneha Salvekar. "AI Summarizer: Interactive Multi-Modal Processing for Lectures, Meetings and Text Documents." International Journal For Multidisciplinary Research 7, no. 3 (2025). https://doi.org/10.36948/ijfmr.2025.v07i03.47071.

Full text
Abstract:
This paper introduces an AI-powered summarization system that processes both text and audio content—such as lectures and meetings—to improve productivity. It integrates OpenAI Whisper for transcription, Nomic embeddings for extractive summarization, and DeepSeek’s language model (via Ollama) for generating refined summaries and enabling chatbot interaction. The system runs locally using a Flask backend and HTML/JavaScript frontend. Whisper achieves a Word Error Rate (WER) of ~10%, and the system’s summarization accuracy averages 77.46%, as evaluated by Grok. Designed for students and professio…
APA, Harvard, Vancouver, ISO, and other styles
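The entry above chains Whisper transcription with an extractive summarization stage. As a rough illustration of the extractive step only, here is a minimal word-frequency sentence scorer in Python; it is a simplified stand-in for the embedding-based (Nomic) ranking the paper describes, and the function name and scoring rule are our own assumptions.

```python
import re
from collections import Counter

def extractive_summary(text: str, n_sentences: int = 2) -> str:
    """Score each sentence by the average corpus frequency of its words
    and return the top-n sentences in their original order (a crude
    stand-in for embedding-based extractive summarization)."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    freq = Counter(re.findall(r"[a-z']+", text.lower()))

    def score(sentence: str) -> float:
        tokens = re.findall(r"[a-z']+", sentence.lower())
        return sum(freq[t] for t in tokens) / (len(tokens) or 1)

    # pick the n highest-scoring sentences, then restore document order
    top = sorted(sorted(sentences, key=score, reverse=True)[:n_sentences],
                 key=sentences.index)
    return " ".join(top)
```

In a full pipeline, the input to this function would be the transcript produced by Whisper rather than raw text.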
21

Waghmare, Suhas, Chirag Brahme, Siddhi Panchal, Numaan Sayed, and Mohit Goud. "Comparative Analysis of State-of-the-Art Speech Recognition Models for Low-Resource Marathi Language." International Journal of Innovative Science and Research Technology (IJISRT), May 2, 2023, 1544–45. http://dx.doi.org/10.38124/ijisrt/ijisrt24apr1816.

Full text
Abstract:
In this research, we present a comparative analysis of two state-of-the-art speech recognition models, Whisper by OpenAI and XLSR Wave2vec by Facebook, applied to the low-resource Marathi language. Leveraging the Common Voice 16 dataset, we evaluated the performance of these models using the word error rate (WER) metric. Our findings reveal that the Whisper (Small) model achieved a WER of 45%, while the XLSR Wave2vec model obtained a WER of 71%. This study sheds light on the capabilities and limitations of current speech recognition technologies for low-resource languages and provides valuable…
APA, Harvard, Vancouver, ISO, and other styles
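The WER figures quoted in the entry above (45% vs. 71%) come from the standard word-level edit-distance metric. A minimal, dependency-free sketch of that computation follows; the function name is ours, and real evaluations usually normalize the text (casing, punctuation) before scoring.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level edit distance (substitutions +
    insertions + deletions) divided by the reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # dynamic-programming table for Levenshtein distance over words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution/match
    return d[-1][-1] / len(ref)
```

Under this metric, the reported WER of 0.45 for Whisper (Small) on Marathi corresponds to roughly 45 word edits per 100 reference words.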
22

Shaga, Samyak Vamshi. "AI-Powered Audio Summarization and Ethical Content Analysis Using OpenAI Whisper." SSRN Electronic Journal, 2025. https://doi.org/10.2139/ssrn.5219188.

Full text
APA, Harvard, Vancouver, ISO, and other styles
23

Vivek, Kanji Malam. "Natural Language Processing-based Solution for Accurate Transcription and Translation of Distorted Multilingual Audio Signals." July 11, 2023. https://doi.org/10.5281/zenodo.8133465.

Full text
Abstract:
This research paper addresses the challenge of transcribing and translating noise-filled audio recordings that contain a mix of multiple languages and dialects. The objective is to develop a software-based tool capable of ingesting low-quality audio files, cleaning the signals, and creating accurate textual transcripts. The paper explores the unique difficulties posed by these recordings, including the presence of slang and local words not found in standard language models. Furthermore, the paper discusses the need for context-dependent translations and the provision of timestamps for efficie…
APA, Harvard, Vancouver, ISO, and other styles
24

Morales-Muñoz, Walter, and Saúl Calderón-Ramírez. "Estimación de incertidumbre para un sistema de reconocimiento de voz." Revista Tecnología en Marcha, September 9, 2024. http://dx.doi.org/10.18845/tm.v37i7.7305.

Full text
Abstract:
Whisper is a speech recognition system designed by OpenAI; it was trained on 680,000 hours of multilingual, multitask supervised data collected from the web. This study aims to adapt and apply the Monte Carlo Dropout technique, using labeled Spanish audio data contaminated with varying amounts of noise together with the Levenshtein distance, to estimate the system's uncertainty. Preliminary results show a linear relationship between the uncertainty estimated using the Levensht…
APA, Harvard, Vancouver, ISO, and other styles
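The uncertainty-estimation idea in the entry above pairs Monte Carlo Dropout sampling with the Levenshtein distance. A hedged sketch of the scoring side, assuming several stochastic transcripts of the same audio have already been produced (e.g., by decoding with dropout enabled): disagreement between samples, measured as the mean pairwise normalized Levenshtein distance, serves as the uncertainty score. Function names are illustrative, not from the paper.

```python
from itertools import combinations

def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def transcript_uncertainty(samples: list[str]) -> float:
    """Mean pairwise normalized Levenshtein distance across stochastic
    transcripts: 0.0 when all samples agree, approaching 1.0 when the
    dropout-perturbed decodings disagree completely."""
    dists = [levenshtein(a, b) / max(len(a), len(b), 1)
             for a, b in combinations(samples, 2)]
    return sum(dists) / len(dists)
```

A high score on noisy Spanish audio would then flag transcripts the model is unsure about, in the spirit of the linear relationship the study reports.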
25

Borglund, Erik, Martina Granholm, Catrin Johansson, and Peter Jonriksson. "Using Automatic Speech Recognition for Documenting Work in Municipal Emergency Operations Centers." Proceedings of the International ISCRAM Conference, May 6, 2025. https://doi.org/10.59297/pey4xp40.

Full text
Abstract:
Automatic speech recognition (ASR) and automatic documentation have not been widely explored in crisis management, despite their potential utility in facilitating the transcription of speech recordings. Although documentation is widely recognized as essential for creating a common operational picture, there is often a lack of such documentation, which can hinder understanding of events during and after a crisis. The novelty of the research is to apply existing technology and evaluate the potential of ASR technology in the domain of crisis management. We present preliminary results of using Ope…
APA, Harvard, Vancouver, ISO, and other styles
26

Spiecker, Elivan Ricardo, Nícolas Sartori Emer, and Edson Moacir Ahlert. "EXPLORAÇÃO DA VOZ COMO MECÂNICA INTERATIVA EM JOGOS DIGITAIS: IMPACTOS NA EXPERIÊNCIA DO USUÁRIO E JOGABILIDADE." Revista Destaques Acadêmicos 16, no. 4 (2024). https://doi.org/10.22410/issn.2176-3070.v16i4a2024.4024.

Full text
Abstract:
This work investigates voice interaction as an innovative mechanic in digital games, focusing on 2D RPGs. The proposal uses OpenAI Whisper to turn voice commands into in-game actions, creating a more immersive and natural experience for players. Technical challenges addressed include adapting to different accents and reducing noise, resolved with acoustic pre-processing techniques and continuous system training. The prototype was implemented on the Unity platform, chosen for its flexibility and integration with external libraries. The results…
APA, Harvard, Vancouver, ISO, and other styles
27

Zhao, Robin, Anna S. G. Choi, Allison Koenecke, and Anaïs Rameau. "Quantification of Automatic Speech Recognition System Performance on d/Deaf and Hard of Hearing Speech." Laryngoscope, August 19, 2024. http://dx.doi.org/10.1002/lary.31713.

Full text
Abstract:
Objective: To evaluate the performance of commercial automatic speech recognition (ASR) systems on d/Deaf and hard-of-hearing (d/Dhh) speech. Methods: A corpus containing 850 audio files of d/Dhh and normal hearing (NH) speech from the University of Memphis Speech Perception Assessment Laboratory was tested on four speech-to-text application program interfaces (APIs): Amazon Web Services, Microsoft Azure, Google Chirp, and OpenAI Whisper. We quantified the Word Error Rate (WER) of API transcriptions for 24 d/Dhh and nine NH participants and performed subgroup analysis by speech intelligibility clas…
APA, Harvard, Vancouver, ISO, and other styles
28

Tolle, Hannah, Maria del Mar Castro, Jonas Wachinger, et al. "From voice to ink (Vink): development and assessment of an automated, free-of-charge transcription tool." BMC Research Notes 17, no. 1 (2024). http://dx.doi.org/10.1186/s13104-024-06749-0.

Full text
Abstract:
Background: Verbatim transcription of qualitative audio data is a cornerstone of analytic quality and rigor, yet the time and energy required for such transcription can drain resources, delay analysis, and hinder the timely dissemination of qualitative insights. In recent years, software programs have presented a promising mechanism to accelerate transcription, but the broad application of such programs has been constrained due to expensive licensing or “per-minute” fees, data protection concerns, and limited availability of such programs in many languages. In this article, we outline…
APA, Harvard, Vancouver, ISO, and other styles
29

Nandkumar, Chandran, and Luka Peternel. "Enhancing supermarket robot interaction: an equitable multi-level LLM conversational interface for handling diverse customer intents." Frontiers in Robotics and AI 12 (April 29, 2025). https://doi.org/10.3389/frobt.2025.1576348.

Full text
Abstract:
This paper presents the design and evaluation of a comprehensive system to develop voice-based interfaces to support users in supermarkets. These interfaces enable shoppers to convey their needs through both generic and specific queries. Although customisable state-of-the-art systems like GPTs from OpenAI are easily accessible and adaptable, featuring low-code deployment with options for functional integration, they still face challenges such as increased response times and limitations in strategic control for tailored use cases and cost optimization. Motivated by the goal of crafting equitabl…
APA, Harvard, Vancouver, ISO, and other styles
30

Graham, Calbert, and Nathan Roll. "Evaluating OpenAI's Whisper ASR: Performance analysis across diverse accents and speaker traits." JASA Express Letters 4, no. 2 (2024). http://dx.doi.org/10.1121/10.0024876.

Full text
Abstract:
This study investigates Whisper's automatic speech recognition (ASR) system performance across diverse native and non-native English accents. Results reveal superior recognition in American compared to British and Australian English accents with similar performance in Canadian English. Overall, native English accents demonstrate higher accuracy than non-native accents. Exploring connections between speaker traits [sex, native language (L1) typology, and second language (L2) proficiency] and word error rate uncovers notable associations. Furthermore, Whisper exhibits enhanced performance in rea…
APA, Harvard, Vancouver, ISO, and other styles
31

Naffah, Ava, Valeria A. Pfeifer, and Matthias R. Mehl. "Spoken language analysis in aging research: The validity of AI-generated speech-to-text using OpenAI’s Whisper." Gerontology, March 13, 2025, 1–12. https://doi.org/10.1159/000545244.

Full text
Abstract:
INTRODUCTION: Studying what older adults say can provide important insights into cognitive, affective, and social aspects of aging. Available language analysis tools generally require audio-recorded speech to be transcribed into verbatim text, a task that has historically been performed by humans. However, recent advances in AI-based language processing open up the possibility of replacing this time- and resource-intensive task with fully automatic speech-to-text. METHODS: This study evaluates the accuracy of two common automatic speech-to-text tools, OpenAI’s Whisper and otter.ai, relative to hu…
APA, Harvard, Vancouver, ISO, and other styles