Academic literature on the topic 'Speech Recognition and Transcription Technologies'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Speech Recognition and Transcription Technologies.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever these are available in the metadata.

Journal articles on the topic "Speech Recognition and Transcription Technologies"

1

Mehta, Drashti, Rocky Upadhyay, and Krishna Jariwala. "Integrating Speech Recognition and NLP for Efficient Transcription Solutions." International Journal of Scientific Research in Computer Science, Engineering and Information Technology 11, no. 1 (2025): 1089–96. https://doi.org/10.32628/cseit2526479.

Full text
Abstract:
Speech recognition, an essential component of natural language processing (NLP), plays a pivotal role in enhancing communication and human-computer interaction. This paper reviews the advancements, challenges, and applications of speech recognition, natural language understanding (NLU), and chatbot technologies. Current speech recognition systems utilize techniques like Mel Frequency Cepstral Coefficients (MFCC) and Hidden Markov Models (HMM) to address linguistic errors, gender recognition failures, and inaccurate voice recognition. Applications such as voice assistants offer continuous interaction capabilities, enabling users, including those with disabilities, to perform tasks like web searches and document preparation. Additionally, we examine vulnerabilities in voice assistants, particularly in NLU components like Intent Classifiers, which can misinterpret user inputs and pose security risks. The transformative impact of deep neural networks (DNN) on speech recognition since 2010 is also discussed, alongside their application to fields like machine translation and image captioning. Furthermore, this paper highlights the evolution of chatbots, integrating NLU platforms like Google DialogFlow and IBM Watson, to deliver intelligent, adaptive interactions. By addressing challenges in intent recognition and system integration, this review underscores the potential of AI-driven solutions to revolutionize speech-based applications.
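For readers unfamiliar with the MFCC front end this abstract mentions, the fragment below is a minimal, illustrative sketch of frame-level MFCC extraction. The review does not name a toolkit; librosa, the 16 kHz sample rate, and the file name are assumptions made purely for demonstration.

```python
# Illustrative MFCC extraction sketch (not from the paper above; librosa,
# the sample rate and the file name are assumptions for demonstration only).
import librosa
import numpy as np

def extract_mfcc(wav_path: str, n_mfcc: int = 13) -> np.ndarray:
    """Return an (n_mfcc x frames) matrix of MFCC features for one recording."""
    signal, sample_rate = librosa.load(wav_path, sr=16000)  # resample to 16 kHz
    return librosa.feature.mfcc(y=signal, sr=sample_rate, n_mfcc=n_mfcc)

# Example: features = extract_mfcc("utterance.wav")
# In an HMM-based recogniser of the kind the review describes, each column of
# this matrix is one observation vector scored against phone-level HMM states.
```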
APA, Harvard, Vancouver, ISO, and other styles
2

Maurya, Maruti, Mohd Zaheer, Nawab Mohammad, Sadaf siddiqui, Mohd Zeeshan Khan, and Mohd Ayan Akram. "Speech Recognition Technologies: Design, Challenges, and Real-World Applications." International Journal of Innovative Research in Computer Science and Technology 13, no. 3 (2025): 55–61. https://doi.org/10.55524/ijircst.2025.13.3.9.

Full text
Abstract:
This paper presents an automated speech recognition (ASR) system that transcribes audio from YouTube videos into accurate text using OpenAI's Whisper model. Leveraging tools such as yt_dlp, FFmpeg, and PyTorch, the system creates a robust speech-to-text pipeline. On receiving a video URL, the system extracts and preprocesses audio, transcribes it using Whisper, and evaluates transcription quality through metrics like Word Error Rate (WER), Character Error Rate (CER), and Match Error Rate (MER). The pipeline supports offline use, making it suitable for accessible, cost-effective deployment in educational, research, and assistive applications.
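The pipeline described above lends itself to a compact illustration. The sketch below is not the authors' code; it only shows how the named tools (yt-dlp, FFmpeg, Whisper) and an error-rate library are commonly chained, with the jiwer package, the model size, and the file names chosen as assumptions.

```python
# Minimal sketch of a YouTube-to-text pipeline of the kind described above.
# Assumptions: the openai-whisper, yt-dlp and jiwer packages; file names,
# model size and options are illustrative, not taken from the paper.
import yt_dlp
import whisper
from jiwer import wer, cer

def download_audio(url: str, out_stem: str = "lecture_audio") -> str:
    """Fetch the best audio stream and convert it to WAV via FFmpeg."""
    options = {
        "format": "bestaudio/best",
        "outtmpl": out_stem,
        "postprocessors": [{"key": "FFmpegExtractAudio", "preferredcodec": "wav"}],
    }
    with yt_dlp.YoutubeDL(options) as downloader:
        downloader.download([url])
    return out_stem + ".wav"

def transcribe_and_score(url: str, reference: str = "") -> dict:
    """Transcribe one video and, if a reference transcript is given, score it."""
    audio_file = download_audio(url)
    model = whisper.load_model("base")   # larger checkpoints trade speed for accuracy
    hypothesis = model.transcribe(audio_file)["text"]
    report = {"text": hypothesis}
    if reference:                        # WER/CER require a ground-truth transcript
        report["WER"] = wer(reference, hypothesis)
        report["CER"] = cer(reference, hypothesis)
    return report
```

Because every step runs on local models and tools, a pipeline of this shape can operate offline, which is what makes the cost-effective deployment mentioned in the abstract possible.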
APA, Harvard, Vancouver, ISO, and other styles
3

N, Bhanu Prakash. "Assistive Communication Web App: A Multi-Model Solution for Individuals with Disabilities." International Journal of Scientific Research in Engineering and Management 09, no. 01 (2025): 1–9. https://doi.org/10.55041/ijsrem40889.

Full text
Abstract:
The Assistive Communication Web App is an innovative platform designed to address the communication challenges faced by individuals with speech, hearing, and motor impairments. By integrating advanced technologies such as speech recognition, head-gaze tracking, and gesture-based sign language recognition, the app provides real-time speech-to-text, text-to-speech, and gesture-to-text functionalities. This research outlines the app's system architecture, methodology, and user interface, emphasizing its potential to enhance accessibility and empower users with greater independence. Through its multi-modal approach, the web app fosters inclusivity and sets a new benchmark for assistive technologies. Keywords: Assistive communication, Speech-to-text, Text-to-speech, Sign language recognition, Head gaze tracking, WebGazer.js, Real-time interaction, Gesture recognition, MediaPipe FaceMesh, TensorFlow HandPose, Web Speech API, Real-time transcription, Multilingual support, Cross-platform compatibility, Emotion detection, User-centered design, Modular architecture
APA, Harvard, Vancouver, ISO, and other styles
4

Shaji, Edwin Alex, Jerishab M. Jerishab M, Leya Thomas, M. Viraj Prabhu, and Asst Prof Chinchu M Pillai. "Survey on Speech Recognition and Retrieval-Augmented Generation." International Journal of Advances in Engineering and Management 06, no. 12 (2024): 75–81. https://doi.org/10.35629/5252-06127581.

Full text
Abstract:
Automatic speech recognition (ASR) and retrieval-augmented generation (RAG) systems have seen remarkable progress in handling multilingualism, noise robustness, real-time transcription, and knowledge-intensive tasks. The survey reviews 12 key papers that contribute to advancements in ASR and RAG, covering approaches like end-to-end multilingual models, noise-reduction techniques, and real-time speech processing. It also examines RAG systems that enhance generative models by integrating retrieval mechanisms for improved accuracy in tasks like question answering and summarization. By categorizing the papers into themes, this survey highlights key methodologies, compares their performance, and identifies future directions for improving ASR and RAG technologies in handling real-world challenges.
APA, Harvard, Vancouver, ISO, and other styles
5

Dibble, W. Flint. "Data Collection in Zooarchaeology: Incorporating Touch-Screen, Speech-Recognition, Barcodes, and GIS." Ethnobiology Letters 6, no. 2 (2015): 249–57. http://dx.doi.org/10.14237/ebl.6.2.2015.393.

Full text
Abstract:
When recording observations on specimens, zooarchaeologists typically use a pen and paper or a keyboard. However, the use of awkward terms and identification codes when recording thousands of specimens makes such data entry prone to human transcription errors. Improving the quantity and quality of the zooarchaeological data we collect can lead to more robust results and new research avenues. This paper presents design tools for building a customized zooarchaeological database that leverages accessible and affordable 21st century technologies. Scholars interested in investing time in designing a custom-database in common software (here, Microsoft Access) can take advantage of the affordable touch-screen, speech-recognition, and geographic information system (GIS) technologies described here. The efficiency that these approaches offer a research project far exceeds the time commitment a scholar must invest to deploy them.
APA, Harvard, Vancouver, ISO, and other styles
6

Kumar V, Suresh, Raveendra Nadh B, Sureshkumar S, Anisetty Suresh Kumar, Arun Raj S R, and Sheeba G. "Leveraging Artificial Neural Networks for Real-Time Speech Recognition in Voice-Activated Systems." ITM Web of Conferences 76 (2025): 01003. https://doi.org/10.1051/itmconf/20257601003.

Full text
Abstract:
Artificial neural networks have further shaped the domain of real-time, voice-controlled speech recognition systems, although language bias, computational expense, and background noise have made progress difficult in the past. This paper offers a novel view of these challenges, allowing for broader accessibility and real-world applicability of state-of-the-art models. We advocate a multi-dimensional methodology, including ad hoc model contextualization, tailored neural designs, and personalized learning strategies, to optimize both the voice models and their on-chip efficiency. Existing speech recognition systems are typically limited to a few languages, dialects, and accents; this study introduces a multilingual and multicultural model to address this issue and to make unbiased technology available to all parts of society. With these regional adaptations, the system can cover more speech variants and handle them more precisely. The efficient structure also improves computational efficiency, allowing the model to run in real time on low-power devices and meeting the rising demand for speech recognition in mobile and edge computing settings. Contextual understanding remains a problem, as errors in pronunciation or deviations from the dialect tend to lead to mistakes; the current study therefore uses semantic analysis and natural language processing (NLP) methods to assist understanding across different languages. These services are used in applications like medical and legal transcription or customer support, where correct transcription is critical. Additionally, the architecture enhances real-time processing by reducing latency and increasing responsiveness, which is critical in emergency response systems and autonomous vehicles where timely decision-making is crucial. By enhancing the efficiency and accuracy of ANN-based speech recognition, this research drives advancements toward more accessible, effective, and reliable voice-activated technologies.
APA, Harvard, Vancouver, ISO, and other styles
7

Zhang, Ruijing. "A Comparative Analysis of LSTM and Transformer-based Automatic Speech Recognition Techniques." Transactions on Computer Science and Intelligent Systems Research 5 (August 12, 2024): 272–76. http://dx.doi.org/10.62051/zq6v0d49.

Full text
Abstract:
Automatic Speech Recognition (ASR) is a technology that leverages artificial intelligence to convert spoken language into written text. It utilizes machine learning algorithms, specifically deep learning models, to analyze audio signals and extract linguistic features. This technology has revolutionized the way that people interact with voice-enabled devices, enabling efficient and accurate transcription of human speech in various applications, including voice assistants, captioning, and transcription services. Among previous works for ASR, Long Short-Term Memory (LSTM) networks and Transformer-based methods are typical solutions towards effective ASR. In this paper, the author focuses on an in-depth exploration of the progression and comparative analysis of deep learning innovations within the ASR domain. This work starts with a foundational historical perspective, mapping the evolution from pioneering ASR systems to the current benchmarks: LSTM networks and Transformer-based models. The study meticulously evaluates these technologies, dissecting their strengths, weaknesses, and the potential they hold for future advancements in ASR.
APA, Harvard, Vancouver, ISO, and other styles
8

Kim, Minsu, Chae Won Kim, and Yong Man Ro. "Deep Visual Forced Alignment: Learning to Align Transcription with Talking Face Video." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 7 (2023): 8273–81. http://dx.doi.org/10.1609/aaai.v37i7.25998.

Full text
Abstract:
Forced alignment refers to a technology that time-aligns a given transcription with a corresponding speech. However, as the forced alignment technologies have developed using speech audio, they might fail in alignment when the input speech audio is noise-corrupted or is not accessible. We focus on that there is another component that the speech can be inferred from, the speech video (i.e., talking face video). Since the drawbacks of audio-based forced alignment can be complemented using the visual information when the audio signal is under poor condition, we try to develop a novel video-based forced alignment method. However, different from audio forced alignment, it is challenging to develop a reliable visual forced alignment technology for the following two reasons: 1) Visual Speech Recognition (VSR) has a much lower performance compared to audio-based Automatic Speech Recognition (ASR), and 2) the translation from text to video is not reliable, so the method typically used for building audio forced alignment cannot be utilized in developing visual forced alignment. In order to alleviate these challenges, in this paper, we propose a new method that is appropriate for visual forced alignment, namely Deep Visual Forced Alignment (DVFA). The proposed DVFA can align the input transcription (i.e., sentence) with the talking face video without accessing the speech audio. Moreover, by augmenting the alignment task with anomaly case detection, DVFA can detect mismatches between the input transcription and the input video while performing the alignment. Therefore, we can robustly align the text with the talking face video even if there exist error words in the text. Through extensive experiments, we show the effectiveness of the proposed DVFA not only in the alignment task but also in interpreting the outputs of VSR models.
APA, Harvard, Vancouver, ISO, and other styles
9

S, R. Rakshitha, P. Naik Sahana, VS Sanjana, V. Suprasanna, and C. P. Nayana. "Real-Time Audio Transcription with Automated PDF Summarization and Contextual Insights." International Journal of Innovative Science and Research Technology (IJISRT) 9, no. 11 (2024): 2742–45. https://doi.org/10.5281/zenodo.14443356.

Full text
Abstract:
This research provides a comprehensive examination of automated PDF summarization and real-time audio transcription systems, focusing on their integration to derive contextual insights. The study explores recent advancements in automatic speech recognition (ASR) technologies that enable instantaneous conversion of spoken language into text, as well as techniques for both extractive and abstractive summarization of PDF documents. The paper investigates how combining these technologies can enhance applications across various sectors, including media, business, healthcare, and education, by delivering real-time, contextually relevant information. It also addresses key industry challenges, such as handling complex documents, ensuring scalability, achieving high transcription accuracy, and managing noisy environments. The research concludes with a discussion on potential future developments, including improving multilingual capabilities, reducing biases in AI models, and enhancing system integration with other technologies to provide more efficient and personalized insights.
APA, Harvard, Vancouver, ISO, and other styles
10

Sunny, Ancy K. "A Novel Approach to Malayalam Speech-to-Text and Text-to-English Translation." International Journal of Scientific Research in Engineering and Management 08, no. 05 (2024): 1–5. http://dx.doi.org/10.55041/ijsrem33108.

Full text
Abstract:
This paper presents a novel approach to facilitate Malayalam speech-to-text transcription and subsequent translation into English text. The proposed system leverages advancements in speech recognition, natural language processing, and machine translation techniques. We demonstrate the effectiveness of our approach through a practical implementation and evaluation. Introduction: The ability to accurately transcribe spoken language and translate it into other languages has numerous applications in today's digital world. However, the development of such systems for languages with complex structures, such as Malayalam, presents unique challenges. In this paper, we propose a solution to address these challenges by combining state-of-the-art technologies in speech recognition and machine translation. Literature Review: Previous studies have explored various approaches to speech-to-text transcription and machine translation. However, few have focused specifically on the Malayalam language. Existing systems often struggle with accurately transcribing and translating Malayalam due to its complex morphology and syntax. Methodology: Our approach consists of several key steps: Speech Recognition: We employ the SpeechRecognition library to transcribe spoken Malayalam into text. Text Preprocessing: The transcribed text undergoes preprocessing, including tokenization and normalization, using the IndicNLP library. Translation: The preprocessed text is translated into English using a custom-built translation model implemented with CTranslate2 and SentencePiece. Results: We evaluated our system using a dataset of spoken Malayalam sentences. The system achieved high accuracy in speech recognition and produced fluent translations into English. Discussion: Our results demonstrate the feasibility and effectiveness of our approach in accurately transcribing and translating spoken Malayalam. However, certain challenges remain, such as handling dialectal variations and improving translation quality for complex sentences. Conclusion: In conclusion, we have presented a novel approach to Malayalam speech-to-text transcription and text-to-English translation. Our system shows promising results and opens up possibilities for further research and development in this area. References: [1] S. K. Sheshadri, B. S. Bharath, A. H. N. S. C. Sarvani, P. R. V. B. Reddy, and D. Gupta, “Unsupervised neural machine translation for english to kannada using pre-trained language model,” pp. 1–5, 2022. [2] A. H. Patil, S. S. Patil, S. M. Patil, and T. P. Nagarhalli, “Real time machine translation system between indian languages,” pp. 1778–1783, 2022.
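Since the abstract spells out its toolchain (the SpeechRecognition library, IndicNLP-style preprocessing, CTranslate2, and SentencePiece), a hedged sketch of such a pipeline may help the reader; the Google recogniser backend, the "ml-IN" language tag, and the model paths below are assumptions, not the authors' configuration.

```python
# Illustrative Malayalam speech-to-English pipeline in the spirit of the paper
# above. The Google recogniser backend, the "ml-IN" language tag and the model
# paths are assumptions; the paper's own models and preprocessing differ.
import speech_recognition as sr
import sentencepiece as spm
import ctranslate2

def transcribe_malayalam(wav_path: str) -> str:
    """Speech-to-text step using the SpeechRecognition library."""
    recognizer = sr.Recognizer()
    with sr.AudioFile(wav_path) as source:
        audio = recognizer.record(source)
    return recognizer.recognize_google(audio, language="ml-IN")

def translate_to_english(text: str, ct2_model_dir: str, spm_model_path: str) -> str:
    """Translation step using a SentencePiece tokenizer and a CTranslate2 model."""
    tokenizer = spm.SentencePieceProcessor(model_file=spm_model_path)  # hypothetical model files
    translator = ctranslate2.Translator(ct2_model_dir)
    tokens = tokenizer.encode(text, out_type=str)
    result = translator.translate_batch([tokens])
    return tokenizer.decode(result[0].hypotheses[0])
```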
APA, Harvard, Vancouver, ISO, and other styles
More sources

Dissertations / Theses on the topic "Speech Recognition and Transcription Technologies"

1

Silvestre Cerdà, Joan Albert. "Different Contributions to Cost-Effective Transcription and Translation of Video Lectures." Doctoral thesis, Universitat Politècnica de València, 2016. http://hdl.handle.net/10251/62194.

Full text
Abstract:
In recent years, on-line multimedia repositories have experienced a strong growth that has consolidated them as essential knowledge assets, especially in the area of education, where large repositories of video lectures have been built in order to complement or even replace traditional teaching methods. However, most of these video lectures are neither transcribed nor translated due to a lack of cost-effective solutions to do so in a way that gives accurate enough results. Solutions of this kind are clearly necessary in order to make these lectures accessible to speakers of different languages and to people with hearing disabilities. They would also facilitate lecture searchability and analysis functions, such as classification, recommendation or plagiarism detection, as well as the development of advanced educational functionalities like content summarisation to assist student note-taking. For this reason, the main aim of this thesis is to develop a cost-effective solution capable of transcribing and translating video lectures to a reasonable degree of accuracy. More specifically, we address the integration of state-of-the-art techniques in Automatic Speech Recognition and Machine Translation into large video lecture repositories to generate high-quality multilingual video subtitles without human intervention and at a reduced computational cost. Also, we explore the potential benefits of the exploitation of the information that we know a priori about these repositories, that is, lecture-specific knowledge such as speaker, topic or slides, to create specialised, in-domain transcription and translation systems by means of massive adaptation techniques. The proposed solutions have been tested in real-life scenarios by carrying out several objective and subjective evaluations, obtaining very positive results. The main outcome derived from this thesis, The transLectures-UPV Platform, has been publicly released as open-source software, and, at the time of writing, it is serving automatic transcriptions and translations for several thousands of video lectures in many Spanish and European universities and institutions.
Silvestre Cerdà, J. A. (2016). Different Contributions to Cost-Effective Transcription and Translation of Video Lectures [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/62194
APA, Harvard, Vancouver, ISO, and other styles
2

Valor Miró, Juan Daniel. "Evaluation of innovative computer-assisted transcription and translation strategies for video lecture repositories." Doctoral thesis, Universitat Politècnica de València, 2017. http://hdl.handle.net/10251/90496.

Full text
Abstract:
Nowadays, the technology enhanced learning area has experienced a strong growth with many new learning approaches like blended learning, flip teaching, massive open online courses, and open educational resources to complement face-to-face lectures. Specifically, video lectures are fast becoming an everyday educational resource in higher education for all of these new learning approaches, and they are being incorporated into existing university curricula around the world. Transcriptions and translations can improve the utility of these audiovisual assets, but are rarely present due to a lack of cost-effective solutions to produce them. Lecture searchability, accessibility to people with impairments, translatability for foreign students, plagiarism detection, content recommendation, note-taking, and discovery of content-related videos are examples of advantages of the presence of transcriptions. For this reason, the aim of this thesis is to test in real-life case studies ways to obtain multilingual captions for video lectures in a cost-effective way by using state-of-the-art automatic speech recognition and machine translation techniques. Also, we explore interaction protocols to review these automatic transcriptions and translations, because unfortunately automatic subtitles are not error-free. In addition, we take a step further into multilingualism by extending our findings and evaluation to several languages. Finally, the outcomes of this thesis have been applied to thousands of video lectures in European universities and institutions.
Valor Miró, J. D. (2017). Evaluation of innovative computer-assisted transcription and translation strategies for video lecture repositories [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/90496
APA, Harvard, Vancouver, ISO, and other styles
3

Shah, Afnan Arafat. "Improving automatic speech recognition transcription through signal processing." Thesis, University of Southampton, 2017. https://eprints.soton.ac.uk/418970/.

Full text
Abstract:
Automatic speech recognition (ASR) in the educational environment could be a solution to address the problem of gaining access to the spoken words of a lecture for many students who find lectures hard to understand, such as those whose mother tongue is not English or who have a hearing impairment. In such an environment, it is difficult for ASR to provide transcripts with Word Error Rates (WER) less than 25% for the wide range of speakers. Reducing the WER reduces the time and therefore cost of correcting errors in the transcripts. To deal with the variation of acoustic features between speakers, ASR systems implement automatic vocal tract normalisation (VTN) that warps the formants (resonant frequencies) of the speaker to better match the formants of the speakers in the training set. The ASR also implements automatic dynamic time warping (DTW) to deal with variation in the speaker’s rate of speaking, by aligning the time series of the new spoken words with the time series of the matching spoken words of the training set. This research investigates whether the ASR’s automatic estimation of VTN and DTW can be enhanced through pre-processing the recording by manually warping the formants and speaking rate of the recordings using sound processing libraries (Rubber Band and SoundTouch) before transcribing the pre-processed recordings using ASR. An initial experiment, performed with the recordings of two male and two female speakers, showed that pre-processing the recording could improve the WER by an average of 39.5% for male speakers and 36.2% for female speakers. However the selection of the best warp factors was achieved through an iterative ‘trial and error’ approach that involved many hours calculating the word error rate for each warp factor setting. Finding a more efficient approach for selecting the warp factors for pre-processing was then investigated. The second experiment investigated the development of a modification function using, as its training set, the best warp factors from the ‘trial and error’ approach to estimate the modification percentage required to improve the WER of a recording. A modification function was found that on average improved the WER by 16% for female speakers and 7% for male speakers.
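To make the pre-processing idea concrete, the sketch below warps a recording's speaking rate and spectral scale before it is sent to an ASR engine. It uses pyrubberband, a Python binding for the Rubber Band library the abstract names; the warp factors are placeholders, and pitch_shift is only a coarse stand-in for the formant warping the thesis actually performs.

```python
# Rough sketch of pre-warping a recording before ASR transcription, in the
# spirit of the thesis above. pyrubberband wraps the Rubber Band library it
# names; the factors are placeholders and pitch_shift only approximates the
# formant warping described in the research.
import soundfile as sf
import pyrubberband as pyrb

def prewarp_recording(in_wav: str, out_wav: str,
                      rate_factor: float = 1.1, semitone_shift: float = 0.5) -> None:
    """Write a time-stretched, pitch-shifted copy of a lecture recording."""
    signal, sample_rate = sf.read(in_wav)
    warped = pyrb.time_stretch(signal, sample_rate, rate_factor)    # speaking-rate warp
    warped = pyrb.pitch_shift(warped, sample_rate, semitone_shift)  # crude spectral warp
    sf.write(out_wav, warped, sample_rate)

# The warped file would then be transcribed by the ASR system and its WER
# compared against the unmodified recording to select the best warp factors.
```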
APA, Harvard, Vancouver, ISO, and other styles
4

Sundaram, Ramasubramanian H. "Effects of transcription errors on supervised learning in speech recognition." Master's thesis, Mississippi State : Mississippi State University, 2003. http://library.msstate.edu/etd/show.asp?etd=etd-06132003-120252.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Rangarajan, Vibhav Shyam. "Interfacing speech recognition and vision guided microphone array technologies." Thesis, Massachusetts Institute of Technology, 2003. http://hdl.handle.net/1721.1/29687.

Full text
Abstract:
Thesis (M.Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2003. Includes bibliographical references (p. 57-58). One goal of a pervasive computing environment is to allow the user to interact with the environment in an easy and natural manner. The use of spoken commands, as inputs to a speech recognition system, is one such way to naturally interact with the environment. In challenging acoustic environments, microphone arrays can improve the quality of the input audio signal by beamforming, or steering, to the location of the speaker of interest. The existence of multiple speakers, large interfering signals and/or reverberations or reflections in the audio signal(s) requires the use of advanced beamforming techniques which attempt to separate the target audio from the mixed signal received at the microphone array. In this thesis I present and evaluate a method of modeling reverberations as separate anechoic interfering sources emanating from fixed locations. This acoustic modelling technique allows for tracking of acoustic changes in the environment, such as those caused by speaker motion.
APA, Harvard, Vancouver, ISO, and other styles
6

Girerd, Daniel. "Strategic Selection of Training Data for Domain-Specific Speech Recognition." DigitalCommons@CalPoly, 2018. https://digitalcommons.calpoly.edu/theses/1847.

Full text
Abstract:
Speech recognition is now a key topic in computer science with the proliferation of voice-activated assistants and voice-enabled devices. Many companies offer a speech recognition service for developers to use to enable smart devices and services. These speech-to-text systems, however, have significant room for improvement, especially for domain-specific speech. IBM's Watson speech-to-text service attempts to support domain-specific uses by allowing users to upload their own training data for building custom models that augment Watson's general model. This requires deciding on a strategy for selecting that training data. This thesis experiments with different training choices for custom language models that augment Watson's speech-to-text service. The results show that using recent utterances is the best choice of training data in our use case of Digital Democracy. We are able to improve speech recognition accuracy by 2.3% over the control with no custom model. However, choosing training utterances most specific to the use case is better when large enough volumes of such training data are available.
APA, Harvard, Vancouver, ISO, and other styles
7

Tran, Thao, and Nathalie Tkauc. "Face recognition and speech recognition for access control." Thesis, Högskolan i Halmstad, Akademin för informationsteknologi, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:hh:diva-39776.

Full text
Abstract:
This project is a collaboration with the company JayWay in Halmstad. In order to enter the office today, a tag-key is needed for the employees and a doorbell for the guests. If someone rings the doorbell, someone on the inside has to open the door manually, which is considered a disturbance during work time. The purpose of the project is to minimize the disturbances in the office. The goal of the project is to develop a system that uses face recognition and speech-to-text to control the lock system for the entrance door. The components used for the project are two Raspberry Pi's, a 7 inch LCD-touch display, a Raspberry Pi Camera Module V2, an external sound card, a microphone and a speaker. The whole project was written in Python, and the platform used was Amazon Web Services (AWS) for storage and the face recognition, while speech-to-text was provided by Google. The system is divided into three functions for employees, guests and deliveries. The employee function has two authentication steps, the face recognition and a randomly generated code that needs to be confirmed to avoid biometric spoofing. The guest function includes the speech-to-text service to state an employee's name that the guest wants to meet, and the employee is then notified. The delivery function informs the specific persons in the office who are responsible for the deliveries by sending a notification. The tests prove that the system will always match the right person when using the face recognition. They also show what the threshold for the face recognition can be set to, to make sure that only authorized people enter the office. Using the two-step authentication, the face recognition and the code, makes the system secure and protects it against spoofing. One downside is that it is an extra step that takes time. The speech-to-text is set to Swedish and works quite well for Swedish-speaking persons. However, for a multicultural company it can be hard to use the speech-to-text service. It can also be hard for the service to listen and translate if there is a lot of background noise or if several people speak at the same time.
APA, Harvard, Vancouver, ISO, and other styles
8

Du Toit, A. (Andre). "Automatic classification of spoken South African English variants using a transcription-less speech recognition approach." Thesis, Stellenbosch : Stellenbosch University, 2004. http://hdl.handle.net/10019.1/49866.

Full text
Abstract:
Thesis (MEng)--University of Stellenbosch, 2004. We present the development of a pattern recognition system which is capable of classifying different Spoken Variants (SVs) of South African English (SAE) using a transcription-less speech recognition approach. Spoken Variants (SVs) allow us to unify the linguistic concepts of accent and dialect from a pattern recognition viewpoint. The need for the SAE SV classification system arose from the multi-linguality requirement for South African speech recognition applications and the costs involved in developing such applications.
APA, Harvard, Vancouver, ISO, and other styles
9

Harris, Leroy W. "Feasibility study of speech recognition technologies for operating within a medical First Responder's environment." Thesis, Monterey, Calif. : Springfield, Va. : Naval Postgraduate School ; Available from National Technical Information Service, 2000. http://handle.dtic.mil/100.2/ADA386380.

Full text
Abstract:
Thesis (M.S. in Information Systems Technology)--Naval Postgraduate School, Dec. 2000. Thesis advisor(s): Monique P. Fargues, Ray T. Clifford, Douglas E. Brinkley. "December 2000." Includes bibliographical references (p. 55-56). Also available in print.
APA, Harvard, Vancouver, ISO, and other styles
10

De Villiers, Pieter Theunis. "Lecture transcription systems in resource-scarce environments / Pieter Theunis de Villiers." Thesis, North-West University, 2014. http://hdl.handle.net/10394/10620.

Full text
Abstract:
Classroom note taking is a fundamental task performed by learners on a daily basis. These notes provide learners with valuable offline study material, especially in the case of more difficult subjects. The use of class notes has been found to not only provide students with a better learning experience, but also leads to an overall higher academic performance. In a previous study, an increase of 10.5% in student grades was observed after these students had been provided with multimedia class notes. This is not surprising, as other studies have found that the rate of successful transfer of information to humans increases when provided with both visual and audio information. Note taking might seem like an easy task; however, students with hearing impairments, visual impairments, physical impairments, learning disabilities or even non-native listeners find this task very difficult to impossible. It has also been reported that even non-disabled students find note taking time consuming and that it requires a great deal of mental effort while also trying to pay full attention to the lecturer. This is illustrated by a study where it was found that college students were only able to record ~40% of the data presented by the lecturer. It is thus reasonable to expect an automatic way of generating class notes to be beneficial to all learners. Lecture transcription (LT) systems are used in educational environments to assist learners by providing them with real-time in-class transcriptions or recordings and transcriptions for offline use. Such systems have already been successfully implemented in the developed world where all required resources were easily obtained. These systems are typically trained on hundreds to thousands of hours of speech while their language models are trained on millions or even hundreds of millions of words. These amounts of data are generally not available in the developing world. In this dissertation, a number of approaches toward the development of LT systems in resource-scarce environments are investigated. We focus on different approaches to obtaining sufficient amounts of well transcribed data for building acoustic models, using corpora with few transcriptions and of variable quality. One approach investigates the use of alignment using a dynamic programming phone string alignment procedure to harvest as much usable data as possible from approximately transcribed speech data. We find that target-language acoustic models are optimal for this purpose, but encouraging results are also found when using models from another language for alignment. Another approach entails using unsupervised training methods where an initial low accuracy recognizer is used to transcribe a set of untranscribed data. Using this poorly transcribed data, correctly recognized portions are extracted based on a word confidence threshold. The initial system is retrained along with the newly recognized data in order to increase its overall accuracy. The initial acoustic models are trained using as little as 11 minutes of transcribed speech. After several iterations of unsupervised training, a noticeable increase in accuracy was observed (47.79% WER to 33.44% WER). Similar results were however found (35.97% WER) after using a large speaker-independent corpus to train the initial system. Usable LMs were also created using as few as 17955 words from transcribed lectures; however, this resulted in large out-of-vocabulary rates. This problem was solved by means of LM interpolation. 
LM interpolation was found to be very beneficial in cases where subject-specific data (such as lecture slides and books) was available. We also introduce our NWU LT system, which was developed for use in learning environments and was designed using a client/server based architecture. Based on the results found in this study we are confident that usable models for use in LT systems can be developed in resource-scarce environments. MSc (Computer Science), North-West University, Vaal Triangle Campus, 2014.
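The confidence-threshold loop described in this abstract can be summarised in a few lines. The sketch below is schematic pseudocode with a hypothetical recogniser interface (seed_data, decode, and retrain are invented names); real toolkits expose word confidences through lattice post-processing rather than this simple API.

```python
# Schematic sketch of the unsupervised training loop described above. The
# recogniser interface (seed_data, decode, retrain) is hypothetical; it only
# illustrates the confidence-threshold harvesting idea, not an actual toolkit API.
def unsupervised_training(recognizer, untranscribed_utterances,
                          confidence_threshold=0.9, iterations=5):
    training_set = list(recognizer.seed_data)             # e.g. the initial 11 minutes
    for _ in range(iterations):
        for utterance in untranscribed_utterances:
            hypothesis = recognizer.decode(utterance)      # words with confidence scores
            confident = [w for w in hypothesis if w.confidence >= confidence_threshold]
            if confident:                                  # harvest only well-recognised words
                training_set.append((utterance, confident))
        recognizer = recognizer.retrain(training_set)      # re-estimate acoustic models
    return recognizer
```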
APA, Harvard, Vancouver, ISO, and other styles
More sources

Books on the topic "Speech Recognition and Transcription Technologies"

1

Mihelič, France, and Janez Žibert. Speech recognition: Technologies and applications. I-Tech Education and Publishing, 2008.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
2

Kutza, Patricia. Voice recognition: Technologies, markets, opportunities. Business Communications Co., 2002.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
3

Baltic Conference on Human Language Technologies (5th 2012 Tartu, Estonia). Human language technologies: The Baltic perspective : proceedings of the Fifth International Conference Baltic HLT 2012. IOS Press, 2012.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
4

Baltic Conference on Human Language Technologies (2nd 2005 Tallinn, Estonia). The second Baltic Conference on Human Language Technologies: Proceedings, April 4-5, 2005, Tallinn, Estonia. Edited by Langemets Margit and Penjam Priit. Institute of Cybernetics, Tallinn University of Technology, 2005.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
5

Baltic Conference on Human Language Technologies (4th 2010 Rīga, Latvia). Human language technologies: The Baltic perspective : proceedings of the fourth International Conference, Baltic HLT 2010. IOS Press, 2010.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
6

Association canadienne-francaise pour l'avancement des sciences. Congrès. Les techniques d'intelligence artificielle appliquées aux technologies de l'information: Réflexions sur les approches neuroniques, symboliques et numériques appliquées à la vision, l'écrit, la parole et le biomédical : actes du Colloque multidisciplinaire L'intelligence artificielle dans les technologies de l'information tenu dans le cadre du Congrès de l'Acfas à Montréal en mai 1996. ACFAS, 1997.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
7

Bowers, Rachel, Jonathan G. Fiscus, and SpringerLink (Online service), eds. Multimodal Technologies for Perception of Humans: International Evaluation Workshops CLEAR 2007 and RT 2007, Baltimore, MD, USA, May 8-11, 2007, Revised Selected Papers. Springer-Verlag Berlin Heidelberg, 2008.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
8

Dybkjær, Laila, Wolfgang Minker, Heiko Neumann, Roberto Pieraccini, Michael Weber, and SpringerLink (Online service), eds. Perception in Multimodal Dialogue Systems: 4th IEEE Tutorial and Research Workshop on Perception and Interactive Technologies for Speech-Based Systems, PIT 2008, Kloster Irsee, Germany, June 16-18, 2008. Proceedings. Springer-Verlag Berlin Heidelberg, 2008.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
9

Lamel, Lori, and Jean-Luc Gauvain. Speech Recognition. Edited by Ruslan Mitkov. Oxford University Press, 2012. http://dx.doi.org/10.1093/oxfordhb/9780199276349.013.0016.

Full text
Abstract:
Speech recognition is concerned with converting the speech waveform, an acoustic signal, into a sequence of words. Today's approaches are based on a statistical modellization of the speech signal. This article provides an overview of the main topics addressed in speech recognition, which are acoustic-phonetic modelling, lexical representation, language modelling, decoding, and model adaptation. Language models are used in speech recognition to estimate the probability of word sequences. The main components of a generic speech recognition system are the main knowledge sources (feature analysis, and acoustic and language models, which are estimated in a training phase) and the decoder. The focus of this article is on methods used in state-of-the-art speaker-independent, large-vocabulary continuous speech recognition (LVCSR). Primary application areas for such technology are dictation, spoken language dialogue, and transcription for information archival and retrieval systems. Finally, this article discusses issues and directions of future research.
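The decoding step the chapter outlines is conventionally written as the textbook noisy-channel maximisation below; this is the standard formulation (requiring amsmath in LaTeX), not an equation quoted from the chapter itself.

```latex
% Standard LVCSR decoding criterion (textbook form, not quoted from the chapter):
% choose the word sequence W that maximises the posterior given the acoustic
% feature sequence X, i.e. the product of the acoustic likelihood and the
% language-model prior.
\hat{W} \;=\; \operatorname*{arg\,max}_{W} P(W \mid X)
        \;=\; \operatorname*{arg\,max}_{W} P(X \mid W)\, P(W)
```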
APA, Harvard, Vancouver, ISO, and other styles
10

Keyes, Bettye A. Voice Writing Method - Dragon Professional Individual 16: Mastering Realtime Transcription with Speech Recognition. Voice Writing Method, 2023.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
More sources

Book chapters on the topic "Speech Recognition and Transcription Technologies"

1

Wendemuth, Andreas, Bogdan Vlasenko, Ingo Siegert, Ronald Böck, Friedhelm Schwenker, and Günther Palm. "Emotion Recognition from Speech." In Cognitive Technologies. Springer International Publishing, 2017. http://dx.doi.org/10.1007/978-3-319-43665-4_20.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Toselli, Alejandro Héctor, Enrique Vidal, and Francisco Casacuberta. "Computer Assisted Transcription of Speech Signals." In Multimodal Interactive Pattern Recognition and Applications. Springer London, 2011. http://dx.doi.org/10.1007/978-0-85729-479-1_4.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Santos, Lyndainês, Nícolas de Araújo Moreira, Robson Sampaio, Raizielle Lima, and Francisco Carlos Mattos Brito Oliveira. "Speech Recognition Using HMM-CNN." In Information Systems and Technologies. Springer Nature Switzerland, 2024. http://dx.doi.org/10.1007/978-3-031-45642-8_51.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Batliner, Anton, Björn Schuller, Dino Seppi, et al. "The Automatic Recognition of Emotions in Speech." In Cognitive Technologies. Springer Berlin Heidelberg, 2010. http://dx.doi.org/10.1007/978-3-642-15184-2_6.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Elmahdy, Mohamed, Rainer Gruhn, and Wolfgang Minker. "Phonetic Transcription Using the Arabic Chat Alphabet." In Novel Techniques for Dialectal Arabic Speech Recognition. Springer US, 2012. http://dx.doi.org/10.1007/978-1-4614-1906-8_6.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

Gravier, Guillaume, Francois Yvon, Bruno Jacob, and Frédéric Bimbot. "Introducing Contextual Transcription Rules in Large Vocabulary Speech Recognition." In Text, Speech and Language Technology. Springer Netherlands, 2005. http://dx.doi.org/10.1007/1-4020-2637-4_6.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Lojka, Martin, Peter Viszlay, Ján Staš, Daniel Hládek, and Jozef Juhár. "Slovak Broadcast News Speech Recognition and Transcription System." In Advances in Network-Based Information Systems. Springer International Publishing, 2018. http://dx.doi.org/10.1007/978-3-319-98530-5_32.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Matassoni, Marco, Fabio Brugnara, and Roberto Gretter. "Evalita 2011: Automatic Speech Recognition Large Vocabulary Transcription." In Lecture Notes in Computer Science. Springer Berlin Heidelberg, 2013. http://dx.doi.org/10.1007/978-3-642-35828-9_30.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Biagetti, Giorgio, Paolo Crippa, Alessandro Curzi, Laura Falaschetti, Simone Orcioni, and Claudio Turchetti. "Distributed Speech Recognition for Lighting System Control." In Intelligent Decision Technologies. Springer International Publishing, 2015. http://dx.doi.org/10.1007/978-3-319-19857-6_10.

Full text
APA, Harvard, Vancouver, ISO, and other styles
10

Mo, Zhanhong, Jiangyan Qi, and Cunmi Song. "Intelligent Community Embedded Speech Recognition System Research." In Intelligent Decision Technologies. Springer Berlin Heidelberg, 2012. http://dx.doi.org/10.1007/978-3-642-29920-9_39.

Full text
APA, Harvard, Vancouver, ISO, and other styles

Conference papers on the topic "Speech Recognition and Transcription Technologies"

1

Borase, Tanushree, and Thamizhamuthu R. "Speech Recognition and Transcription." In 2025 International Conference on Artificial Intelligence and Data Engineering (AIDE). IEEE, 2025. https://doi.org/10.1109/aide64228.2025.10986937.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Von Neumann, Thilo, Christoph Boeddeker, Tobias Cord-Landwehr, Marc Delcroix, and Reinhold Haeb-Umbach. "Meeting Recognition with Continuous Speech Separation and Transcription-Supported Diarization." In 2024 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW). IEEE, 2024. http://dx.doi.org/10.1109/icasspw62465.2024.10625894.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Popun, Jettasic, Wilaiporn Lee, and Akara Prayote. "Automatic Speech Recognition Techniques for Transcription of Thai Traditional Medicine Texts." In 2024 21st International SoC Design Conference (ISOCC). IEEE, 2024. http://dx.doi.org/10.1109/isocc62682.2024.10762147.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Saha, Diptajit, Aniket Das, Raj Saha, Mrinmoy Guria, Debrupa Pal, and Debopriya Dey. "Real-Time Voice: A Comprehensive Survey of Automatic Speech Recognition and Transcription." In 2025 International Conference on Computer, Electrical & Communication Engineering (ICCECE). IEEE, 2025. https://doi.org/10.1109/iccece61355.2025.10940301.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Almahmood, Abdulla, Hesham Al-Ammal, and Fatema Albalooshi. "Enhancing Speech-to-Text Transcription Accuracy for the Bahraini Dialect." In 2024 International Conference on Innovation and Intelligence for Informatics, Computing, and Technologies (3ICT). IEEE, 2024. https://doi.org/10.1109/3ict64318.2024.10824280.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

D, Sasikala, and Shaik Huzaifa Fazil. "Enhancing Communication: Utilizing Transfer Learning for Improved Speech-to-Text Transcription." In 2024 15th International Conference on Computing Communication and Networking Technologies (ICCCNT). IEEE, 2024. http://dx.doi.org/10.1109/icccnt61001.2024.10725694.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Deshmukh, Jawad, Sharique Shah, Asif Shaikh, Ashfan Bargir, and Salim Shaikh. "G2OCR: Integrating Speech Recognition and Optical Character Recognition (OCR) for Automated Transcription of Gujarati Audio-Visual Content." In 2024 4th International Conference on Ubiquitous Computing and Intelligent Information Systems (ICUIS). IEEE, 2024. https://doi.org/10.1109/icuis64676.2024.10866606.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Roushan, Rakesh, Harshit Mishra, Lucky Yadav, Sreeja Koppula, Nitya Tiwari, and K. S. Nataraj. "Optimizing Speech Recognition for Medical Transcription: Fine-Tuning Whisper and Developing a Web Application." In 2024 IEEE Conference on Engineering Informatics (ICEI). IEEE, 2024. https://doi.org/10.1109/icei64305.2024.10912421.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Migel, Serhii, Maksym Zaliskyi, Roman Odarchenko, Zarina Poberezhna, Alina Osipchuk, and Oleksandr Lavrynenko. "Speech Recognition System for Ukrainian Language." In 2024 14th International Conference on Advanced Computer Information Technologies (ACIT). IEEE, 2024. http://dx.doi.org/10.1109/acit62333.2024.10712557.

Full text
APA, Harvard, Vancouver, ISO, and other styles
10

Mujtaba, Dena, Nihar Mahapatra, Megan Arney, et al. "Lost in Transcription: Identifying and Quantifying the Accuracy Biases of Automatic Speech Recognition Systems Against Disfluent Speech." In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers). Association for Computational Linguistics, 2024. http://dx.doi.org/10.18653/v1/2024.naacl-long.269.

Full text
APA, Harvard, Vancouver, ISO, and other styles

Reports on the topic "Speech Recognition and Transcription Technologies"

1

Holzrichter, J. F. New Ideas for Speech Recognition and Related Technologies. Office of Scientific and Technical Information (OSTI), 2002. http://dx.doi.org/10.2172/15004194.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Marx, Matt, Joshua Gans, and David Hsu. Dynamic Commercialization Strategies for Disruptive Technologies: Evidence from the Speech Recognition Industry. National Bureau of Economic Research, 2013. http://dx.doi.org/10.3386/w19764.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Fatehifar, Mohsen, Josef Schlittenlacher, David Wong, and Kevin Munro. Applications Of Automatic Speech Recognition And Text-To-Speech Models To Detect Hearing Loss: A Scoping Review Protocol. INPLASY - International Platform of Registered Systematic Review and Meta-analysis Protocols, 2023. http://dx.doi.org/10.37766/inplasy2023.1.0029.

Full text
Abstract:
Review question / Objective: This scoping review aims to identify published methods that have used automatic speech recognition or text-to-speech technologies to detect hearing loss and report on their accuracy and limitations. Condition being studied: Hearing enables us to communicate with the surrounding world. According to reports by the World Health Organization, 1.5 billion people suffer from some degree of hearing loss, of which 430 million require medical attention. It is estimated that by 2050, 1 in every 4 people will experience some sort of hearing disability. Hearing loss can significantly impact people's ability to communicate and make social interactions a challenge. In addition, it can result in anxiety, isolation, depression, hindrance of learning, and a decrease in general quality of life. A hearing assessment is usually done in hospitals and clinics with special equipment and trained staff. However, these services are not always available in less developed countries. Even in developed countries, like the UK, access to these facilities can be a challenge in rural areas. Moreover, during a crisis like the Covid-19 pandemic, accessing the required healthcare can become dangerous and challenging even in large cities.
APA, Harvard, Vancouver, ISO, and other styles