
Journal articles on the topic 'Character Error Rate (CER)'

1

Tilkar, Swati. "Generating Meeting Transcription Using Natural Language Processing." INTERNATIONAL JOURNAL OF SCIENTIFIC RESEARCH IN ENGINEERING AND MANAGEMENT 09, no. 06 (2025): 1–9. https://doi.org/10.55041/ijsrem51091.

Abstract:
Natural Language Processing plays a pivotal role in automating the transcription of meetings. It enables machines to understand, interpret, and generate human language. In meeting transcription, NLP components such as Automatic Speech Recognition (ASR), speaker diarization, entity recognition, summarization, and sentiment analysis work together to produce accurate and readable transcripts. ASR converts spoken words into text, while NLP refines the raw output by correcting grammatical errors, identifying speakers, and structuring dialogue for readability and comprehension. Ethical consideration
2

Chen, Zeyuan, Cheng Zhong, and Danyang Chen. "A syllable-character collaborative model for enhanced Pinyin and Chinese recognition." PLOS One 20, no. 7 (2025): e0325045. https://doi.org/10.1371/journal.pone.0325045.

Abstract:
In Chinese speech recognition, end-to-end speech recognition models usually use Chinese characters as direct output and perform poorly compared with other language models. The main reason for this phenomenon is that the relationship between Chinese text and pronunciation is more complex. Inspired by the learning process of Chinese beginners, who first master initials, finals, and pinyin before learning characters, we propose the Syllable-Character Collaborative Model (SCCM), which incorporates these phonetic elements into the training process. Additionally, we design a Pinyin-Ensemble module t
3

Karima, Nida Aulia, Ade Nurul Aisyah, Hercio Venceslau Silla, Lekso Budi Handoko, and Ramadhan Rakhmat Sani. "Kriptografi Teks Berbasis Algoritma Substitusi Vigenere Cipher 8 Bit." Jurnal Masyarakat Informatika 15, no. 1 (2024): 1–13. http://dx.doi.org/10.14710/jmasif.15.1.60836.

Abstract:
The Vigenere Cipher is one of the classical algorithms in cryptography. This study focuses on the use of the Vigenere Cipher method and its implementation in securing an ASCII text message. The study uses four testing methods: Avalanche Effect, Character Error Rate (CER), Bit Error Rate (BER), and Entropy. The test results show that the resulting Avalanche Effect values are on average 50% or higher, which indicates a good Avalanche Effect. In addition, the resulting CER and BER are 0, which means no
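The abstract above reports CER and BER values of 0 for a Vigenere cipher applied to ASCII text. The listing does not give the paper's exact scheme, so the following is only a minimal sketch under stated assumptions: the cipher adds a repeating key byte modulo 256, and CER and BER are taken as position-wise mismatch rates between the original and the decrypted message. All function names are hypothetical.

```python
def vigenere_encrypt(plaintext: bytes, key: bytes) -> bytes:
    """Add the repeating key to each byte, modulo 256 (assumed 8-bit variant)."""
    if not key:
        raise ValueError("key must not be empty")
    return bytes((p + key[i % len(key)]) % 256 for i, p in enumerate(plaintext))


def vigenere_decrypt(ciphertext: bytes, key: bytes) -> bytes:
    """Subtract the repeating key from each byte, modulo 256."""
    if not key:
        raise ValueError("key must not be empty")
    return bytes((c - key[i % len(key)]) % 256 for i, c in enumerate(ciphertext))


def char_error_rate(original: bytes, recovered: bytes) -> float:
    """Fraction of byte positions that differ (assumed definition, equal lengths)."""
    if not original or len(original) != len(recovered):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(a != b for a, b in zip(original, recovered)) / len(original)


def bit_error_rate(original: bytes, recovered: bytes) -> float:
    """Fraction of bits that differ (assumed definition, equal lengths)."""
    if not original or len(original) != len(recovered):
        raise ValueError("inputs must be non-empty and of equal length")
    flipped = sum(bin(a ^ b).count("1") for a, b in zip(original, recovered))
    return flipped / (8 * len(original))


message = b"Character Error Rate"
key = b"KEY"
recovered = vigenere_decrypt(vigenere_encrypt(message, key), key)
print(char_error_rate(message, recovered), bit_error_rate(message, recovered))
```

Under these definitions, a lossless round trip gives a CER and BER of 0, which is consistent with the result the abstract describes.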
4

Maurya, Maruti, Mohd Zaheer, Nawab Mohammad, Sadaf Siddiqui, Mohd Zeeshan Khan, and Mohd Ayan Akram. "Speech Recognition Technologies: Design, Challenges, and Real-World Applications." International Journal of Innovative Research in Computer Science and Technology 13, no. 3 (2025): 55–61. https://doi.org/10.55524/ijircst.2025.13.3.9.

Abstract:
This paper presents an automated speech recognition (ASR) system that transcribes audio from YouTube videos into accurate text using OpenAI's Whisper model. Leveraging tools such as yt_dlp, FFmpeg, and PyTorch, the system creates a robust speech-to-text pipeline. On receiving a video URL, the system extracts and preprocesses audio, transcribes it using Whisper, and evaluates transcription quality through metrics like Word Error Rate (WER), Character Error Rate (CER), and Match Error Rate (MER). The pipeline supports offline use, making it suitable for accessible, cost-effective deployment in e
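The abstract above evaluates transcripts with Word Error Rate (WER) and Character Error Rate (CER). As a minimal sketch, and not the paper's own code, both can be computed as the Levenshtein edit distance between a reference and a hypothesis divided by the reference length, over words for WER and over characters for CER. The function names are hypothetical, and no text normalisation (casing, punctuation) is assumed.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists of tokens)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character edits divided by reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word edits divided by number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


print(cer("kitten", "sitting"))
print(wer("the cat sat", "the cat sit"))
```

Because insertions count as errors, both rates can exceed 1 when the hypothesis is much longer than the reference. Match Error Rate (MER) additionally needs the count of inserted tokens in the alignment and is not shown here.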
5

Cheema, Musa Dildar Ahmed, Mohammad Daniyal Shaiq, Farhaan Mirza, Ali Kamal, and M. Asif Naeem. "Adapting multilingual vision language transformers for low-resource Urdu optical character recognition (OCR)." PeerJ Computer Science 10 (April 29, 2024): e1964. http://dx.doi.org/10.7717/peerj-cs.1964.

Abstract:
In the realm of digitizing written content, the challenges posed by low-resource languages are noteworthy. These languages, often lacking in comprehensive linguistic resources, require specialized attention to develop robust systems for accurate optical character recognition (OCR). This article addresses the significance of focusing on such languages and introduces ViLanOCR, an innovative bilingual OCR system tailored for Urdu and English. Unlike existing systems, which struggle with the intricacies of low-resource languages, ViLanOCR leverages advanced multilingual transformer-based language
6

Abdallah, Abdelrahman, Mohamed Hamada, and Daniyar Nurseitov. "Attention-Based Fully Gated CNN-BGRU for Russian Handwritten Text." Journal of Imaging 6, no. 12 (2020): 141. http://dx.doi.org/10.3390/jimaging6120141.

Abstract:
This article considers the task of handwritten text recognition using attention-based encoder–decoder networks trained in the Kazakh and Russian languages. We have developed a novel deep neural network model based on a fully gated CNN, supported by multiple bidirectional gated recurrent unit (BGRU) and attention mechanisms to manipulate sophisticated features that achieve 0.045 Character Error Rate (CER), 0.192 Word Error Rate (WER), and 0.253 Sequence Error Rate (SER) for the first test dataset and 0.064 CER, 0.24 WER and 0.361 SER for the second test dataset. Our proposed model is the first
7

Wicaksono, Agung, and Eka Setia Nugraha. "Desain Modem Sistem Komunikasi Digital HF Berbasis Software Defined Radio." Edu Komputika Journal 8, no. 1 (2021): 21–30. http://dx.doi.org/10.15294/edukomputika.v8i1.47297.

Abstract:
High Frequency (HF) communication systems operate using radio waves at 3-30 MHz that propagate as skywaves with the help of the ionosphere. HF communication systems are currently still limited to voice transmission; it is hoped that they can also send text messages by applying digital communication. This study reports the design of an HF digital communication system modem using a Software Defined Radio (SDR) device for easy implementation. Modulation and demodulation play an important role in digital communication systems. System evaluation was carried out by experiment
8

Drobac, Senka, and Krister Lindén. "Optical character recognition with neural networks and post-correction with finite state methods." International Journal on Document Analysis and Recognition (IJDAR) 23, no. 4 (2020): 279–95. http://dx.doi.org/10.1007/s10032-020-00359-9.

Abstract:
The optical character recognition (OCR) quality of the historical part of the Finnish newspaper and journal corpus is rather low for reliable search and scientific research on the OCRed data. The estimated character error rate (CER) of the corpus, achieved with commercial software, is between 8 and 13%. There have been earlier attempts to train high-quality OCR models with open-source software, like Ocropy (https://github.com/tmbdev/ocropy) and Tesseract (https://github.com/tesseract-ocr/tesseract), but so far, none of the methods have managed to successfully train a mixed model that
9

Darpito, Muhammad Noko, Kartika Firdausy, and Abdul Fadlil. "Perbandingan Unjuk Kerja Library Optical Character Recognition (OCR) dalam Pengenalan Teks pada Dokumen Digital." Jurnal Informatika Polinema 11, no. 3 (2025): 273–82. https://doi.org/10.33795/jip.v11i3.7025.

Abstract:
Optical Character Recognition (OCR) is a technology used to convert text in digital documents into machine-readable text. Choosing the right OCR method depends heavily on processing efficiency and text recognition accuracy, especially in applications that require high speed and a minimal error rate. This study compares the performance of Tesseract and EasyOCR using a research method that covers data collection, text extraction, OCR implementation with both libraries, and evaluation of the extraction results
10

Tadesse, Direselign Addis, Chuan-Ming Liu, and Van-Dai Ta. "Gated Convolution and Stacked Self-Attention Encoder–Decoder-Based Model for Offline Handwritten Ethiopic Text Recognition." Information 14, no. 12 (2023): 654. http://dx.doi.org/10.3390/info14120654.

Abstract:
Offline handwritten text recognition (HTR) is a long-standing research project for a wide range of applications, including assisting visually impaired users, humans and robot interactions, and the automatic entry of business documents. However, due to variations in writing styles, visual similarities between different characters, overlap between characters, and source document noise, designing an accurate and flexible HTR system is challenging. The problem becomes serious when the algorithm has a low learning capacity and when the text used is complex and has a lot of characters in the writing
11

Bimurat, Mukhtar, Ardak Shalkarbaiuly, Akhmediyar Kazhymukhanuly, and Aierke Myrzabayeva. "SVTR model for Kazakh Handwritten Text Recognition." Suleyman Demirel University Bulletin Natural and Technical Sciences 65, no. 2 (2024): 5–14. https://doi.org/10.47344/sdubnts.v65i2.1183.

Abstract:
Handwritten Text Recognition (HTR) plays a crucial role in transforming historical and contemporary handwritten documents into digital formats, facilitating easier access, searchability, and analysis. The SVTR model, known for its state-of-the-art performance in scene text recognition (STR), stands out for its minimal resource use, and quick inference time. In this study, we apply the SVTR model to the Kazakh Offline Handwritten Text Dataset (KOHTD) to assess its capability in handwritten text recognition. Achieving a Character Error Rate (CER) of 4.59% and a Word Error Rate (WER) of 20%, our
12

Vinokurov, Igor Victorovich. "Recognition of cadastral coordinates using convolutional recurrent neural networks." Program Systems: Theory and Applications 15, no. 1 (2024): 3–30. http://dx.doi.org/10.25209/2079-3316-2024-15-1-3-30.

Abstract:
The article investigates the use of convolutional recurrent neural networks (CRNN) to recognize images of cadastral coordinates of objects in scanned documents of the public-law company Roskadastr. The combined CRNN architecture, which joins convolutional neural networks (CNN) and recurrent neural networks (RNN), makes it possible to exploit the advantages of each for processing images and recognizing the continuous digit sequences they contain. In the experimental studies, images consisting of a given number of digits were generated, and a CRNN was built and studied
13

Rista, Amarildo, and Arbana Kadriu. "A Model for Albanian Speech Recognition Using End-to-End Deep Learning Techniques." Interdisciplinary Journal of Research and Development 9, no. 3 (2022): 1. http://dx.doi.org/10.56345/ijrdv9n301.

Abstract:
End-to-end Automatic Speech Recognition (ASR) system folds the acoustic model (AM), language model (LM), and pronunciation model (PM) into a single neural network. The joint optimization of all these components optimizes performance of the model. In this paper, we introduce a model for Albanian speech recognition (SR) using end-to-end deep learning techniques. The two main modules that build this model are: Residual Convolutional Neural Networks (ResCNN), which aims to learn the relevant features and Bidirectional Recurrent Neural Networks (BiRNN) aiming to leverage the learned ResCNN audio fe
14

Jeong, Jiho, S. I. M. M. Raton Mondol, Yeon Wook Kim, and Sangmin Lee. "An Effective Learning Method for Automatic Speech Recognition in Korean CI Patients’ Speech." Electronics 10, no. 7 (2021): 807. http://dx.doi.org/10.3390/electronics10070807.

Abstract:
The automatic speech recognition (ASR) model usually requires a large amount of training data to provide better results compared with the ASR models trained with a small amount of training data. It is difficult to apply the ASR model to non-standard speech such as that of cochlear implant (CI) patients, owing to privacy concerns or difficulty of access. In this paper, an effective finetuning and augmentation ASR model is proposed. Experiments compare the character error rate (CER) after training the ASR model with the basic and the proposed method. The proposed method achieved a CER of 36.03%
15

Alsayadi, Hamzah A., and Mohammed Hadwan. "Automatic Speech Recognition for Qur’an Verses using Traditional Technique." Journal of Artificial Intelligence and Metaheuristics 1, no. 2 (2022): 17–23. http://dx.doi.org/10.54216/jaim.010202.

Abstract:
Deep learning is one of the approaches of machine learning that uses algorithms for building a model based on complex unstructured data. The Muslim Holy Qur’an is written using Arabic diacritized text. In this paper, a traditional method to build a robust Qur’an verse recognition system is proposed. The MFCC is used to extract features. These features are adapted using minimum phone error (MPE) as a discriminative model. The acoustic model was built using the deep neural network (DNN) model. We present an n-gram language model (LM). The dataset of Qur’an verses is used for training and evaluat
16

Casas-Huamanta, Edwin Roi, Lloy Pinedo, Enrique Alejandro Barbachán-Ruales, Angel Cardenas-García, Luis Alberth Rossel-Bernedo, and Jose Gabriel Seijas-Díaz. "Optical character recognition system with natural language processing for data recovery on scanned old academic card reports." Acta Scientiarum. Technology 47, no. 1 (2024): e69814. https://doi.org/10.4025/actascitechnol.v47i1.69814.

Abstract:
In the digital age, preserving and effectively retrieving historical academic records is a significant challenge, especially when these documents only exist in deteriorated physical formats. We propose an approach to recover data from scanned documents of grade records by using image processing and Natural Language Processing (NLP) to enhance the accuracy of Optical Character Recognition (OCR) in these documents, which is essential for the preservation of digital records. Our three-step methodology first improves the quality of the scanned image; then extracts text using OCR and NLP techniques to
17

Tilkar, Swati. "A Review on Natural Language Processing (NLP) Models for Generating Meeting Transcription." INTERNATIONAL JOURNAL OF SCIENTIFIC RESEARCH IN ENGINEERING AND MANAGEMENT 09, no. 04 (2025): 1–9. https://doi.org/10.55041/ijsrem45415.

Abstract:
Without a loss of generality, it can be stated that meetings play a vital role in collaboration and decision-making in diverse domains and organizations. However, manually documenting these meetings is time-consuming and prone to errors or omissions. To address this challenge, Natural Language Processing (NLP), a subfield of artificial intelligence, has emerged as a powerful tool for automating the transcription of spoken content. With NLP, meeting conversations can be converted into accurate, readable text in real-time or post-meeting, improving productivity, accessibility, and reco
18

Armaisya, Dimas Dwi, Panca Dewi Pamungkasari, Achmad Pratama Rifai, Ira Diana Sholihati, and Gopal Sakarkar. "Comparison Of Feature Extraction Techniques For Long Short-Term Memory Models In Indonesian Automatic Speech Recognition." Green Intelligent Systems and Applications 5, no. 1 (2025): 74–92. https://doi.org/10.53623/gisa.v5i1.605.

Abstract:
Automatic Speech Recognition (ASR) faced challenges in accuracy and noise robustness, particularly in Bahasa Indonesia. This research addressed the limitations of single feature extraction methods, such as Mel-Frequency Cepstral Coefficients (MFCC), which were sensitive to noise, and Relative Spectral Transform - Perceptual Linear Predictive (RASTA-PLP), which was less effective in frequency representation, by proposing a hybrid approach that combined both techniques using Long Short-Term Memory (LSTM) models. MFCC enhanced spectral accuracy, while RASTA-PLP improved noise robustness, resultin
19

BARKOVSKA, Olesia, and Vladyslav KHOLIEV. "NEURAL NETWORK ARCHITECTURE FOR TEXT DECODING BASED ON SPEAKER'S LIP MOVEMENTS." Computer systems and information technologies, no. 4 (December 28, 2023): 52–59. http://dx.doi.org/10.31891/csit-2023-4-7.

Abstract:
In this paper, we tested a command recognition system using the SSI approach and conducted a series of experiments on modern solutions based on ALR interfaces. The main goal was to improve the accuracy of speech recognition in cases where it is not possible to use the speaker's non-noisy audio sequence, for example, at a great distance from the speaker or in a noisy environment. The obtained results showed that training the neural network on a GPU accelerator allowed to reduce the training time by 26.2 times using a high-resolution training sample with a size of the selected mouth area of 150
20

Czyzewski, Andrzej. "Strategies for preprocessing speech to enhance neural model efficiency in speech-to-text applications." Journal of the Acoustical Society of America 156, no. 4_Supplement (2024): A26. https://doi.org/10.1121/10.0034984.

Abstract:
A comprehensive study on the impact of advanced speech preprocessing strategies on the performance of speech-to-text models is presented. Our approach incorporates noise augmentation, speech rate decreasing, and anonymization to create a more robust training dataset. Additionally, we utilized voice cloning techniques to generate thousands of supplementary recordings, significantly expanding our dataset. These preprocessing strategies aim to improve the accuracy and efficiency of speech recognition systems. Our experiments demonstrate a notable reduction in Word Error Rate (WER) by an average
21

Yeleussinov, Arman, Yedilkhan Amirgaliyev, and Lyailya Cherikbayeva. "Improving OCR Accuracy for Kazakh Handwriting Recognition Using GAN Models." Applied Sciences 13, no. 9 (2023): 5677. http://dx.doi.org/10.3390/app13095677.

Abstract:
This paper aims to increase the accuracy of Kazakh handwriting text recognition (KHTR) using the generative adversarial network (GAN), where a handwriting word image generator and an image quality discriminator are constructed. In order to obtain a high-quality image of handwritten text, the multiple losses are intended to encourage the generator to learn the structural properties of the texts. In this case, the quality discriminator is trained on the basis of the relativistic loss function. Based on the proposed structure, the resulting document images not only preserve texture details but al
22

Buoy, Rina, Nguonly Taing, Sovisal Chenda, and Sokchea Kor. "Khmer printed character recognition using attention-based Seq2Seq network." HO CHI MINH CITY OPEN UNIVERSITY JOURNAL OF SCIENCE - ENGINEERING AND TECHNOLOGY 12, no. 1 (2022): 3–16. http://dx.doi.org/10.46223/hcmcoujs.tech.en.12.1.2217.2022.

Abstract:
This paper presents an end-to-end deep convolutional recurrent neural network solution for Khmer optical character recognition (OCR) task. The proposed solution uses a sequence-to-sequence (Seq2Seq) architecture with attention mechanism. The encoder extracts visual features from an input text-line image via layers of convolutional blocks and a layer of gated recurrent units (GRU). The features are encoded in a single context vector and a sequence of hidden states which are fed to the decoder for decoding one character at a time until a special end-of-sentence (EOS) token is reached. The attent
23

Lee, Geon Woo, and Hong Kook Kim. "Two-Step Joint Optimization with Auxiliary Loss Function for Noise-Robust Speech Recognition." Sensors 22, no. 14 (2022): 5381. http://dx.doi.org/10.3390/s22145381.

Abstract:
In this paper, a new two-step joint optimization approach based on the asynchronous subregion optimization method is proposed for training a pipeline model composed of two different models. The first-step processing of the proposed joint optimization approach trains the front-end model only, and the second-step processing trains all the parameters of the combined model together. In the asynchronous subregion optimization method, the first-step processing only supports the goal of the front-end model. However, the first-step processing of the proposed approach works with a new loss function to
24

Tan, Yee Fan, Tee Connie, Michael Kah Ong Goh, and Andrew Beng Jin Teoh. "A Pipeline Approach to Context-Aware Handwritten Text Recognition." Applied Sciences 12, no. 4 (2022): 1870. http://dx.doi.org/10.3390/app12041870.

Abstract:
Despite concerted efforts towards handwritten text recognition, the automatic location and transcription of handwritten text remain a challenging task. Text detection and segmentation methods are often prone to errors, affecting the accuracy of the subsequent recognition procedure. In this paper, a pipeline that locates texts on a page and recognizes the text types, as well as the context of the texts within the detected region, is proposed. Clinical receipts are used as the subject of study. The proposed model is comprised of an object detection neural network that extracts text sequences pre
25

Kubiak, Ireneusz. "Font Design—Shape Processing of Text Information Structures in the Process of Non-Invasive Data Acquisition." Computers 8, no. 4 (2019): 70. http://dx.doi.org/10.3390/computers8040070.

Abstract:
Computer fonts can be a solution that supports the protection of information against electromagnetic penetration; however, not every font has features that counteract this process. The distinctive features of a font’s characters define the font. This article presents two new sets of computer fonts. These fonts are fully usable in everyday work. Additionally, they make it impossible to obtain information using non-invasive methods. The names of these fonts are directly related to the shapes of their characters. Each character in these fonts is built using only vertical and horizontal lines. The
26

Rosyadi, Ahmad Wahyu, Siti Ma’shumah, Muhammad Qomaruz Zaman, and Moh. Rizki Fajar. "Ingredients Identification Through Label Scanning Using PaddleOCR and ChatGPT for Information Retrieval." Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi) 8, no. 6 (2024): 758–67. https://doi.org/10.29207/resti.v8i6.6119.

Abstract:
Human health depends on choosing food ingredients that align with dietary needs and avoid allergens. However, consumers often encounter unfamiliar ingredients that require additional information. Traditionally, they search online by typing in the ingredient's name which can be time-consuming and may not yield relevant results. Therefore, a system to identify and display ingredient information is necessary. This study proposes a new system that identifies ingredients by scanning the composition label on packaging using PaddleOCR and retrieving information through ChatGPT on a smartphone. The pr
27

Kostek, Bozena. "Enunciation—An important factor in speech-to-text medical transcription systems." Journal of the Acoustical Society of America 156, no. 4_Supplement (2024): A126. https://doi.org/10.1121/10.0035339.

Abstract:
This study aims to explore the extent to which enunciation plays a crucial role in a speech-to-text (STT) system, especially when dealing with medical terminology. To achieve this, an audio dataset was recorded containing Polish medical terms and spoken diagnoses pronounced by healthcare professionals, including general practitioners and specialists in various fields such as cardiology, pulmonology, and radiology. The next step involved comprehensive acoustical and lexical analyses of the audio recordings. Features such as harmonic-to-noise ratio, spectral tilt, zero-crossing rate, formant dis
28

R, Geetha Rajakumari, Karthika Renuka D, and Ashok Kumar L. "ENHANCING ASR ACCURACY AND COHERENCE ACROSS INDIAN LANGUAGES WITH WAV2VEC2 AND GPT-2." ICTACT Journal on Data Science and Machine Learning 6, no. 2 (2025): 761–64. https://doi.org/10.21917/ijdsml.2025.0156.

Abstract:
This paper presents a comprehensive framework for automatic speech recognition (ASR) and text refinement that leverages advanced deep learning models to improve transcription accuracy and contextual coherence across multiple languages, including Tamil, Kannada, Telugu, Malayalam, and English. The framework integrates three primary models: Wav2Vec2 for ASR, Sentence Transformer for semantic retrieval, and GPT-2 for text generation. Initially, the Wav2Vec2 model is employed to convert audio inputs into text, achieving a Word Error Rate (WER) of 8% and a Character Error Rate (CER) of 5%. This mod
29

Ухачевич, Т. Я., та Н. О. Кустра. "Дослідження впливу обрізання та тонкого налаштування моделі автоматичного розпізнавання мовлення на її точність". Scientific Bulletin of UNFU 34, № 5 (2024): 104–9. http://dx.doi.org/10.36930/40340514.

Abstract:
The study examines how model pruning and fine-tuning affect the accuracy of automatic speech recognition (ASR) for a low-resource language. The model used, "wav2vec2-xls-r-300 m-uk", was pre-trained on a large multilingual dataset and fine-tuned on the Ukrainian dataset from Common Voice. L1-norm pruning was applied at several levels (10, 20, 30, 40, 50%) without further tuning, which revealed a considerable drop in accuracy (the Word Error Rate (WER) increased from 18.53% to 35.96%, and the Character Error Rate (CER
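The abstract above applies L1-norm pruning at levels from 10% to 50%. As a rough illustration only, and not the study's own procedure, unstructured L1-norm pruning can be sketched as zeroing the given fraction of weights with the smallest absolute values. The function name and the flat list of weights are assumptions made for this example; real models would prune tensors layer by layer.

```python
def l1_prune(weights, amount):
    """Zero out the `amount` fraction of weights with the smallest magnitudes."""
    if not 0.0 <= amount <= 1.0:
        raise ValueError("amount must be between 0 and 1")
    k = round(amount * len(weights))  # number of weights to remove
    # Indices ordered by absolute value, smallest first.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(order[:k])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


layer = [0.5, -0.1, 0.3, -0.9, 0.05]
for level in (0.2, 0.4):
    print(level, l1_prune(layer, level))
```

Pruning without subsequent fine-tuning removes information the model relied on, which fits the rise in WER that the abstract reports.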
30

Munawaroh, Anisatul, and Eko Rudiawan Jamzuri. "Automatic optical inspection for detecting keycaps misplacement using Tesseract optical character recognition." International Journal of Electrical and Computer Engineering (IJECE) 13, no. 5 (2023): 5147–55. http://dx.doi.org/10.11591/ijece.v13i5.pp5147-5155.

Abstract:
This research study aims to develop automatic optical inspection (AOI) for detecting keycaps misplacement on the keyboard. The AOI hardware has been designed using an industrial camera with an additional mechanical jig and lighting system. Optical character recognition (OCR) using the Tesseract OCR engine is the proposed method to detect keycaps misplacement. In addition, captured images were cropped using a predefined region of interest (ROI) during the setup. Subsequently, the cropped ROIs were processed to acquire binary images. Furthermore, Tesseract processed thes
31

Munawaroh, Anisatul, and Eko Rudiawan Jamzuri. "Automatic optical inspection for detecting keycaps misplacement using Tesseract optical character recognition." International Journal of Electrical and Computer Engineering (IJECE) 13, no. 5 (2023): 5147–55. https://doi.org/10.11591/ijece.v13i5.pp5147-5155.

Abstract:
This research study aims to develop automatic optical inspection (AOI) for detecting keycaps misplacement on the keyboard. The AOI hardware has been designed using an industrial camera with an additional mechanical jig and lighting system. Optical character recognition (OCR) using the Tesseract OCR engine is the proposed method to detect keycaps misplacement. In addition, captured images were cropped using a predefined region of interest (ROI) during the setup. Subsequently, the cropped ROIs were processed to acquire binary images. Furthermore, Tesseract processed these binary images to recogn
32

Aimin Zhou, Chenghuan Xie. "SVCGAN: Speaker Voice Conversion Generative Adversarial Network for Children’s Speech Conversion and Recognition." Journal of Electrical Systems 20, no. 3s (2024): 2182–96. http://dx.doi.org/10.52783/jes.1841.

Abstract:
Automatic speech recognition (ASR) refers to a technological process that entails the conversion of spoken language into written text. However, the acoustic distinctions between children’s speech and adult speech are substantial, rendering the automatic speech recognition system trained on adult speech inadequate for effectively recognizing children’s speech. To overcome this issue, in this study, we propose speaker conversion generative adversarial network (SVCGAN). SVCGAN is a novel non-parallel voice conversion model, which enhances three key areas: log-cosh loss, semantic-similarity loss,
33

Pandey, Subham, Sumaiya Tahseen, Rohit Pathak, Hina Parveen, and Maruti Maurya. "Real-Time Vision-Based Indian Sign Language Translation Using Deep Learning Techniques." International Journal of Innovative Research in Computer Science and Technology 13, no. 3 (2025): 35–46. https://doi.org/10.55524/ijircst.2025.13.3.6.

Abstract:
This work proposes a vision-based approach to real-time sign language translation for Indian Sign Language (ISL). The system uses state-of-the-art deep learning architectures such as CNN (Convolutional Neural Networks), LSTM (Long Short-Term Memory) networks, and Transformer-based encoder-decoder models for gesture recognition in both isolated and continuous forms. Data preprocessing techniques such as DTW (Dynamic Time Warping) were applied to augment and normalize gesture sequences from custom ISL and public ASL datasets. The model performance was quantitatively evaluated using precision, re
34

Toro, Javier Villena, and Mehdi Tarkian. "Optimizing Text Recognition in Mechanical Drawings: A Comprehensive Approach." Machines 13, no. 3 (2025): 254. https://doi.org/10.3390/machines13030254.

Abstract:
The digitalization of engineering drawings is a pivotal step toward automating and improving the efficiency of product design and manufacturing systems (PDMSs). This study presents eDOCr2, a framework that combines traditional OCR and image processing to extract structured information from mechanical drawings. It segments drawings into key elements—such as information blocks, dimensions, and feature control frames—achieving a text recall of 93.75% and a character error rate (CER) below 1% in a benchmark with drawings from different sources. To improve semantic understanding and reasoning, eDOC
35

Saputra, The Manuel Eric, Ajib Susanto, and Bastiaans Jessica Carmelita. "Implementation of Tesseract OCR and Bounding Box for Text Extraction on Food Nutrition Labels." Building of Informatics, Technology and Science (BITS) 6, no. 3 (2024): 1403–12. https://doi.org/10.47065/bits.v6i3.6107.

Abstract:
This study focuses on implementing Optical Character Recognition (OCR) using the Tesseract engine, integrated with bounding box detection, to extract nutritional information from food nutrition labels. The research addresses the challenge of limited consumer access to and understanding of nutritional data, a factor contributing to health issues such as obesity and related metabolic disorders. Studies indicate that although Indonesian consumers generally have a good level of knowledge and positive attitudes toward nutritional labels, the actual behavior of reading and understanding these labels
36

Silber Varod, Vered, Ingo Siegert, Oliver Jokisch, Yamini Sinha, and Nitza Geri. "A cross-language study of speech recognition systems for English, German, and Hebrew." Online Journal of Applied Knowledge Management 9, no. 1 (2021): 1–15. http://dx.doi.org/10.36965/ojakm.2021.9(1)1-15.

Abstract:
Despite the growing importance of Automatic Speech Recognition (ASR), its application is still challenging, limited, language-dependent, and requires considerable resources. The resources required for ASR are not only technical, they also need to reflect technological trends and cultural diversity. The purpose of this research is to explore ASR performance gaps by a comparative study of American English, German, and Hebrew. Apart from different languages, we also investigate different speaking styles – utterances from spontaneous dialogues and utterances from frontal lectures (TED-like genre).
37

Yin, Bing, Shutong Niu, Haitao Tang, et al. "An Investigation into Audio–Visual Speech Recognition under a Realistic Home–TV Scenario." Applied Sciences 13, no. 7 (2023): 4100. http://dx.doi.org/10.3390/app13074100.

Abstract:
Robust speech recognition in real world situations is still an important problem, especially when it is affected by environmental interference factors and conversational multi-speaker interactions. Supplementing audio information with other modalities, such as audio–visual speech recognition (AVSR), is a promising direction for improving speech recognition. The end-to-end (E2E) framework can learn information between multiple modalities well; however, the model is not easy to train, especially when the amount of data is relatively small. In this paper, we focus on building an encoder–decoder-b
38

Liang, Haijun, Hanwen Chang, and Jianguo Kong. "Speech Recognition for Air Traffic Control Utilizing a Multi-Head State-Space Model and Transfer Learning." Aerospace 11, no. 5 (2024): 390. http://dx.doi.org/10.3390/aerospace11050390.

Abstract:
In the present study, a novel end-to-end automatic speech recognition (ASR) framework, namely, ResNeXt-Mssm-CTC, has been developed for air traffic control (ATC) systems. This framework is built upon the Multi-Head State-Space Model (Mssm) and incorporates transfer learning techniques. Residual Networks with Cardinality (ResNeXt) employ multi-layered convolutions with residual connections to augment the extraction of intricate feature representations from speech signals. The Mssm is endowed with specialized gating mechanisms, which incorporate parallel heads that acquire knowledge of both loca
39

Sucipto, Aidina Ristyawan, Dwi Harini, Wahid Ibnu Zaman, Muhammad Najibulloh Muzaki, and Mohamed Naeem Antharathara Abdulnazar. "Integrating Cryptographic Security Features in Information System Barcodes for Self-Service Systems." Advance Sustainable Science Engineering and Technology 6, no. 4 (2024): 02404012. http://dx.doi.org/10.26877/asset.v6i4.850.

Abstract:
Integrating services in an information system is necessary to provide services that can optimize an information system. One of the systems in PKKMB activities that will be combined with information security features is the attendance system. This research uses the Linear Sequential Model (LSM) method to integrate the QR Code attendance system with security features. This research aims to strengthen QR Code security by combining the Advanced Encryption Standard (AES) algorithm and Base64 with a dynamic data model to complicate the QR Code manipulation process. Contributi
40

Li, Xuchen, Yiqun Wang, Xiao-Yang Liu, et al. "JLMS25 and Jiao-Liao Mandarin Speech Recognition Based on Multi-Dialect Knowledge Transfer." Applied Sciences 15, no. 3 (2025): 1670. https://doi.org/10.3390/app15031670.

Abstract:
Jiao-Liao Mandarin, a distinguished dialect in China, reflects the linguistic features and cultural heritage of the Jiao-Liao region. However, the labor-intensive and costly nature of manual transcription limits the scale of transcribed corpora, posing challenges for speech recognition. We present JLMS25, a transcribed corpus for Jiao-Liao Mandarin, alongside a novel multi-dialect knowledge transfer (MDKT) framework for low-resource speech recognition. By leveraging phonetic and linguistic knowledge from neighboring dialects, the MDKT framework improves recognition in resource-constrained sett
41

Mamyrbayev, O., and T. Kurmetkan. "Analysis of the use of the hiformer model for kazakh speech recognition." Bulletin of the National Engineering Academy of the Republic of Kazakhstan 94, no. 4 (2024): 290–301. https://doi.org/10.47533/2024.1606-146x.024.

Abstract:
This article presents an overview of automatic speech recognition (ASR) technologies and describes the use of an advanced version of the Transformer model, the Hiformer model, in Kazakh speech recognition. A literature review of Kazakh speech recognition systems was made. The structure of the Hiformer model is described and how it can be used in different parts of the structure of an advanced attention mechanism (AED) (encoder, decoder, cross-coder attention). An experiment was carried out on the execution of tasks of recognition of Kazakh speech using the Hiformer model. This study also detai
42

Bjerring-Hansen, Jens, Ross Deans Kristensen-McLachlan, Philip Diderichsen, and Dorte Haltrup Hansen. "Mending Fractured Texts." Digital Humanities in the Nordic and Baltic Countries Publications 4, no. 1 (2022): 177–86. http://dx.doi.org/10.5617/dhnbpub.11285.

Abstract:
In this paper we present an OCR correction pipeline for 19th century printed Danish fraktur (gothic/blackletter). The work has been carried out at the University of Copenhagen in relation to a research project involving digital explorations of a corpus of some 900 Danish and Norwegian novels from 1870 to 1899, totalling approx. 65 million words. Roughly 25% of these novels are printed in the traditional fraktur font, which was almost entirely dominant at the beginning of the 19th century. These texts are important culturally, since they represent mostly forgotten, popular novels; however, they po
43

Romero, Monica, Sandra Gómez-Canaval, and Ivan G. Torre. "Automatic Speech Recognition Advancements for Indigenous Languages of the Americas." Applied Sciences 14, no. 15 (2024): 6497. http://dx.doi.org/10.3390/app14156497.

Abstract:
Indigenous languages are a fundamental legacy in the development of human communication, embodying the unique identity and culture of local communities in America. The Second AmericasNLP Competition Track 1 of NeurIPS 2022 proposed the task of training automatic speech recognition (ASR) systems for five Indigenous languages: Quechua, Guarani, Bribri, Kotiria, and Wa’ikhana. In this paper, we describe the fine-tuning of a state-of-the-art ASR model for each target language, using approximately 36.65 h of transcribed speech data from diverse sources enriched with data augmentation methods. We sy
44

Li, Huiyan, Haohong Lin, You Wang, et al. "Sequence-to-Sequence Voice Reconstruction for Silent Speech in a Tonal Language." Brain Sciences 12, no. 7 (2022): 818. http://dx.doi.org/10.3390/brainsci12070818.

Abstract:
Silent speech decoding (SSD), based on articulatory neuromuscular activities, has become a prevalent task of brain–computer interfaces (BCIs) in recent years. Many works have been devoted to decoding surface electromyography (sEMG) from articulatory neuromuscular activities. However, restoring silent speech in tonal languages such as Mandarin Chinese is still difficult. This paper proposes an optimized sequence-to-sequence (Seq2Seq) approach to synthesize voice from the sEMG-based silent speech. We extract duration information to regulate the sEMG-based silent speech using the audio length. Th
45

Tikhonov, Aleksej, and Achim Rabus. "Handwritten Text Recognition of Ukrainian Manuscripts in the 21st Century: Possibilities, Challenges, and the Future of the First Generic AI-based Model." Kyiv-Mohyla Humanities Journal, no. 11 (December 30, 2024): 226–47. https://doi.org/10.18523/2313-4895.11.2024.226-247.

Abstract:
This article reports on developing and evaluating a generic Handwritten Text Recognition (HTR) model created for the automatic computer-assisted transcription of Ukrainian handwriting publicly available via the HTR platform Transkribus. The model’s training process encompasses diverse datasets, including historical manuscripts by renowned poets Taras Shevchenko and Lesya Ukrainka, along with private correspondence used for the General Regionally Annotated Corpus of Ukrainian (GRAC) and a diary procured at the Holodomor Museum collection. We evaluate the model’s performance by comparing its the
46

Muslih Muslih and Lekso Budi Handoko. "PENGUJIAN AVALANCHE EFFECT PADA KRIPTOGRAFI TEKS MENGGUNAKAN AUTOKEY CIPHER." Seminar Nasional Teknologi dan Multidisiplin Ilmu (SEMNASTEKMU) 2, no. 1 (2022): 127–34. http://dx.doi.org/10.51903/semnastekmu.v2i1.162.

Abstract:
In the era of digital information, information is part of every aspect of life and carries high value when it concerns anything from personal to financial information, because such information is highly sought after by parties with an interest in it. One of the threats that frequently occurs in this digital information era is data breaches, because the data has high value. One way to strengthen the security of the data we send is to use cryptography. Cryptography has many kinds of methods for encod
47

Hussain, Ibrar, Riaz Ahmad, Khalil Ullah, Siraj Muhammad, Rasha Elhassan, and Ikram Syed. "Deep learning-based recognition system for pashto handwritten text: benchmark on PHTI." PeerJ Computer Science 10 (March 27, 2024): e1925. http://dx.doi.org/10.7717/peerj-cs.1925.

Abstract:
This article introduces a recognition system for handwritten text in the Pashto language, representing the first attempt to establish a baseline system using the Pashto Handwritten Text Imagebase (PHTI) dataset. Initially, the PHTI dataset was pre-processed to eliminate unwanted characters; subsequently, it was divided into training (70%), validation (15%), and test (15%) sets. The proposed recognition system is based on multi-dimensional long short-term memory (MD-LSTM) networks. A comprehensive empirical analysis was conducted to determine the optimal parameters for the proposed MD-
48

Antor, M. H., N. V. Chudinovskikh, M. V. Bachurin, A. A. Shurpikov, N. A. Khlebnikov, and B. A. Bredikhin. "Machine learning-based voice assistant: optimizing the efficiency of speech conversion for people with speech disorders." Computer Optics 49, no. 1 (2025): 124–31. https://doi.org/10.18287/2412-6179-co-1482.

Abstract:
An automatic speech recognition system can improve the standard of living of persons with disabilities by addressing issues such as dysarthria, stuttering, and other speech defects. In this paper, we introduce a voice assistant that uses speech samples affected by hyperkinetic dysarthria (HD). The work covers the data preprocessing steps and the development of a novel convolutional recurrent network (CRN) model that is built on convolutional neural networks and recurrent neural networks. We implemented data preprocessing methods, including filtering, down-sampling, and splitting,
49

Fang, Fuming, Takahiro Shinozaki, Yasuo Horiuchi, Shingo Kuroiwa, Sadaoki Furui, and Toshimitsu Musha. "Improving Eye Motion Sequence Recognition Using Electrooculography Based on Context-Dependent HMM." Computational Intelligence and Neuroscience 2016 (2016): 1–9. http://dx.doi.org/10.1155/2016/6898031.

Abstract:
Eye motion-based human-machine interfaces are used to provide a means of communication for those who can move nothing but their eyes because of injury or disease. To detect eye motions, electrooculography (EOG) is used. For efficient communication, the input speed is critical. However, it is difficult for conventional EOG recognition methods to accurately recognize fast, sequentially input eye motions because adjacent eye motions influence each other. In this paper, we propose a context-dependent hidden Markov model- (HMM-) based EOG modeling approach that uses separate models for identical ey
50

Fazira, Nabila Dwi, and Achmad Fauzan. "TRANSFORMER-BASED OPTICAL CHARACTER RECOGNITION APPROACH FOR IDENTIFYING MOTOR VEHICLES WITH OVERDUE TAXES." BAREKENG: Jurnal Ilmu Matematika dan Terapan 19, no. 3 (2025): 1597–608. https://doi.org/10.30598/barekengvol19iss3pp1597-1608.

Abstract:
The high growth in the number of motorized vehicles in Indonesia has given rise to special attention in managing traffic administration, especially in relation to vehicle taxes. To present innovative solutions in vehicle tax administration, this research was conducted to detect the five-year tax status of motor vehicles in Indonesia using the Transformer Optical Character Recognition (TrOCR) model. The aim of this research is to evaluate the performance of the TrOCR model in recognizing text on motor vehicle number plates in Indonesia and classifying number plates that have and have not paid t