Academic literature on the topic 'Character Error Rate (CER)'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the lists of relevant articles, books, book chapters, theses, conference papers, and other scholarly sources on the topic 'Character Error Rate (CER).'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever these are available in the metadata.

Journal articles on the topic "Character Error Rate (CER)"

1

Abdallah, Abdelrahman, Mohamed Hamada, and Daniyar Nurseitov. "Attention-Based Fully Gated CNN-BGRU for Russian Handwritten Text." Journal of Imaging 6, no. 12 (December 18, 2020): 141. http://dx.doi.org/10.3390/jimaging6120141.

Abstract:
This article considers the task of handwritten text recognition using attention-based encoder–decoder networks trained in the Kazakh and Russian languages. We have developed a novel deep neural network model based on a fully gated CNN, supported by multiple bidirectional gated recurrent unit (BGRU) and attention mechanisms to manipulate sophisticated features that achieve 0.045 Character Error Rate (CER), 0.192 Word Error Rate (WER), and 0.253 Sequence Error Rate (SER) for the first test dataset and 0.064 CER, 0.24 WER, and 0.361 SER for the second test dataset. Our proposed model is the first to handle handwriting recognition in the Kazakh and Russian languages. Our results confirm the importance of our proposed Attention-Gated-CNN-BGRU approach for training handwriting text recognition and indicate that it can lead to statistically significant improvements (p-value < 0.05) in the sensitivity (recall) over the test dataset. The proposed method’s performance was evaluated using handwritten text databases of three languages: English, Russian, and Kazakh. It demonstrates better results on the Handwritten Kazakh and Russian (HKR) dataset than the other well-known models.
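For readers unfamiliar with the metrics quoted throughout this list, the sketch below shows how CER, WER, and SER are conventionally computed from reference/hypothesis pairs via Levenshtein edit distance. It is a minimal illustration, not the authors' evaluation code.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or token lists)."""
    d = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, d[0] = d[0], i
        for j, h in enumerate(hyp, 1):
            prev, d[j] = d[j], min(d[j] + 1,          # deletion
                                   d[j - 1] + 1,      # insertion
                                   prev + (r != h))   # substitution (0 if match)
    return d[-1]

def cer(ref, hyp):
    """Character Error Rate: character edits per reference character."""
    return edit_distance(ref, hyp) / max(len(ref), 1)

def wer(ref, hyp):
    """Word Error Rate: word edits per reference word."""
    return edit_distance(ref.split(), hyp.split()) / max(len(ref.split()), 1)

def ser(refs, hyps):
    """Sequence Error Rate: fraction of lines not recognized perfectly."""
    return sum(r != h for r, h in zip(refs, hyps)) / max(len(refs), 1)

print(cer("handwriting", "handwritting"))             # one insertion  -> ~0.091
print(wer("recognize the text", "recognise the text"))  # one substitution -> ~0.333
```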
2

Drobac, Senka, and Krister Lindén. "Optical character recognition with neural networks and post-correction with finite state methods." International Journal on Document Analysis and Recognition (IJDAR) 23, no. 4 (August 20, 2020): 279–95. http://dx.doi.org/10.1007/s10032-020-00359-9.

Abstract:
The optical character recognition (OCR) quality of the historical part of the Finnish newspaper and journal corpus is rather low for reliable search and scientific research on the OCRed data. The estimated character error rate (CER) of the corpus, achieved with commercial software, is between 8 and 13%. There have been earlier attempts to train high-quality OCR models with open-source software, like Ocropy (https://github.com/tmbdev/ocropy) and Tesseract (https://github.com/tesseract-ocr/tesseract), but so far, none of the methods have managed to successfully train a mixed model that recognizes all of the data in the corpus, which would be essential for an efficient re-OCRing of the corpus. The difficulty lies in the fact that the corpus is printed in the two main languages of Finland (Finnish and Swedish) and in two font families (Blackletter and Antiqua). In this paper, we explore the training of a variety of OCR models with deep neural networks (DNN). First, we find an optimal DNN for our data and, with additional training data, successfully train high-quality mixed-language models. Furthermore, we revisit the effect of confidence voting on the OCR results with different model combinations. Finally, we perform post-correction on the new OCR results and perform error analysis. The results show a significant boost in accuracy, resulting in 1.7% CER on the Finnish and 2.7% CER on the Swedish test set. The greatest accomplishment of the study is the successful training of one mixed language model for the entire corpus and finding a voting setup that further improves the results.
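The confidence voting mentioned above combines line hypotheses from several OCR models. The toy sketch below is only a guess at one simple variant (choosing, per line, the hypothesis with the highest mean character confidence); the paper's actual voting setup is more elaborate.

```python
def vote_line(hypotheses):
    """hypotheses: list of (text, per-character confidences) from different OCR models.
    Returns the text whose mean character confidence is highest."""
    def mean_conf(hyp):
        _, confs = hyp
        return sum(confs) / len(confs) if confs else 0.0
    return max(hypotheses, key=mean_conf)[0]

# Example: two models disagree on the last character of a newspaper title line.
print(vote_line([
    ("Suomen Wirallinen Lehti", [0.93] * 23),
    ("Suomen Wirallinen Lehtl", [0.81] * 23),
]))
```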
3

Jeong, Jiho, S. I. M. M. Raton Mondol, Yeon Wook Kim, and Sangmin Lee. "An Effective Learning Method for Automatic Speech Recognition in Korean CI Patients’ Speech." Electronics 10, no. 7 (March 29, 2021): 807. http://dx.doi.org/10.3390/electronics10070807.

Abstract:
An automatic speech recognition (ASR) model usually requires a large amount of training data to provide better results than ASR models trained with a small amount of data. It is difficult to apply an ASR model to non-standard speech, such as that of cochlear implant (CI) patients, because such data is hard to obtain owing to privacy concerns or difficulty of access. In this paper, an effective fine-tuning and augmentation method for ASR is proposed. Experiments compare the character error rate (CER) after training the ASR model with the basic and the proposed methods. The proposed method achieved a CER of 36.03% on the CI patients’ speech test dataset using only 2 h and 30 min of training data, which is a 62% improvement over the basic method.
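The abstract does not spell out the augmentation recipe. As a hedged illustration of the kind of augmentation commonly used to stretch small ASR corpora, the snippet below applies SpecAugment-style time and frequency masking to a spectrogram; mask sizes and counts are illustrative choices, not the paper's settings.

```python
import numpy as np

def mask_spectrogram(spec, n_time_masks=2, n_freq_masks=2,
                     max_time=20, max_freq=8, rng=None):
    """spec: 2D array (time, frequency). Returns a copy with random regions zeroed."""
    rng = rng or np.random.default_rng()
    out = spec.copy()
    t_len, f_len = out.shape
    for _ in range(n_time_masks):
        w = int(rng.integers(0, max_time + 1))
        start = int(rng.integers(0, max(t_len - w, 1)))
        out[start:start + w, :] = 0.0       # mask a band of time steps
    for _ in range(n_freq_masks):
        w = int(rng.integers(0, max_freq + 1))
        start = int(rng.integers(0, max(f_len - w, 1)))
        out[:, start:start + w] = 0.0       # mask a band of frequency bins
    return out

augmented = mask_spectrogram(np.random.rand(300, 80))  # e.g. 3 s of 80-dim log-mel frames
```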
4

Kubiak, Ireneusz. "Font Design—Shape Processing of Text Information Structures in the Process of Non-Invasive Data Acquisition." Computers 8, no. 4 (September 23, 2019): 70. http://dx.doi.org/10.3390/computers8040070.

Abstract:
Computer fonts can be a solution that supports the protection of information against electromagnetic penetration; however, not every font has features that counteract this process. The distinctive features of a font’s characters define the font. This article presents two new sets of computer fonts. These fonts are fully usable in everyday work. Additionally, they make it impossible to obtain information using non-invasive methods. The names of these fonts are directly related to the shapes of their characters. Each character in these fonts is built using only vertical and horizontal lines. The differences between the fonts lie in the widths of the vertical lines. The Safe Symmetrical font is built from vertical lines with the same width. The Safe Asymmetrical font is built from vertical lines with two different line widths. However, the appropriate proportions of the widths of the lines and clearances of each character need to be met for the safe fonts. The structures of the characters of the safe fonts ensure a high level of similarity between the characters. Additionally, these fonts do not make it difficult to read text in its primary form. However, sensitive transmissions are free from distinctive features, and the recognition of each character in reconstructed images is very difficult in contrast to traditional fonts, such as the Sang Mun font and Null Pointer font, which have many distinctive features. The usefulness of the computer fonts was assessed by the character error rate (CER); an analysis of this parameter was conducted in this work. The CER obtained very high values for the safe fonts; the values for traditional fonts were much lower. This article aims to present a new solution in the area of protecting information against electromagnetic penetration. This is a new approach that could replace old solutions such as heavy shielding, power and signal filters, and electromagnetic gaskets. Additionally, the application of these new fonts is very easy, as a user only needs to ensure that either the Safe Asymmetrical font or the Safe Symmetrical font is installed on the computer station that processes the text data.
5

Silber Varod, Vered, Ingo Siegert, Oliver Jokisch, Yamini Sinha, and Nitza Geri. "A cross-language study of speech recognition systems for English, German, and Hebrew." Online Journal of Applied Knowledge Management 9, no. 1 (July 26, 2021): 1–15. http://dx.doi.org/10.36965/ojakm.2021.9(1)1-15.

Abstract:
Despite the growing importance of Automatic Speech Recognition (ASR), its application is still challenging, limited, language-dependent, and requires considerable resources. The resources required for ASR are not only technical; they also need to reflect technological trends and cultural diversity. The purpose of this research is to explore ASR performance gaps by a comparative study of American English, German, and Hebrew. Apart from different languages, we also investigate different speaking styles – utterances from spontaneous dialogues and utterances from frontal lectures (TED-like genre). The analysis includes a comparison of the performance of four ASR engines (Google Cloud, Google Search, IBM Watson, and WIT.ai) using four commonly used metrics: Word Error Rate (WER); Character Error Rate (CER); Word Information Lost (WIL); and Match Error Rate (MER). As expected, findings suggest that English ASR systems provide the best results. Contrary to our hypothesis regarding ASR’s low performance for under-resourced languages, we found that the Hebrew and German ASR systems have similar performance. Overall, our findings suggest that ASR performance is language-dependent and system-dependent. Furthermore, ASR may be genre-sensitive, as our results showed for German. This research contributes valuable insight for improving ubiquitous global consumption and management of knowledge and calls for the corporate social responsibility of commercial companies to develop ASR under Fair, Reasonable, and Non-Discriminatory (FRAND) terms.
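All four metrics named in this abstract can be reproduced with the open-source jiwer package, assuming a recent version that exposes wer, cer, mer, and wil (the authors do not state which tooling they used):

```python
import jiwer

reference = "the lecture starts at nine in the morning"
hypothesis = "the lecture start at nine in morning"

print("WER:", jiwer.wer(reference, hypothesis))  # word error rate
print("CER:", jiwer.cer(reference, hypothesis))  # character error rate
print("MER:", jiwer.mer(reference, hypothesis))  # match error rate
print("WIL:", jiwer.wil(reference, hypothesis))  # word information lost
```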
6

Fang, Fuming, Takahiro Shinozaki, Yasuo Horiuchi, Shingo Kuroiwa, Sadaoki Furui, and Toshimitsu Musha. "Improving Eye Motion Sequence Recognition Using Electrooculography Based on Context-Dependent HMM." Computational Intelligence and Neuroscience 2016 (2016): 1–9. http://dx.doi.org/10.1155/2016/6898031.

Abstract:
Eye motion-based human-machine interfaces are used to provide a means of communication for those who can move nothing but their eyes because of injury or disease. To detect eye motions, electrooculography (EOG) is used. For efficient communication, the input speed is critical. However, it is difficult for conventional EOG recognition methods to accurately recognize fast, sequentially input eye motions because adjacent eye motions influence each other. In this paper, we propose a context-dependent hidden Markov model- (HMM-) based EOG modeling approach that uses separate models for identical eye motions with different contexts. Because the influence of adjacent eye motions is explicitly modeled, higher recognition accuracy is achieved. Additionally, we propose a method of user adaptation based on a user-independent EOG model to investigate the trade-off between recognition accuracy and the amount of user-dependent data required for HMM training. Experimental results show that when the proposed context-dependent HMMs are used, the character error rate (CER) is significantly reduced compared with the conventional baseline under user-dependent conditions, from 36.0 to 1.3%. Although the CER increases again to 17.3% when the context-dependent but user-independent HMMs are used, it can be reduced to 7.3% by applying the proposed user adaptation method.
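To make the idea of context dependence concrete, here is a small sketch (with an invented symbol inventory, not the paper's) of how each eye motion can be relabelled with its left and right neighbours, so that identical motions in different contexts are assigned separate HMMs.

```python
def context_dependent_labels(motions, boundary="sil"):
    """Relabel each motion as 'left-centre+right', triphone-style."""
    labels = []
    for i, centre in enumerate(motions):
        left = motions[i - 1] if i > 0 else boundary
        right = motions[i + 1] if i + 1 < len(motions) else boundary
        labels.append(f"{left}-{centre}+{right}")
    return labels

print(context_dependent_labels(["up", "right", "right", "down"]))
# ['sil-up+right', 'up-right+right', 'right-right+down', 'right-down+sil']
```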
7

Zhang, Qian, Dong Wang, Run Zhao, Yinggang Yu, and JiaZhen Jing. "Write, Attend and Spell." Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 5, no. 3 (September 9, 2021): 1–25. http://dx.doi.org/10.1145/3478100.

Abstract:
Text entry on a smartwatch is challenging due to its small form factor. Handwriting recognition using the built-in sensors of the watch (motion sensors, microphones, etc.) provides an efficient and natural solution to deal with this issue. However, prior works mainly focus on individual letter recognition rather than word recognition. Therefore, they need users to pause between adjacent letters for segmentation, which is counter-intuitive and significantly decreases the input speed. In this paper, we present 'Write, Attend and Spell' (WriteAS), a word-level text-entry system which enables free-style handwriting recognition using the motion signals of the smartwatch. First, we design a multimodal convolutional neural network (CNN) to abstract motion features across modalities. After that, a stacked dilated convolutional network with an encoder-decoder network is applied to get around letter segmentation and output words in an end-to-end way. More importantly, we leverage a multi-task sequence learning method to enable handwriting recognition in a streaming way. We construct the first sequence-to-sequence handwriting dataset using a smartwatch. WriteAS can yield 9.3% character error rate (CER) on 250 words for new users and 3.8% CER for words unseen in the training set. In addition, WriteAS can handle various writing conditions very well. Given the promising performance, we envision that WriteAS can be a fast and accurate input tool for the smartwatch.
8

Laptev, Aleksandr, Andrei Andrusenko, Ivan Podluzhny, Anton Mitrofanov, Ivan Medennikov, and Yuri Matveev. "Dynamic Acoustic Unit Augmentation with BPE-Dropout for Low-Resource End-to-End Speech Recognition." Sensors 21, no. 9 (April 28, 2021): 3063. http://dx.doi.org/10.3390/s21093063.

Abstract:
With the rapid development of speech assistants, adapting server-intended automatic speech recognition (ASR) solutions to a direct device has become crucial. For on-device speech recognition tasks, researchers and industry prefer end-to-end ASR systems as they can be made resource-efficient while maintaining a higher quality compared to hybrid systems. However, building end-to-end models requires a significant amount of speech data. Personalization, which is mainly handling out-of-vocabulary (OOV) words, is another challenging task associated with speech assistants. In this work, we consider building an effective end-to-end ASR system in low-resource setups with a high OOV rate, embodied in Babel Turkish and Babel Georgian tasks. We propose a method of dynamic acoustic unit augmentation based on the Byte Pair Encoding with dropout (BPE-dropout) technique. The method non-deterministically tokenizes utterances to extend the token’s contexts and to regularize their distribution for the model’s recognition of unseen words. It also reduces the need for optimal subword vocabulary size search. The technique provides a steady improvement in regular and personalized (OOV-oriented) speech recognition tasks (at least 6% relative word error rate (WER) and 25% relative F-score) at no additional computational cost. Owing to the BPE-dropout use, our monolingual Turkish Conformer has achieved a competitive result with 22.2% character error rate (CER) and 38.9% WER, which is close to the best published multilingual system.
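BPE-dropout itself is easy to illustrate: during tokenization, each applicable merge is skipped with probability p, so the same word is segmented differently from epoch to epoch. The toy sketch below uses a made-up merge table; a real system learns its merges from the training corpus and applies them at scale.

```python
import random

MERGES = [("t", "h"), ("th", "e"), ("e", "r"), ("h", "e")]  # toy merge table

def bpe_dropout_tokenize(word, merges=MERGES, p=0.1, rng=random):
    """Apply BPE merges in order, dropping each candidate merge with probability p."""
    tokens = list(word)
    for a, b in merges:
        i = 0
        while i < len(tokens) - 1:
            if tokens[i] == a and tokens[i + 1] == b and rng.random() >= p:
                tokens[i:i + 2] = [a + b]   # merge applied; re-check at this position
            else:
                i += 1
    return tokens

random.seed(0)
for _ in range(3):
    print(bpe_dropout_tokenize("thermal", p=0.3))  # segmentation varies across calls
```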
9

Masasi, Gianino, James Purnama, and Maulahikmah Galinium. "Development of an on-Premise Indonesian Handwriting Recognition Backend System Using Open Source Deep Learning Solution For Mobile User." Journal of Applied Information, Communication and Technology 7, no. 2 (March 17, 2021): 91–97. http://dx.doi.org/10.33555/jaict.v7i2.109.

Abstract:
Existing handwriting recognition solutions in mobile apps provide an off-premise service, which means the handwriting is processed on overseas servers. Data sent to servers abroad are not under our control and could possibly be mishandled or misused. As recognizing handwriting is a complex problem, deep learning is needed. This research has the objective of developing an on-premise Indonesian handwriting recognition system using an open-source deep learning solution. Various deep learning solutions to be used in the development are compared, and the chosen solution is used to build the architectures. Various database formats are also compared to decide which format is suitable for gathering an Indonesian handwriting database. The gathered Indonesian handwriting database and the built architectures are used for experiments covering the number of Convolutional Neural Network (CNN) layers, rotation and noise data augmentation, and Gated Recurrent Unit (GRU) versus Long Short-Term Memory (LSTM) layers. Experimental results show that rotation data augmentation is the parameter to change to improve word accuracy and Character Error Rate (CER): word accuracy improves from 64.8% to 69.6% and CER from 23.2% to 20.6%.
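As a rough illustration of the rotation augmentation that the experiments single out (angle range and fill value are illustrative choices, not the paper's), each word image can simply be rotated by a small random angle before training:

```python
import numpy as np
from scipy.ndimage import rotate

def augment_rotation(image, max_angle=10.0, rng=None):
    """image: 2D grayscale array. Returns a copy rotated by a small random angle."""
    rng = rng or np.random.default_rng()
    angle = float(rng.uniform(-max_angle, max_angle))
    # keep the original canvas size; fill exposed corners with white (255)
    return rotate(image, angle, reshape=False, mode="constant", cval=255)

word_image = np.full((64, 256), 255, dtype=np.uint8)  # stand-in for a scanned word
augmented = augment_rotation(word_image)
```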
10

Kaur, Jagroop, and Jaswinder Singh. "Roman to Gurmukhi Social Media Text Normalization." International Journal of Intelligent Computing and Cybernetics 13, no. 4 (October 12, 2020): 407–35. http://dx.doi.org/10.1108/ijicc-08-2020-0096.

Abstract:
Purpose: Normalization is an important step in all the natural language processing applications that are handling social media text. The text from social media poses a different kind of problems that are not present in regular text. Recently, a considerable amount of work has been done in this direction, but mostly in the English language. People who do not speak English code mixed the text with their native language and posted text on social media using the Roman script. This kind of text further aggravates the problem of normalizing. This paper aims to discuss the concept of normalization with respect to code-mixed social media text, and a model has been proposed to normalize such text. Design/methodology/approach: The system is divided into two phases – candidate generation and most probable sentence selection. Candidate generation task is treated as machine translation task where the Roman text is treated as source language and Gurmukhi text is treated as the target language. Character-based translation system has been proposed to generate candidate tokens. Once candidates are generated, the second phase uses the beam search method for selecting the most probable sentence based on hidden Markov model. Findings: Character error rate (CER) and bilingual evaluation understudy (BLEU) score are reported. The proposed system has been compared with Akhar software and RB_R2G system, which are also capable of transliterating Roman text to Gurmukhi. The performance of the system outperforms Akhar software. The CER and BLEU scores are 0.268121 and 0.6807939, respectively, for ill-formed text. Research limitations/implications: It was observed that the system produces dialectical variations of a word or the word with minor errors like diacritic missing. Spell checker can improve the output of the system by correcting these minor errors. Extensive experimentation is needed for optimizing language identifier, which will further help in improving the output. The language model also seeks further exploration. Inclusion of wider context, particularly from social media text, is an important area that deserves further investigation. Practical implications: The practical implications of this study are: (1) development of parallel dataset containing Roman and Gurmukhi text; (2) development of dataset annotated with language tag; (3) development of the normalizing system, which is first of its kind and proposes translation based solution for normalizing noisy social media text from Roman to Gurmukhi. It can be extended for any pair of scripts. (4) The proposed system can be used for better analysis of social media text. Theoretically, our study helps in better understanding of text normalization in social media context and opens the doors for further research in multilingual social media text normalization. Originality/value: Existing research work focus on normalizing monolingual text. This study contributes towards the development of a normalization system for multilingual text.
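The second phase described under Design/methodology/approach is a standard beam search over per-token candidate lists. A generic sketch follows, with a stand-in scoring function where the paper uses a hidden Markov model; the candidate strings are Latin placeholders for what would be Gurmukhi tokens.

```python
def beam_search(candidate_lists, score, beam_width=3):
    """candidate_lists: one list of candidate target tokens per source token.
    score(prev_token, token) returns a log-probability-like value."""
    beams = [([], 0.0)]                      # (partial sentence, cumulative score)
    for candidates in candidate_lists:
        expanded = []
        for seq, s in beams:
            prev = seq[-1] if seq else "<s>"
            for cand in candidates:
                expanded.append((seq + [cand], s + score(prev, cand)))
        beams = sorted(expanded, key=lambda b: b[1], reverse=True)[:beam_width]
    return beams[0][0]                       # most probable sentence

# Stand-in scorer: prefer shorter candidates (a real system would query the HMM here).
toy_score = lambda prev, tok: -0.1 * len(tok)
print(beam_search([["ki", "kee"], ["gal", "gall"]], toy_score))  # -> ['ki', 'gal']
```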

Dissertations / Theses on the topic "Character Error Rate (CER)"

1

Laryea, Joycelyn, and Nipunika Jayasundara. "Automatic Speech Recognition System for Somali in the interest of reducing Maternal Morbidity and Mortality." Thesis, Högskolan Dalarna, Mikrodataanalys, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:du-34436.

Abstract:
Developing an Automatic Speech Recognition (ASR) system for the Somali language, though not novel, is not actively explored; hence there has been no success in a model for conversational speech. Neither are related works accessible as open source. The unavailability of digital data is what labels Somali as a low-resource language and poses the greatest impediment to the development of an ASR for Somali. The incentive to develop an ASR system for the Somali language is to contribute to reducing the Maternal Mortality Rate (MMR) in Somalia. Researchers acquire interview audio data regarding maternal health and behaviour in the Somali language; to be able to engage the relevant stakeholders and bring about the needed change, these audio recordings must be transcribed into text, which is an important step towards translation into any language. This work investigates available ASR for Somali and attempts to develop a prototype ASR system to convert Somali audio into Somali text. To achieve this target, we first identified the available open-source systems for speech recognition and selected the DeepSpeech engine for the implementation of the prototype. With three hours of audio data, the transcription accuracy is not as required, and the model cannot be deployed for use. We attribute this to insufficient training data and estimate that the effort towards an ASR for Somali will be more successful with about 1200 hours of audio to train the DeepSpeech engine.

Book chapters on the topic "Character Error Rate (CER)"

1

Strankale, Laine, and Pēteris Paikens. "OCR Challenges for a Latvian Pronunciation Dictionary." In Frontiers in Artificial Intelligence and Applications. IOS Press, 2020. http://dx.doi.org/10.3233/faia200623.

Abstract:
This paper covers the development of a custom OCR solution, based on the open-source Tesseract engine, for digitization of a Latvian pronunciation dictionary in which the pronunciation data is described using a large variety of diacritic markings not supported by standard OCR solutions. We describe our efforts in training a model for these symbols without the additional support of preexisting dictionaries and illustrate how word error rate (WER) and character error rate (CER) are affected by changes in the dataset content and size. We also provide an error analysis and postulate possible causes for common pitfalls. The resulting model achieved a CER of 2.07%, making it suitable for digitization of the whole dictionary in combination with heuristic post-processing and proofreading, resulting in a useful resource for further development of speech technology for Latvian.

Conference papers on the topic "Character Error Rate (CER)"

1

Kimura, Yoshimasa, Hiroyuki Nishi, and Eiich Mukai. "A reliability estimation method of character recognition using maximization of error-rejection rate." In 2010 IEEE Region 10 Conference (TENCON 2010). IEEE, 2010. http://dx.doi.org/10.1109/tencon.2010.5686555.

2

Bratić, Diana, and Nikolina Stanić Loknar. "AI driven OCR: Resolving handwritten fonts recognizability problems." In 10th International Symposium on Graphic Engineering and Design. University of Novi Sad, Faculty of Technical Sciences, Department of Graphic Engineering and Design, 2020. http://dx.doi.org/10.24867/grid-2020-p82.

Abstract:
Optical Character Recognition (OCR) is the electronic or mechanical conversion of images of typed, handwritten, or printed text into machine-encoded text. Advanced systems are capable of producing a high degree of recognition accuracy for most technical fonts, but when it comes to handwritten forms, problems occur in recognizing certain characters, and limitations of conventional OCR processes persist. This is most pronounced in ascenders (k, b, l, d, h, t) and descenders (g, j, p, q, y). If the characters are linked by ligatures, the ascending and descending strokes are even less recognizable to the scanners. In order to reduce the likelihood of a recognition error, it is necessary to create a large database of stored characters and their glyphs. Feature extraction decomposes glyphs into features like lines, closed loops, line direction, and line intersections. A Multilayer Perceptron (MLP) neural network based on the Back Propagation Neural Network (BPNN) algorithm, as a method of Artificial Intelligence (AI), has been used in text identification, classification, and recognition using various methods: image-pattern-based, text-based, mark-based, etc. Also, the application of AI generates a large database of different letter cuts, modifications, and variations of the same letter character structure. For this purpose, a recognizability test of handwritten fonts was performed. Within the main group, subgroups of independent letter characters and letter characters linked by ligatures were created, and reading errors were observed. In each subgroup, four different font families (bold stroke, alternating stroke, monoline stroke, and brush stroke) were tested. In the subgroup of independent letter characters, errors were observed in similar rounded lines such as the characters a and e. In the subgroup of letter characters linked by ligatures, errors were also observed in similar rounded lines such as the letter characters a and e, and m and n, but also in the ascenders b and l, and the descenders g and q. Furthermore, seven letter cuts were made from each basic test letter: thin, ultra-light, light, regular, semi-bold, bold, and ultra-bold, and stored in the existing EMNIST database. The scanning test was repeated, and the newly obtained results showed a decrease in the deviation rate, i.e., higher accuracy. Reducing the number of deviations shows that the neural network gives acceptable answers but requires the creation of a larger database with about 56,000 different characters.
3

Simeunović, Goran, and Ivo Bukovsky. "The Implementation of the Dynamic-Order-Extended Time-Delay Dynamic Neural Units to Heat Transfer System Modelling." In 16th International Conference on Nuclear Engineering. ASMEDC, 2008. http://dx.doi.org/10.1115/icone16-48414.

Abstract:
The paper introduces a linear dynamic-order-extended time-delay dynamic neural unit (DOE TmD-DNU) whose adaptation by the dynamic backpropagation learning rule is enhanced by the genetic algorithm. DOE TmD-DNU is a possible customization of a novel class of artificial neurons called time-delay dynamic neural units (TmD-DNU). In standalone implementations, these artificial dynamic neural architectures can be viewed as analogies to continuous time-delay differential equations, where the equation parameters are unknown and are adaptable, such as neural weights and other parameters of artificial neurons. Time delays on the neural inputs of a unit and in the state feedback of a unit are also considered as the unit’s adaptable neural parameters. These new neural units equipped with adaptable time delays can identify all parameters of a continuous time-delay dynamic system, including unknown time delays both in the unit’s inputs as well as in its state variables. Incorporation of adaptable time delays into neural units significantly increases the approximation capability of individual neural units. It results in simplification of the neural architecture and minimization of the number of neural parameters, and thus possibly in a better understanding of the obtained neural model. It has been shown that stable adaptation of all parameters of TmD-DNU, including time delays, can be achieved by a dynamic modification of the backpropagation learning algorithm. However, the relatively slow convergence rate of the neural parameters and the convergence toward local minima of the error function can sometimes be considered drawbacks of the adaptation. This paper focuses on the improvement of the backpropagation learning algorithm of TmD-DNU by the genetic algorithm and its application to heat transfer system modeling. The adaptation learning algorithm, based on the simultaneous combination of dynamic backpropagation and a genetic algorithm, has been designed to accelerate the convergence of the time-delay parameters of a neural unit and to achieve global minimization of the error function. The neural weights and parameters, except the time delays, are adapted by a dynamic modification of the backpropagation learning algorithm, and those that represent time delays can be adapted by the genetic algorithm. Results on system identification of an unknown system with higher-order dynamics, including unknown time delays, are shown in comparison to achievements by common identification methods applied to the same system. The robust identification capabilities, the aspects of network implementation of TmD-DNU, and the prospects of their nonlinear versions, i.e., higher-order nonlinear time-delay dynamic neural units (TmD-HONNU), are briefly discussed with respect to the learning technique presented in this paper.