Academic literature on the topic 'Thai printed text recognition'


Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Thai printed text recognition.'


Journal articles on the topic "Thai printed text recognition"

1

Mookdarsanit, Lawankorn, and Pakpoom Mookdarsanit. "Combating the hate speech in Thai textual memes." Indonesian Journal of Electrical Engineering and Computer Science 21, no. 3 (2021): 1493–502. https://doi.org/10.11591/ijeecs.v21.i3.pp1493-1502.

Abstract:
Thai textual memes have been popular in social media, as a form of image information summarization. Unfortunately, many memes contain some hateful content that easily causes the controversy in Thailand. For global protection, the hateful memes challenge is also provided by Facebook AI to enable researchers to compete their algorithms for combating the hate speech on memes as one of NeurIPS’20 competitions. As well as in Thailand, this paper introduces the Thai textual meme detection as a new research problem in Thai natural language processing (Thai-NLP) that is the settlement of transmission linkage between scene text localization, Thai optical recognition (Thai-OCR) and language understanding. From the results, both regular and irregular text position can be localized by one-stage detection pipeline. More scene text can be augmented by different resolution and rotation. The accuracy of Thai-OCR using convolutional neural network (CNN) can be improved by recurrent neural network (RNN). Since misspelling Thai words are frequently used in social, this paper categorizes them as synonyms to train on multi-task pre-trained language model.
2

Lin, Cheng-Jian, Yu-Cheng Liu, and Chin-Ling Lee. "Automatic Receipt Recognition System Based on Artificial Intelligence Technology." Applied Sciences 12, no. 2 (2022): 853. http://dx.doi.org/10.3390/app12020853.

Abstract:
In this study, an automatic receipt recognition system (ARRS) is developed. First, a receipt is scanned for conversion into a high-resolution image. Receipt characters are automatically placed into two categories according to the receipt characteristics: printed and handwritten characters. Images of receipts with these characters are preprocessed separately. For handwritten characters, template matching and the fixed features of the receipts are used for text positioning, and projection is applied for character segmentation. Finally, a convolutional neural network is used for character recognition. For printed characters, a modified You Only Look Once (version 4) model (YOLOv4-s) executes precise text positioning and character recognition. The proposed YOLOv4-s model reduces downsampling, thereby enhancing small-object recognition. Finally, the system produces recognition results in a tax declaration format, which can be uploaded to a tax declaration system. Experimental results revealed that the recognition accuracy of the proposed system was 80.93% for handwritten characters. Moreover, the YOLOv4-s model had a 99.39% accuracy rate for printed characters; only 33 characters were misjudged. The recognition accuracy of the YOLOv4-s model was higher than that of the traditional YOLOv4 model by 20.57%. Therefore, the proposed ARRS can considerably improve the efficiency of tax declaration, reduce labor costs, and simplify operating procedures.
3

Deepika Kongara. "A Framework for Character Recognition APP Using ML Kit." Journal of Information Systems Engineering and Management 10, no. 41s (2025): 940–52. https://doi.org/10.52783/jisem.v10i41s.8021.

Abstract:
Character recognition applications are pivotal in enabling real-time translation and digitization of printed and handwritten text. Text recognition can be implemented using a variety of technologies found in the field of software development, but here for Android mobile development, the ML Kit is used. The goal is to create an Android character recognition app, especially for Devanagari script (with language converter software included as a feature), using the ML kit without Firebase that will make recognizing, learning, and language translation easier and will promote stress-free communication. Using ML Kit is a boon for developers without extensive knowledge of machine learning, as it simplifies the integration of complex ML features and saves significant time in learning and implementation. This app has the ability to recognise in real time as well as storage-based recognition on both handwritten and printed scripts. The app incorporates real-time recognition and offline translation features, supporting recognition in six languages and translating text into 59 languages. This app will perform more effectively than other existing programmes since it will use optimized code for the translation and recognition process. The major goal is to have the software operate even when it is not connected to the internet.
4

Salar, Siddharth, et al. "Automate Identification and Recognition of Handwritten Text from an Image." Turkish Journal of Computer and Mathematics Education (TURCOMAT) 12, no. 3 (2021): 3800–3808. http://dx.doi.org/10.17762/turcomat.v12i3.1666.

Abstract:
Handwritten text acknowledgment is yet an open examination issue in the area of Optical Character Recognition (OCR). This paper proposes a productive methodology towards the advancement of handwritten text acknowledgment frameworks. The primary goal of this task is to create AI calculation to empower element and information extraction from records with manually written explanations, with an aim to distinguish transcribed words on a picture.
The main aim of this project is to extract text, which can be handwritten or machine-printed, and convert it into a computer-understandable or, we can say, computer-editable format. To implement this project we have used PyTesseract, which is an open-source OCR engine used to recognize handwritten text, and OpenCV, a library in Python used to solve computer vision problems. The input image is processed in several steps: first pre-processing of the image, then text localization, after that character segmentation and character recognition, and finally post-processing of the image. Further image processing algorithms can also be used to deal with multiple characters input in a single image, tilted images, or rotated images. The prepared framework gives a normal precision of more than 95% with the concealed test picture.
5

Miyao, Hidetoshi, Yasuaki Nakano, Atsuhiko Tani, Hirosato Tabaru, and Toshihiro Hananoi. "Printed Japanese Character Recognition Using Multiple Commercial OCRs." Journal of Advanced Computational Intelligence and Intelligent Informatics 8, no. 2 (2004): 200–207. http://dx.doi.org/10.20965/jaciii.2004.p0200.

Abstract:
This paper proposes two algorithms for maintaining matching between lines and characters in text documents output by multiple commercial optical character readers (OCRs). (1) a line matching algorithm using dynamic programming (DP) matching and (2) a character matching algorithm using character string division and standard character strings. The paper proposes a method that introduces majority logic and reject processing in character recognition. To demonstrate the feasibility of the method, we conducted experiments on line matching recognition for 127 document images using five commercial OCRs. Results demonstrated that the method extracted character areas with more accuracy than a single OCR along with appropriate line matching. The proposed method enhanced recognition from 97.61% provided by a single OCR to 98.83% in experiments using the character matching algorithm and character recognition. This method is expected to be highly useful in correcting locations at which unwanted lines or characters occur or required lines or characters disappear.
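The majority logic with reject processing mentioned in this abstract can be illustrated with a minimal sketch. It assumes the outputs of the OCR engines have already been aligned character by character (the abstract's line and character matching steps are not reproduced here), and the function name `vote_characters` and the reject symbol `?` are hypothetical choices for illustration, not taken from the paper.

```python
from collections import Counter


def vote_characters(aligned_outputs, reject_char="?"):
    """Combine character-aligned OCR outputs position by position.

    Assumption: every string has the same length because alignment
    was done beforehand. A character is accepted only when a strict
    majority of engines agree; otherwise the position is rejected.
    """
    if not aligned_outputs:
        return ""
    length = len(aligned_outputs[0])
    if any(len(text) != length for text in aligned_outputs):
        raise ValueError("outputs must be aligned to the same length")

    needed = len(aligned_outputs) // 2 + 1  # strict majority threshold
    result = []
    for candidates in zip(*aligned_outputs):
        char, count = Counter(candidates).most_common(1)[0]
        result.append(char if count >= needed else reject_char)
    return "".join(result)


if __name__ == "__main__":
    print(vote_characters(["cat", "cot", "cat"]))
    print(vote_characters(["abc", "abd", "abe"]))
```

In this sketch a rejected position is marked rather than guessed, so a downstream step or a human reviewer can handle it.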
6

Sable, Prof A. V., Avantika Patil, Mayur Rathi, and Ayush Shriwas. "Interpreting Doctor Notes using Handwriting Recognition." International Journal for Research in Applied Science and Engineering Technology 12, no. 4 (2024): 3118–23. http://dx.doi.org/10.22214/ijraset.2024.60663.

Abstract:
Handwriting recognition of medical prescriptions has been a challenging problem over the recent years with constant research in providing possible accurate solutions. Indecipherable handwritten prescription and inefficiency of Pharmacist to understand the medical prescription can lead to serious and harmful effect to the patients. Even in the recognition of handwriting, mainly doctors notes, they are very difficult for everyone to understand and it takes time for a person to analyse it. So, this idea mainly focused on interpreting doctor’s notes using handwritten recognition and deep learning techniques. The handwritten or printed document pictures are transformed into their electronic counterparts using an optical character recognition (OCR) system. Due to individuals' inconsistent writing styles, dealing with handwritten texts is significantly more difficult than dealing with printed ones. Handwritten text recognition could be done by Image processing, Machine Learning or Deep Learning Techniques. Out of these Deep Learning remains to be the most popular and prominent. Some of the Deep Learning techniques includes Recurrent Neural Networks (RNNs) and Convolutional Neural Networks (CNNs). This gives a review of the various recognition methodologies used for interpreting handwritten texts. It includes the most important algorithms that could be used for detecting the handwritten word/text/character by using various approaches for the recognition process. In the end we are thus comparing the accuracies provided by these systems.
7

BS, Ujwala, and Ujwala K. "A REVIEW PAPER ON OCR USING CONVOLUTIONAL NEURAL NETWORKS." International Journal of Engineering Applied Sciences and Technology 7, no. 7 (2022): 102–6. http://dx.doi.org/10.33564/ijeast.2022.v07i07.018.

Abstract:
This paper presents a literature review on OCR for different languages using convolutional neural network techniques. Optical Character Recognition is the process of converting an input text image into a machine encoded format. Different methods are used in OCR for different languages. The main steps of optical character recognition are pre-processing, segmentation and recognition. Recognizing handwritten text is harder than recognizing printed text. Convolutional Neural Network has shown remarkable improvement in recognizing characters of different languages. The novelty of the OCR is its robustness to image quality, image contrast, font style and font size. Common machine learning methods usually apply a combination of feature extractor and trainable classifier. The use of CNN leads to significant improvements across different machine-learning classification algorithms.
8

Shafique, A. Awan, Nawaz Hakro Dil, Lashari Intzar, H. Jalbani Akhtar, and Hameed Maryam. "A Complete Off-line Sindhi Handwritten Text Recognition: A Survey." International Journal of Management Sciences and Business Research 6, no. 4 (2017): 131–38. https://doi.org/10.5281/zenodo.3469359.

Abstract:
Artificial Intelligence is finding ways to make machines more intelligent and work like human being. Image processing, Natural language processing and Optical Character Recognition (OCR) are the active fields of computer vision, where the computers are made more versatile to understand, read and write natural human languages spoken around the world. Optical Characters Recognition (OCR) and Intelligent Characters Recognition (ICR) differ in recognizing printed and handwritten characters respectively. Intelligent Characters Recognition (ICR) is an active field in which handwritten characters are converted into editable text from the image, and remain the point of interest for researchers around the world. Many of the languages of the world possess their Intelligent Characters Recognition (ICR) or their ICR systems are in process. Latin scripts possess their ICR and are near to perfect whereas Arabic script and its adopting languages need more attention for the development of ICR systems. Sindhi language is a language having rich background and culture of more than 5000 years still lacks the ICR system. As there is no handwritten recognition system for Sindhi Language, no handwritten database is available for testing and training. Enhanced segmentation and feature extraction algorithms are needed which can fully suit with Sindhi script. An integrated handwritten system will be the output of this system in which handwritten text is recognized and editable text will be available for the further processing.
9

Shafiro, Valeriy, Daniel Fogerty, Kimberly Smith, and Stanley Sheft. "Perceptual Organization of Interrupted Speech and Text." Journal of Speech, Language, and Hearing Research 61, no. 10 (2018): 2578–88. http://dx.doi.org/10.1044/2018_jslhr-h-17-0477.

Abstract:
Purpose: Visual recognition of interrupted text may predict speech intelligibility under adverse listening conditions. This study investigated the nature of the linguistic information and perceptual processes underlying this relationship. Method: To directly compare the perceptual organization of interrupted speech and text, we examined the recognition of spoken and printed sentences interrupted at different rates in 14 adults with normal hearing. The interruption method approximated deletion and retention of rate-specific linguistic information (0.5–64 Hz) in speech by substituting either white space or silent intervals for text or speech in the original sentences. Results: A similar U-shaped pattern of cross-rate variation in performance was observed in both modalities, with minima at 2 Hz. However, at the highest and lowest interruption rates, recognition accuracy was greater for text than speech, whereas the reverse was observed at middle rates. An analysis of word duration and the frequency of word sampling across interruption rates suggested that the location of the function minima was influenced by perceptual reconstruction of whole words. Overall, the findings indicate a high degree of similarity in the perceptual organization of interrupted speech and text. Conclusion: The observed rate-specific variation in the perception of speech and text may potentially affect the degree to which recognition accuracy in one modality is predictive of the other.
10

Rakesh T M. "Hybrid CNN-BiLSTM with CTC for Enhanced Text Recognition in Complex Background Images." Journal of Information Systems Engineering and Management 10, no. 50s (2025): 89–102. https://doi.org/10.52783/jisem.v10i50s.10121.

Abstract:
The problems that robotic reading of text faces such as poor light, messy backgrounds and blurriness, resemble those found in human vision. Addressing these concerns results in applications such as document digitization and assistive technology. The study introduces a way to help identify text by joining CNNs, BiLSTMs and a CTC decoder. This CNN part is able to detect spatial features of text even from crowded images, while BiLSTMs help recognize text printed in different styles, turned over and in varying sizes. Because the CTC decoder does not require separate segmentation of characters, the text is aligned accurately. On ICDAR 2015 and SVT datasets, the approach demonstrated by this study shows very high accuracy of 98.50% and 98.80%. Quality measurements reveal high accuracy of the model on motion-blurred (no more than 15 pixels), partially occluded (40%) and distorted (half of text is skewed by up to 30 degrees) images. It proposes a method that helps to identify text by using CNNs, BiLSTMs and a CTC decoder.
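The segmentation-free property of CTC decoding described in this abstract can be seen in a small greedy (best-path) decoder. This is a generic sketch of the standard collapse rule, not the paper's implementation: it assumes per-frame scores are given as plain lists, that index 0 of the hypothetical `alphabet` is the CTC blank, and that greedy decoding is used instead of beam search.

```python
def ctc_greedy_decode(frame_scores, alphabet, blank=0):
    """Best-path CTC decoding.

    frame_scores: one list of class scores per time step
                  (assumed to come from a CNN-BiLSTM, not modelled here).
    alphabet:     symbol for each class index; alphabet[blank] is a placeholder.
    """
    # 1. Pick the highest-scoring class in every frame.
    best_path = [
        max(range(len(frame)), key=frame.__getitem__) for frame in frame_scores
    ]

    # 2. Collapse consecutive repeats, then drop blanks.
    decoded = []
    previous = None
    for index in best_path:
        if index != previous and index != blank:
            decoded.append(alphabet[index])
        previous = index
    return "".join(decoded)


if __name__ == "__main__":
    alphabet = ["-", "a", "b"]  # "-" stands for the blank label
    frames = [
        [0.1, 0.8, 0.1],  # a
        [0.1, 0.7, 0.2],  # a (repeat, collapsed)
        [0.9, 0.05, 0.05],  # blank separates the two a's
        [0.2, 0.7, 0.1],  # a
        [0.1, 0.1, 0.8],  # b
        [0.1, 0.2, 0.7],  # b (repeat, collapsed)
    ]
    print(ctc_greedy_decode(frames, alphabet))
```

The blank frame is what lets the decoder emit a doubled letter; without it, the two runs of `a` would collapse into one.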

Dissertations / Theses on the topic "Thai printed text recognition"

1

Namane, Abderrahmane. "Degraded printed text and handwritten recognition methods : Application to automatic bank check recognition." Université Louis Pasteur (Strasbourg) (1971-2008), 2007. http://www.theses.fr/2007STR13048.

Abstract:
Character recognition is a significant stage in all document recognition systems. Character recognition is considered as an assignment problem and decision of a given character, and is an active research subject in many disciplines. This thesis is mainly related to the recognition of degraded printed and handwritten characters. New solutions were brought to the field of document image analysis (DIA). The first solution concerns the development of two recognition methods for handwritten numeral character, namely, the method based on the use of Fourier-Mellin transform (FMT) and the self-organization map (SOM), and the parallel combination of HMM-based classifiers using as parameter extraction a new projection technique. In the second solution, one finds a new holistic recognition method of handwritten words applied to French legal amount. The third solution presents two recognition methods based on neural networks for the degraded printed character applied to the Algerian postal check. The first work is based on sequential combination and the second used a serial combination based mainly on the introduction of a relative distance for the quality measurement of the degraded character. During the development of this thesis, methods of preprocessing were also developed, in particular, the handwritten numeral slant correction, the handwritten word central zone detection and its slope.
2

Sae-Tang, Sutat. "A systematic study of offline recognition of Thai printed and handwritten characters." Thesis, University of Southampton, 2011. https://eprints.soton.ac.uk/206079/.

Abstract:
Thai characters pose some unique problems, which differ from English and other oriental scripts. The structure of Thai characters consists of small loops combined with curves and there is an absence of spaces between each word and sentence. In each line, moreover, Thai characters can be composed on four levels, depending on the type of character being written. This research focuses on OCR for the Thai language: printed and offline handwritten character recognition. An attempt to overcome the problems by simple but effective methods is the main consideration. A printed OCR developed by the National Electronics and Computer Technology Center (NECTEC) uses Kohonen self-organising maps (SOMs) for rough classification and back-propagation neural networks for fine classification. An evaluation of the NECTEC OCR is performed on a printed dataset that contains over 0.6 million tokens. Comparisons of the classifier, with and without the aspect ratio, and with and without SOMs, yield small, but statistically significant differences in recognition rate. A very straightforward classifier, the nearest neighbour, was examined to evaluate overall recognition performance and to compare with the classifier. It shows a significant improvement in recognition rate (about 98%) over the NECTEC classifier (about 96%) on both the original and distorted data (rotated and noisy), but at the expense of longer recognition times. For offline handwritten character recognition, three different classifiers are evaluated on three different datasets that contain, on average, approximately 10,000 tokens each. The neural network and HMMs are more effective and give higher recognition rates than the nearest neighbour classifier on three datasets. The best result obtained from the HMMs is 91.1% on ThaiCAM dataset. However, when evaluated on a different dataset, the recognition rates drastically reduce, due to differences in many aspects of online and offline handwritten data.
An improvement in classification rates was obtained by adjusting the stroke width of a character in the online handwritten dataset (12 percentage points) and combining the training sets from the three datasets (7.6 percentage points). A boosting algorithm called AdaBoost yields a slight improvement in recognition rate (1.2 percentage points) over the original classifiers (without applying the AdaBoost algorithm).
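The nearest neighbour classifier that this abstract calls "very straightforward" can be sketched in a few lines. The feature vectors, the Euclidean distance, and the function name `nearest_neighbour` below are illustrative assumptions; the thesis abstract does not state which features or distance measure were used. The sketch also shows why recognition time grows: every query is compared with every stored sample.

```python
import math


def nearest_neighbour(training_samples, query):
    """Return the label of the training sample closest to `query`.

    training_samples: list of (feature_vector, label) pairs.
    Distance is Euclidean here, which is an assumption for illustration.
    """
    if not training_samples:
        raise ValueError("training set is empty")

    best_label = None
    best_distance = math.inf
    for features, label in training_samples:  # linear scan over all samples
        distance = math.dist(features, query)
        if distance < best_distance:
            best_label, best_distance = label, distance
    return best_label


if __name__ == "__main__":
    # Toy 2-D features standing in for real character descriptors.
    samples = [
        ((0.0, 0.0), "ก"),
        ((1.0, 1.0), "ข"),
        ((5.0, 5.0), "ค"),
    ]
    print(nearest_neighbour(samples, (0.9, 1.2)))
```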
3

Al-Muhtaseb, Husni Abdulghani. "Arabic text recognition of printed manuscripts : efficient recognition of off-line printed Arabic text using Hidden Markov Models, Bigram Statistical Language Model, and post-processing." Thesis, University of Bradford, 2010. http://hdl.handle.net/10454/4426.

Abstract:
Arabic text recognition was not researched as thoroughly as other natural languages. The need for automatic Arabic text recognition is clear. In addition to the traditional applications like postal address reading, check verification in banks, and office automation, there is a large interest in searching scanned documents that are available on the internet and for searching handwritten manuscripts. Other possible applications are building digital libraries, recognizing text on digitized maps, recognizing vehicle license plates, using it as first phase in text readers for visually impaired people and understanding filled forms. This research work aims to contribute to the current research in the field of optical character recognition (OCR) of printed Arabic text by developing novel techniques and schemes to advance the performance of the state of the art Arabic OCR systems. Statistical and analytical analysis for Arabic Text was carried out to estimate the probabilities of occurrences of Arabic character for use with Hidden Markov models (HMM) and other techniques. Since there is no publicly available dataset for printed Arabic text for recognition purposes it was decided to create one. In addition, a minimal Arabic script is proposed. The proposed script contains all basic shapes of Arabic letters. The script provides efficient representation for Arabic text in terms of effort and time. Based on the success of using HMM for speech and text recognition, the use of HMM for the automatic recognition of Arabic text was investigated. The HMM technique adapts to noise and font variations and does not require word or character segmentation of Arabic line images. In the feature extraction phase, experiments were conducted with a number of different features to investigate their suitability for HMM. Finally, a novel set of features, which resulted in high recognition rates for different fonts, was selected. 
The developed techniques do not need word or character segmentation before the classification phase as segmentation is a byproduct of recognition. This seems to be the most advantageous feature of using HMM for Arabic text as segmentation tends to produce errors which are usually propagated to the classification phase. Eight different Arabic fonts were used in the classification phase. The recognition rates were in the range from 98% to 99.9% depending on the used fonts. As far as we know, these are new results in their context. Moreover, the proposed technique could be used for other languages. A proof-of-concept experiment was conducted on English characters with a recognition rate of 98.9% using the same HMM setup. The same techniques were conducted on Bangla characters with a recognition rate above 95%. Moreover, the recognition of printed Arabic text with multi-fonts was also conducted using the same technique. Fonts were categorized into different groups. New high recognition results were achieved. To enhance the recognition rate further, a post-processing module was developed to correct the OCR output through character level post-processing and word level post-processing. The use of this module increased the accuracy of the recognition rate by more than 1%.
4

Al-Muhtaseb, Husni A. "Arabic text recognition of printed manuscripts. Efficient recognition of off-line printed Arabic text using Hidden Markov Models, Bigram Statistical Language Model, and post-processing." Thesis, University of Bradford, 2010. http://hdl.handle.net/10454/4426.

Abstract:
Arabic text recognition was not researched as thoroughly as other natural languages. The need for automatic Arabic text recognition is clear. In addition to the traditional applications like postal address reading, check verification in banks, and office automation, there is a large interest in searching scanned documents that are available on the internet and for searching handwritten manuscripts. Other possible applications are building digital libraries, recognizing text on digitized maps, recognizing vehicle license plates, using it as first phase in text readers for visually impaired people and understanding filled forms. This research work aims to contribute to the current research in the field of optical character recognition (OCR) of printed Arabic text by developing novel techniques and schemes to advance the performance of the state of the art Arabic OCR systems. Statistical and analytical analysis for Arabic Text was carried out to estimate the probabilities of occurrences of Arabic character for use with Hidden Markov models (HMM) and other techniques. Since there is no publicly available dataset for printed Arabic text for recognition purposes it was decided to create one. In addition, a minimal Arabic script is proposed. The proposed script contains all basic shapes of Arabic letters. The script provides efficient representation for Arabic text in terms of effort and time. Based on the success of using HMM for speech and text recognition, the use of HMM for the automatic recognition of Arabic text was investigated. The HMM technique adapts to noise and font variations and does not require word or character segmentation of Arabic line images. In the feature extraction phase, experiments were conducted with a number of different features to investigate their suitability for HMM. Finally, a novel set of features, which resulted in high recognition rates for different fonts, was selected. 
The developed techniques do not need word or character segmentation before the classification phase as segmentation is a byproduct of recognition. This seems to be the most advantageous feature of using HMM for Arabic text as segmentation tends to produce errors which are usually propagated to the classification phase. Eight different Arabic fonts were used in the classification phase. The recognition rates were in the range from 98% to 99.9% depending on the used fonts. As far as we know, these are new results in their context. Moreover, the proposed technique could be used for other languages. A proof-of-concept experiment was conducted on English characters with a recognition rate of 98.9% using the same HMM setup. The same techniques were conducted on Bangla characters with a recognition rate above 95%. Moreover, the recognition of printed Arabic text with multi-fonts was also conducted using the same technique. Fonts were categorized into different groups. New high recognition results were achieved. To enhance the recognition rate further, a post-processing module was developed to correct the OCR output through character level post-processing and word level post-processing. The use of this module increased the accuracy of the recognition rate by more than 1%.
King Fahd University of Petroleum and Minerals (KFUPM)
5

Al-Muhtaseb, Husni A., Sabri A. Mahmoud, and Rami S. R. Qahwaji. "Recognition of off-line printed Arabic text using Hidden Markov Models." Elsevier, 2008. http://hdl.handle.net/10454/4105.

Abstract:
This paper describes a technique for automatic recognition of off-line printed Arabic text using Hidden Markov Models. In this work different sizes of overlapping and non-overlapping hierarchical windows are used to generate 16 features from each vertical sliding strip. Eight different Arabic fonts were used for testing (viz. Arial, Tahoma, Akhbar, Thuluth, Naskh, Simplified Arabic, Andalus, and Traditional Arabic). It was experimentally proven that different fonts have their highest recognition rates at different numbers of states (5 or 7) and codebook sizes (128 or 256). Arabic text is cursive, and each character may have up to four different shapes based on its location in a word. This research work considered each shape as a different class, resulting in a total of 126 classes (compared to 28 Arabic letters). The achieved average recognition rates were between 98.08% and 99.89% for the eight experimental fonts. The main contributions of this work are the novel hierarchical sliding window technique using only 16 features for each sliding window, considering each shape of Arabic characters as a separate class, bypassing the need for segmenting Arabic text, and its applicability to other languages.
6

Lund, William B. "Ensemble Methods for Historical Machine-Printed Document Recognition." BYU ScholarsArchive, 2014. https://scholarsarchive.byu.edu/etd/4024.

Abstract:
The usefulness of digitized documents is directly related to the quality of the extracted text. Optical Character Recognition (OCR) has reached a point where well-formatted and clean machine-printed documents are easily recognizable by current commercial OCR products; however, older or degraded machine-printed documents present problems to OCR engines resulting in word error rates (WER) that severely limit either automated or manual use of the extracted text. Major archives of historical machine-printed documents are being assembled around the globe, requiring an accurate transcription of the text for the automated creation of descriptive metadata, full-text searching, and information extraction. Given document images to be transcribed, ensemble recognition methods with multiple sources of evidence from the original document image and information sources external to the document have been shown in this and related work to improve output. This research introduces new methods of evidence extraction, feature engineering, and evidence combination to correct errors from state-of-the-art OCR engines. This work also investigates the success and failure of ensemble methods in the OCR error correction task, as well as the conditions under which these ensemble recognition methods reduce the Word Error Rate (WER), improving the quality of the OCR transcription, showing that the average document word error rate can be reduced below the WER of a state-of-the-art commercial OCR system by between 7.4% and 28.6% depending on the test corpus and methods. This research on OCR error correction contributes within the larger field of ensemble methods as follows. Four unique corpora for OCR error correction are introduced: The Eisenhower Communiqués, a collection of typewritten documents from 1944 to 1945; The Nineteenth Century Mormon Articles Newspaper Index from 1831 to 1900; and two synthetic corpora based on the Enron (2001) and the Reuters (1997) datasets.
The Reverse Dijkstra Heuristic is introduced as a novel admissible heuristic for the A* exact alignment algorithm. The impact of the heuristic is a dramatic reduction in the number of nodes processed during text alignment as compared to the baseline method. From the aligned text, the method developed here creates a lattice of competing hypotheses for word tokens. In contrast to much of the work in this field, the word token lattice is created from a character alignment, preserving split and merged tokens within the hypothesis columns of the lattice. This alignment method more explicitly identifies competing word hypotheses which may otherwise have been split apart by a word alignment. Lastly, this research explores, in order of increasing contribution to word error rate reduction: voting among hypotheses, decision lists based on an in-domain training set, ensemble recognition methods with novel feature sets, multiple binarizations of the same document image, and training on synthetic document images.
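The word error rate (WER) that the abstract above uses as its quality measure can be illustrated with a minimal sketch. This is not the thesis's implementation: it assumes the common definition of WER as the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words, and the function name `word_error_rate` and whitespace tokenization are choices made here for illustration only.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: words are separated by whitespace. This is an
    illustrative definition, not the method used in the cited thesis.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    truth = "the quick brown fox"
    ocr_output = "the quikc brown"
    print(word_error_rate(truth, ocr_output))
```

Whitespace tokenization is an assumption that suits space-delimited scripts; Thai is written without spaces between words, so a word segmenter would be needed before a measure like this could be applied to Thai OCR output.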

Books on the topic "Thai printed text recognition"

1

Stanley, Timothy. Printing Religion after the Enlightenment. Rowman & Littlefield, 2022. https://doi.org/10.5040/9781978721517.

Abstract:
Over the course of the seventeenth to eighteenth centuries, an interior private notion of religion gained wide public recognition. It then spread through settler colonial contexts around the world. It has since been criticized for its abstract, immaterial nature as well as its irrelevance to traditions beyond the European context. However, such critiques obscure the contradiction between religion’s definition as a matter of interior privacy and its public visibility in various printed publications. Timothy Stanley responds by re-evaluating the cultural impact of the exterior forms in which religious texts were printed, such as pamphlets, broadsheets, books, and journals. He also applies that evidence to critical studies of religion shaped by the crisis of representation in the human sciences. While Jacques Derrida is oft-cited as a progenitor of that crisis, the opposite case is made. Additionally, Stanley draws on Derrida’s thought to reframe the relation between a religious text’s internal hermeneutic interests and its external forms. In sum, this book provides a new model of how people printed religion in ways that can be compared to other material cultures around the world.
2

Stanton, Edward F., and Daniel Stanton, eds. Contemporary Hispanic Quotations. Greenwood, 2003. http://dx.doi.org/10.5040/9798216962670.

Abstract:
This is the first collection of quotations from Hispanics who have made their mark on the world. Included are more than 1,000 quotations from over 200 notable Hispanics: writers, politicians, artists, entertainers, activists, physicians, educators, soldiers, and others. The editors have culled quotations from a variety of print and non-print sources, though some original quotations are included. Two special features add variety to the volume, containing excerpts from poetry and fiction, anonymous graffiti, and proverbs. Numerous photos accompany the text. Also included are a small number of quotations from noteworthy Latin Americans who have received significant recognition from Americans, making this a rich, inspirational resource for students and general readers alike.
3

Schadee, Hester. ‘I Don’t Know Who You Call Tyrants’. Oxford University Press, 2018. http://dx.doi.org/10.1093/oso/9780199394852.003.0011.

Abstract:
This chapter returns to Latin discourse, seeking to define what fifteenth-century humanist treatments of tyranny have in common, and what distinguishes them from their classical and medieval counterparts. To this end, the chapter confronts the numerous (self-)contradictions in the works of Poggio Bracciolini, educated in republican Florence, and Giovanni Pontano, employed in Naples by the royal dynasty. While their arguments range from the rejection of all rulers as tyrants to the education of the ideal prince, both authors depend on the same philosophical frameworks (Aristotle and the Stoa) and rhetorical form (epideictic oratory). Typical for the Quattrocento, these texts make no claim to universal validity, and the chapter argues that they should be read with due recognition of the conventions of literary genre and humanism’s culture of debate.
4

Markovits, Stefanie. The Number Sense of Nineteenth-Century British Literature. Oxford University Press, 2025. https://doi.org/10.1093/9780198937821.001.0001.

Abstract:
The Number Sense of Nineteenth-Century British Literature considers how the avalanche of printed numbers characterizing the period affected its literature. While it touches on the rise of statistics and developments in politics and mathematics, this book takes as its starting point the presence of actual numbers (ordinal and cardinal, Arabic, Roman, or spelled out in words) within the century’s literary texts. It is through the deployment of such figures that texts display their number sense; similarly, readers develop the faculty of number sense by paying attention to their presence. And while it often takes us back to a specific historical context, attention to a text’s use of numbers also enables more fundamental recognitions about how literature makes meaning. The book asks what kinds of work, intellectual and ethical, literature’s numerical figures perform. Why are some writers especially numbery? What affordances do numbers wield in various literary environments and against a specific historical backdrop? How do they relate to aspects like plot and character, narrative and lyric? How do they interact with seriality, so central to nineteenth-century publication? When do the numbers really count, and when do they ask us to keep count? Lingering over texts’ measures illuminates the way numbers help shape literary works into the recognizable forms we call genres; one marks both lyric and the Bildungsroman but looks very different in each setting. Number sense uncovers how numbers can serve both as valves, releasing cultural pressures, and as fulcrums, places where pressures coincide to create new forms of literary agency.

Book chapters on the topic "Thai printed text recognition"

1

Filì, Valeria. "Introduzione." In La lingua italiana in una prospettiva di genere. Firenze University Press, 2023. http://dx.doi.org/10.36253/979-12-215-0138-4.07.

Abstract:
In introducing the first session, the author stresses that gender issues are now central to our country’s cultural, political, and socio-economic agenda, and that the promotion of inclusive language is important for the social and legal recognition of categories of discriminated or otherwise marginalized people. [...] by an autograph manuscript by Lope and another one transmitted through printed books and sueltas. On the other hand, I will focus my attention on the dialogues in the play, since they are useful to better understand the work of composition and preparation of the copy of the comedia by the playwright, as well as the company’s work on the text.
2

Ahmed, Irfan, Sabri A. Mahmoud, and Mohammed Tanvir Parvez. "Printed Arabic Text Recognition." In Guide to OCR for Arabic Scripts. Springer London, 2012. http://dx.doi.org/10.1007/978-1-4471-4072-6_7.

3

Woraratpanya, Kuntpong, and Taravichet Titijaroonroj. "Printed Thai Character Recognition Using Standard Descriptor." In The 9th International Conference on Computing and Information Technology (IC2IT2013). Springer Berlin Heidelberg, 2013. http://dx.doi.org/10.1007/978-3-642-37371-8_20.

4

De Gregorio, Mario. "«Un arancio in gennaio». La Vita del beato Giovanni Colombini di Feo Belcari da racconto agiografico a testo di lingua." In Le vestigia dei gesuati. Firenze University Press, 2020. http://dx.doi.org/10.36253/978-88-5518-228-7.10.

Abstract:
Composed between 1448 and 1449, the Vita del beato Giovanni Colombini by Feo Belcari was first printed in 1477 and was destined for growing publishing success over the following centuries, both as an independent text and in miscellanies dedicated to the blessed. The third edition of the Accademia della Crusca Vocabolario, published in 1691, strengthened its standing by admitting Belcari's entire poetic and prose work among the privileged references for the Tuscan vernacular and the Italian language. This recognition led to a Veronese edition of the work in 1817, edited by authoritative scholars of Italian humanistic literature, which shaped many reprints of the work during the first half of the nineteenth century and remained influential even later, when the work was included in very different series, ranging between religious and linguistic/literary interests.
5

Li, Chun, Hongjian Zhan, Kun Zhao, and Yue Lu. "Thai Scene Text Recognition with Character Combination." In Pattern Recognition and Computer Vision. Springer Nature Switzerland, 2022. http://dx.doi.org/10.1007/978-3-031-18913-5_25.

6

Vijay Kumar, B., and A. G. Ramakrishnan. "Machine Recognition of Printed Kannada Text." In Lecture Notes in Computer Science. Springer Berlin Heidelberg, 2002. http://dx.doi.org/10.1007/3-540-45869-7_4.

7

Jaiem, Faten Kallel, Slim Kanoun, Maher Khemakhem, Haikal El Abed, and Jihain Kardoun. "Database for Arabic Printed Text Recognition Research." In Image Analysis and Processing – ICIAP 2013. Springer Berlin Heidelberg, 2013. http://dx.doi.org/10.1007/978-3-642-41181-6_26.

8

Woraratpanya, Kuntpong, and Taravichet Titijaroonroj. "Adaptive Histogram of Oriented Gradient for Printed Thai Character Recognition." In Advances in Intelligent Systems and Computing. Springer International Publishing, 2014. http://dx.doi.org/10.1007/978-3-319-06538-0_9.

9

Amin, Adnan. "Recognition of Printed Arabic Text via Machine Learning." In International Conference on Advances in Pattern Recognition. Springer London, 1999. http://dx.doi.org/10.1007/978-1-4471-0833-7_32.

10

Eglin, Veronique, Stéphane Bres, and Hubert Emptoz. "Characterization and classification of printed text in a multiscale context." In Advances in Pattern Recognition. Springer Berlin Heidelberg, 1998. http://dx.doi.org/10.1007/bfb0033325.


Conference papers on the topic "Thai printed text recognition"

1

Mudgal, Ananya, Anshul Sharma, and Yugnanda Malhotra. "Bridging the Gap: Towards Contextualised Optical Character Recognition using Large Language Models." In Frontiers in Optics. Optica Publishing Group, 2024. https://doi.org/10.1364/fio.2024.jd4a.106.

Abstract:
Optical Character Recognition is used to convert handwritten/printed text to digitised text, but it lacks refinement. We propose to integrate it with a Large Language Model, to create a context-driven word/sentence detection program.
2

Yu, Haiyang, Xiaocong Wang, Bin Li, and Xiangyang Xue. "Orientation-Independent Chinese Text Recognition in Scene Images." In Thirty-Second International Joint Conference on Artificial Intelligence (IJCAI-23). International Joint Conferences on Artificial Intelligence Organization, 2023. http://dx.doi.org/10.24963/ijcai.2023/185.

Abstract:
Scene text recognition (STR) has attracted much attention due to its broad applications. Previous works pay more attention to the recognition of Latin text images with complex backgrounds by introducing language models or other auxiliary networks. Unlike Latin texts, many vertical Chinese texts exist in natural scenes, which poses difficulties for current state-of-the-art STR methods. In this paper, we make the first attempt to extract orientation-independent visual features by disentangling content and orientation information of text images, thus recognizing both horizontal and vertical texts robustly in natural scenes. Specifically, we introduce a Character Image Reconstruction Network (CIRN) to recover corresponding printed character images with disentangled content and orientation information. We conduct experiments on a scene dataset for benchmarking Chinese text recognition, and the results demonstrate that the proposed method can indeed improve performance through disentangling content and orientation information. To further validate the effectiveness of our method, we additionally collect a Vertical Chinese Text Recognition (VCTR) dataset. The experimental results show that the proposed method achieves 45.63% improvement on VCTR when introducing CIRN to the baseline model.
3

Klitz, T. S., J. S. Mansfield, and G. E. Legge. "Font “pop out” in text images." In OSA Annual Meeting. Optica Publishing Group, 1992. http://dx.doi.org/10.1364/oam.1992.thvv5.

Abstract:
What is the role of font information in reading? Words printed in a bold or italic font “pop out” from a page of text, suggesting that fonts are involved in global text page analysis. We have investigated the perceptual distinctiveness of different pairs of fonts. Text stimuli (subtending 17° × 13°) were rendered entirely with a single font or with a 6.5° × 6.5° target region in a different font from the remaining text. In each trial, the subject indicated whether the stimulus contained a target region. The decision reaction time was recorded. Every pairwise combination of AvantGarde, Bookman, Courier, Helvetica, Helvetica-Narrow, New Century Schoolbook, Palatino, and Times was tested. All seven subjects showed the same pattern of reaction times for detecting targets in each font combination. Further, the font pairs that gave the fastest reaction times also “popped out” in a second experiment in which the text was blurred (so that letter recognition was impossible). These results suggest that font analysis is a fast, global process that may use spatial frequencies lower than those critical for letter recognition.
4

"The Efficacy of Tesseract OCR: Insights from a Practical Application Study." In International Conference on Cutting-Edge Developments in Engineering Technology and Science. ICCDETS, 2024. http://dx.doi.org/10.62919/hdsg3874.

Abstract:
This study evaluates the efficacy of the Tesseract Optical Character Recognition (OCR) system through a practical application lens. Tesseract, an open-source OCR tool, is widely recognized for its adaptability and broad usage across various digital imaging and text recognition domains. This paper explores Tesseract's performance in converting scanned documents into editable text formats, emphasizing its accuracy, efficiency, and usability in diverse scenarios, including complex document layouts and varied text quality. By conducting systematic tests across multiple data sets, including printed and handwritten texts, the study provides quantitative and qualitative assessments of Tesseract's capabilities. Additionally, comparative analysis with other leading OCR tools offers a comprehensive understanding of Tesseract's positioning in the OCR landscape. The results indicate that while Tesseract performs robustly in standard text recognition tasks, challenges remain in handling intricately styled fonts and backgrounds. The paper concludes with suggestions for potential improvements and areas for future research, aiming to enhance the practical applications of OCR technology.
5

Chomphuwiset, Phatthanaphong. "Printed Thai Character Segmentation and Recognition." In 2017 IEEE 4th International Conference on Soft Computing & Machine Intelligence (ISCMI). IEEE, 2017. http://dx.doi.org/10.1109/iscmi.2017.8279611.

6

Pornpanomchai, Chomtip, and Montri Daveloh. "Printed Thai Character Recognition by Genetic Algorithm." In 2007 International Conference on Machine Learning and Cybernetics. IEEE, 2007. http://dx.doi.org/10.1109/icmlc.2007.4370727.

7

Bratić, Diana, and Nikolina Stanić Loknar. "AI driven OCR: Resolving handwritten fonts recognizability problems." In 10th International Symposium on Graphic Engineering and Design. University of Novi Sad, Faculty of technical sciences, Department of graphic engineering and design, 2020. http://dx.doi.org/10.24867/grid-2020-p82.

Abstract:
Optical Character Recognition (OCR) is the electronic or mechanical conversion of images of typed, handwritten, or printed text into machine-encoded text. Advanced systems are capable of producing a high degree of recognition accuracy for most technical fonts, but with handwritten forms problems occur in recognizing certain characters, and the limitations of conventional OCR processes persist. The problem is most pronounced in ascenders (k, b, l, d, h, t) and descenders (g, j, p, q, y). If the characters are linked by ligatures, the ascending and descending strokes are even less recognizable to the scanners. In order to reduce the likelihood of a recognition error, it is necessary to create a large database of stored characters and their glyphs. Feature extraction decomposes glyphs into features like lines, closed loops, line direction, and line intersections. A Multilayer Perceptron (MLP) neural network based on the Back Propagation Neural Network (BPNN) algorithm, as a method of Artificial Intelligence (AI), has been used in text identification, classification, and recognition using various methods: image pattern based, text-based, mark-based, etc. Also, the application of AI generates a large database of different letter cuts, modifications, and variations of the same letter character structure. For this purpose, a recognizability test of handwritten fonts was performed. Within the main group, subgroups of independent letter characters and letter characters linked by ligatures were created, and reading errors were observed. In each subgroup, four different font families (bold stroke, alternating stroke, monoline stroke, and brush stroke) were tested. In the subgroup of independent letter characters, errors were observed in similar rounded lines such as the characters a and e.
In the subgroup of letter characters linked by ligatures, errors were also observed in similar rounded lines such as the letter characters a and e, and m and n, but also in the ascenders b and l and the descenders g and q. Furthermore, seven letter cuts (thin, ultra-light, light, regular, semi-bold, bold, and ultra-bold) were made from each basic test letter and stored in the existing EMNIST database. The scanning test was repeated, and the newly obtained results showed a decrease in the deviation rate, i.e. higher accuracy. The reduced number of deviations shows that the neural network gives acceptable answers but requires the creation of a larger database of about 56,000 different characters.
8

Suwanbandit, Artit, Jaturong Chitiyaphol, Sutthinan Chuenchom, et al. "Thai-Dialect: Low Resource Thai Dialectal Speech to Text Corpora." In 2023 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU). IEEE, 2023. http://dx.doi.org/10.1109/asru57964.2023.10389792.

9

Chamchong, Rapeeporn, Wei Gao, and Mark D. McDonnell. "Thai Handwritten Recognition on Text Block-Based from Thai Archive Manuscripts." In 2019 International Conference on Document Analysis and Recognition (ICDAR). IEEE, 2019. http://dx.doi.org/10.1109/icdar.2019.00217.

10

Srinilta, Chutimet, and Suchakree Chatpoch. "Multi-task Learning and Thai Handwritten Text Recognition." In 2020 6th International Conference on Engineering, Applied Sciences and Technology (ICEAST). IEEE, 2020. http://dx.doi.org/10.1109/iceast50382.2020.9165315.
