Academic literature on the topic 'Tesseract OCR engine'

Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Tesseract OCR engine'.

Journal articles on the topic "Tesseract OCR engine"

1

Chesley, Emily, Jillian Marcantonio, and Abigail Pearson. "Towards Syriac Digital Corpora: Evaluation of Tesseract 4.0 for Syriac OCR." Hugoye: Journal of Syriac Studies 22, no. 1 (2019): 109–92. http://dx.doi.org/10.31826/hug-2019-220105.

Abstract:
This paper summarizes the results of an extensive test of Tesseract 4.0, an open-source Optical Character Recognition (OCR) engine with Syriac capabilities, and ascertains the current state of Syriac OCR technology. Three popular print types (S14, W64, and E22) representing the Syriac type styles Estrangela, Serto, and East Syriac were OCRed using Tesseract’s two different OCR modes (Syriac Language and Syriac Script). Handwritten manuscripts were also preliminarily tested for OCR. The tests confirm that Tesseract 4.0 may be relied upon for printed Estrangela texts but should be used
2

Mubeen, Suraya, Jally Brahmani, Datha Pavan Kalyan, Ayesha Jagirdar, and A. Praveen Kumar. "Optical Character Recognition Using Tesseract." International Journal for Research in Applied Science and Engineering Technology 10, no. 11 (2022): 672–75. http://dx.doi.org/10.22214/ijraset.2022.47414.

Abstract:
Optical Character Recognition (OCR) is a process or technology in which text within a digital image is recognized. With the rapid pace of technology, people want quicker, handier, and more reliable tools that can fulfil their daily needs. With this motto, we analyzed the existing tools and built this Android app, which provides a seamless experience (no ads and easy to use) and great accuracy. The main objective of this project is to allow automatic extraction of the information that a user wants from a paper document and to use it wherever it is needed. In this project, O
3

Benaissa, Ali, Abdelkhalak Bahri, Ahmad El Allaoui, and My Abdelouahab Salahddine. "Build a Trained Data of Tesseract OCR engine for Tifinagh Script Recognition." Data and Metadata 2 (December 9, 2023): 185. http://dx.doi.org/10.56294/dm2023185.

Abstract:
This article introduces a methodology for constructing a trained dataset to facilitate Tifinagh script recognition using the Tesseract OCR engine. The Tifinagh script, widely used in North Africa, poses a challenge due to the lack of built-in recognition capabilities in Tesseract. To overcome this limitation, our approach focuses on image generation, box generation, manual editing, charset extraction, and dataset compilation. By leveraging Python scripting, specialized software tools, and Tesseract's training utilities, we systematically create a comprehensive dataset for Tifinagh script recog
4

Tiwari, Anurag. "Data Extraction from Images through OCR." International Journal for Research in Applied Science and Engineering Technology 9, no. VIII (2021): 435–37. http://dx.doi.org/10.22214/ijraset.2021.37377.

Abstract:
The paperwork involved in maintaining various types of documents in our daily lives is tiresome and inefficient: it consumes a lot of time, and the documents concerned are difficult to maintain and keep track of. This project provides a solution to these problems by introducing Optical Character Recognition (OCR) technology that runs on the Tesseract OCR engine. The project specifically aims at increasing data accessibility and usability and improving customer experience by decreasing the time spent processing, saving, and maintaining user data. Another objective of this project is to nullify the human error, wh
5

Patience, Okechukwu Ogochukwu, Eziechina Malachy Amaechi, Onyemachi George, and Onuwa Nnachi Isaac. "Enhanced Text Recognition in Images Using Tesseract OCR within the Laravel Framework." Asian Journal of Research in Computer Science 17, no. 9 (2024): 58–69. http://dx.doi.org/10.9734/ajrcos/2024/v17i9499.

Abstract:
This research explores the integration of Tesseract OCR (Optical Character Recognition) within the Laravel framework to enhance text recognition capabilities in images. Tesseract OCR, an open-source OCR engine, is renowned for its accuracy and efficiency in converting various image formats into editable and searchable text. However, leveraging its full potential within a robust web application framework presents unique challenges and opportunities. This implementation focuses on creating a seamless, user-friendly application that processes images uploaded by users and accurately extracts text
6

Joshi, Kartik. "Study of Tesseract OCR." GLS KALP: Journal of Multidisciplinary Studies 1, no. 2 (2024): 41–50. http://dx.doi.org/10.69974/glskalp.01.02.54.

Abstract:
In the current era of the Internet and digitization, a huge amount of information is available in different forms such as books and newspapers. To preserve the contents of such documents, they are converted to a digital format by scanning them as images. Detecting text in the scanned images and correctly identifying characters is a challenging problem in such cases. Tesseract is an open-source recognition engine that uses some novel techniques for optical character recognition. Tesseract has been designed to recognize more than 100 languages. Few of these languag
7

Clausner, Christian, Apostolos Antonacopoulos, and Stefan Pletschacher. "Efficient and effective OCR engine training." International Journal on Document Analysis and Recognition (IJDAR) 23, no. 1 (2019): 73–88. http://dx.doi.org/10.1007/s10032-019-00347-8.

Abstract:
We present an efficient and effective approach to train OCR engines using the Aletheia document analysis system. All components required for training are seamlessly integrated into Aletheia: training data preparation, the OCR engine’s training processes themselves, text recognition, and quantitative evaluation of the trained engine. Such a comprehensive training and evaluation system, guided through a GUI, allows for iterative incremental training to achieve best results. The widely used Tesseract OCR engine is used as a case study to demonstrate the efficiency and effectiveness of th
8

Jiju, Alan, Shaun Tuscano, and Chetana Badgujar. "OCR Text Extraction." International Journal of Engineering and Management Research 11, no. 2 (2021): 83–86. http://dx.doi.org/10.31033/ijemr.11.2.11.

Abstract:
This research seeks a methodology through which data can be extracted from everyday printed bills and invoices. The data from these bills or invoices can be used extensively later on, for example in machine learning or statistical analysis. This research focuses on extracting the final bill amount, itinerary, date, and similar data from bills and invoices, as they encapsulate an ample amount of information about the user's purchases, likes, dislikes, etc. Optical Character Recognition (OCR) technology is a system that provides a full alphanumeric recognition of printed or handwritt
9

Sporici, Dan, Elena Cușnir, and Costin-Anton Boiangiu. "Improving the Accuracy of Tesseract 4.0 OCR Engine Using Convolution-Based Preprocessing." Symmetry 12, no. 5 (2020): 715. http://dx.doi.org/10.3390/sym12050715.

Abstract:
Optical Character Recognition (OCR) is the process of identifying and converting texts rendered in images using pixels to a more computer-friendly representation. The presented work aims to prove that the accuracy of the Tesseract 4.0 OCR engine can be further enhanced by employing convolution-based preprocessing using specific kernels. As Tesseract 4.0 has shown great performance when evaluated against a favorable input, its capability of properly detecting and identifying characters in more realistic, unfriendly images is questioned. The article proposes an adaptive image preprocessing step
10

Ibrahim, Ahmed. "Dhivehi OCR: Character Recognition of Thaana Script using Machine-Generated Text and Tesseract OCR Engine." International Journal of Social Research and Innovation 1, no. 1 (2018): 83–94. http://dx.doi.org/10.55712/ijsri.v1i1.23.

Abstract:
This paper provides technical aspects and the context of recognising Dhivehi characters using Tesseract OCR Engine, which is a freely available OCR engine with remarkable accuracy and support for multiple languages. The experiments that were conducted showed promising results with 69.46% accuracy and, more importantly, highlighted limitations that are unique to Dhivehi. These issues have been discussed in detail and possible directions for future research are presented.

Dissertations / Theses on the topic "Tesseract OCR engine"

1

Nell, Henrik. "Quantifying the noise tolerance of the OCR engine Tesseract using a simulated environment." Thesis, Blekinge Tekniska Högskola, Institutionen för kreativa teknologier, 2014. http://urn.kb.se/resolve?urn=urn:nbn:se:bth-4028.

Abstract:
Context. Optical Character Recognition (OCR), having a computer recognize text from an image, is not as intuitive as human recognition. Even small (to human eyes) degradations can thwart the OCR result. The problem is that random unknown degradations are unavoidable in a real-world setting. Objectives. The noise tolerance of Tesseract, a state-of-the-art OCR engine, is evaluated in relation to how well it handles salt and pepper noise, a type of image degradation. Noise tolerance is measured as the percentage of aberrant pixels when comparing two images (one with noise and the other

Book chapters on the topic "Tesseract OCR engine"

1

Joshi, Kartik, and Harshal Arolkar. "A Review of Usage of Tesseract OCR Engine with Vernacular Indian Languages." In Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering. Springer Nature Switzerland, 2025. https://doi.org/10.1007/978-3-031-77081-4_1.

2

Chakraborty, Partha, Md Rakib Mia, Humayun Kabir Sumon, et al. "Recognize Meaningful Words and Idioms from the Images Based on OCR Tesseract Engine and NLTK." In Lecture Notes in Electrical Engineering. Springer Nature Singapore, 2022. http://dx.doi.org/10.1007/978-981-19-1520-8_23.

3

Koistinen, Mika, Kimmo Kettunen, and Jukka Kervinen. "How to Improve Optical Character Recognition of Historical Finnish Newspapers Using Open Source Tesseract OCR Engine – Final Notes on Development and Evaluation." In Human Language Technology. Challenges for Computer Science and Linguistics. Springer International Publishing, 2020. http://dx.doi.org/10.1007/978-3-030-66527-2_2.

4

Strankale, Laine, and Pēteris Paikens. "OCR Challenges for a Latvian Pronunciation Dictionary." In Frontiers in Artificial Intelligence and Applications. IOS Press, 2020. http://dx.doi.org/10.3233/faia200623.

Abstract:
This paper covers the development of a custom OCR solution based on the Tesseract open source engine developed for digitization of a Latvian pronunciation dictionary where the pronunciation data is described using a large variety of diacritic markings not supported by standard OCR solutions. We describe our efforts in training a model for these symbols without the additional support of preexisting dictionaries and illustrate how word error rate (WER) and character error rate (CER) are affected by changes in the dataset content and size. We also provide an error analysis and postulate possible c

Conference papers on the topic "Tesseract OCR engine"

1

Krishna, Gopal, Vineeta Singh, Rajkamal Upadhyaya, Harishchander Anandaram, Dilipkumar Jang Bahadur Saini, and Alok Kumar. "Boosting Image-Text Detection Performance with Python Tesseract and the Tesseract OCR Engine." In 2024 International Conference on Artificial Intelligence and Emerging Technology (Global AI Summit). IEEE, 2024. https://doi.org/10.1109/globalaisummit62156.2024.10947909.

2

Smith, R. "An Overview of the Tesseract OCR Engine." In Ninth International Conference on Document Analysis and Recognition (ICDAR 2007) Vol 2. IEEE, 2007. http://dx.doi.org/10.1109/icdar.2007.4376991.

3

Smith, Ray, Daria Antonova, and Dar-Shyang Lee. "Adapting the Tesseract open source OCR engine for multilingual OCR." In the International Workshop. ACM Press, 2009. http://dx.doi.org/10.1145/1577802.1577804.

4

Unnikrishnan, Ranjith, and Ray Smith. "Combined script and page orientation estimation using the Tesseract OCR engine." In the International Workshop. ACM Press, 2009. http://dx.doi.org/10.1145/1577802.1577809.

5

Smith, Ray W. "History of the Tesseract OCR engine: what worked and what didn't." In IS&T/SPIE Electronic Imaging, edited by Richard Zanibbi and Bertrand Coüasnon. SPIE, 2013. http://dx.doi.org/10.1117/12.2010051.

6

Kaur, Jaspreet, Vishal Goyal, and Manish Kumar. "Improving the accuracy of tesseract OCR engine for machine printed Hindi documents." In INNOVATIONS AND RESEARCH IN MARINE ELECTRICAL AND ELECTRONICS ENGINEERING: ICIRMEEE 2021. AIP Publishing, 2022. http://dx.doi.org/10.1063/5.0101164.

7

Thapliyal, Tejas, Sarthak Bhatt, Vandana Rawat, and Sudhanshu Maurya. "Automatic License Plate Recognition (ALPR) using YOLOv5 model and Tesseract OCR engine." In 2023 First International Conference on Advances in Electrical, Electronics and Computational Intelligence (ICAEECI). IEEE, 2023. http://dx.doi.org/10.1109/icaeeci58247.2023.10370919.

8

Li, Qi, Weihua An, Anmi Zhou, and Lehui Ma. "Recognition of Offline Handwritten Chinese Characters Using the Tesseract Open Source OCR Engine." In 2016 8th International Conference on Intelligent Human-Machine Systems and Cybernetics (IHMSC). IEEE, 2016. http://dx.doi.org/10.1109/ihmsc.2016.239.

9

Seitaj, Hansi, and Vinayak Elangovan. "Information Extraction from Product Labels: A Machine Vision Approach." In 11th International Conference on Computer Science, Engineering and Information Technology. Academy & Industry Research Collaboration Center, 2024. http://dx.doi.org/10.5121/csit.2024.141419.

Abstract:
This research tackles the challenge of manual data extraction from product labels by employing a blend of computer vision and Natural Language Processing (NLP). We introduce an enhanced model that combines Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN) in a Convolutional Recurrent Neural Network (CRNN) for reliable text recognition. Our model is further refined by incorporating the Tesseract OCR engine, enhancing its applicability in Optical Character Recognition (OCR) tasks. The methodology is augmented by NLP techniques and extended through the Open Food Facts API (A
10

Vasantharajan, Charangan, Laksika Tharmalingam, and Uthayasanker Thayasivam. "Adapting the Tesseract Open-Source OCR Engine for Tamil and Sinhala Legacy Fonts and Creating a Parallel Corpus for Tamil-Sinhala-English." In 2022 International Conference on Asian Language Processing (IALP). IEEE, 2022. http://dx.doi.org/10.1109/ialp57159.2022.9961304.
