Journal articles on the topic 'Tesseract OCR'

1

Dupent, Sébastien. "Tesseract-OCR." Revue Cyber & Conformité N° 2, no. 2 (2021): 23–24. http://dx.doi.org/10.3917/cyco.002.0025.

2

Baruah, Priyankush Kaushik, and Pranabjyoti Haloi. "Development and Implementation of a Custom License Plate Detection and Recognition System Using YOLOv10 and Tesseract OCR: A Comprehensive Study in Computer Vision and Optical Character Recognition Technologies." International Journal of Innovative Technology and Exploring Engineering 14, no. 6 (2025): 20–26. https://doi.org/10.35940/ijitee.e1083.14060525.

Abstract:
This study presents an automated license plate detection and recognition system, combining YOLOv10 for real-time object detection and Tesseract OCR for robust text extraction. The methodology involves training a customised YOLOv10 model on annotated vehicle datasets to localize license plates, followed by region-of-interest (ROI) filtering to enhance accuracy. Detected plates are processed with Tesseract OCR to convert visual data into machine-readable text. Evaluated using precision, recall, and inference speed metrics, the system achieves 97% detection accuracy and real-time performanc
3

Chesley, Emily, Jillian Marcantonio, and Abigail Pearson. "Towards Syriac Digital Corpora: Evaluation of Tesseract 4.0 for Syriac OCR." Hugoye: Journal of Syriac Studies 22, no. 1 (2019): 109–92. http://dx.doi.org/10.31826/hug-2019-220105.

Abstract:
This paper summarizes the results of an extensive test of Tesseract 4.0, an open-source Optical Character Recognition (OCR) engine with Syriac capabilities, and ascertains the current state of Syriac OCR technology. Three popular print types (S14, W64, and E22) representing the Syriac type styles Estrangela, Serto, and East Syriac were OCRed using Tesseract’s two different OCR modes (Syriac Language and Syriac Script). Handwritten manuscripts were also preliminarily tested for OCR. The tests confirm that Tesseract 4.0 may be relied upon for printed Estrangela texts but should be used
4

Patience, Okechukwu Ogochukwu, Eziechina Malachy Amaechi, Onyemachi George, and Onuwa Nnachi Isaac. "Enhanced Text Recognition in Images Using Tesseract OCR within the Laravel Framework." Asian Journal of Research in Computer Science 17, no. 9 (2024): 58–69. http://dx.doi.org/10.9734/ajrcos/2024/v17i9499.

Abstract:
This research explores the integration of Tesseract OCR (Optical Character Recognition) within the Laravel framework to enhance text recognition capabilities in images. Tesseract OCR, an open-source OCR engine, is renowned for its accuracy and efficiency in converting various image formats into editable and searchable text. However, leveraging its full potential within a robust web application framework presents unique challenges and opportunities. This implementation focuses on creating a seamless, user-friendly application that processes images uploaded by users and accurately extracts text
5

Joshi, Kartik. "Study of Tesseract OCR." GLS KALP: Journal of Multidisciplinary Studies 1, no. 2 (2024): 41–50. http://dx.doi.org/10.69974/glskalp.01.02.54.

Abstract:
In the current Internet and Digitization era, a huge amount of information is available in different forms like books, newspapers, etc. To preserve the contents of such documents, these documents are converted to a digital format by scanning them as images. Detection of text from the scanned images and correct identification of characters is a challenging problem in such cases. Tesseract is a recognition engine based upon open source license which uses some novel techniques for optical character recognition. Tesseract has been designed to recognize more than 100 languages. Few of these languag
6

Tiwari, Anurag. "Data Extraction from Images through OCR." International Journal for Research in Applied Science and Engineering Technology 9, no. VIII (2021): 435–37. http://dx.doi.org/10.22214/ijraset.2021.37377.

Abstract:
The paperwork used in maintaining various types of documents in our daily lives is tiresome and inefficient, it consumes a lot of time and it is difficult to maintain and remember the concerned documents. This project provides a solution to these problems by introducing Optical Character Recognition Technology (OCR) which runs on Tesseract OCR Engine. The project specifically aims at increasing data accessibility, usability and improving customer experience by decreasing the time spent to process, save, and maintain user data. Another objective of this project is to nullify the human error, wh
7

Benaissa, Ali, Abdelkhalak Bahri, Ahmad El Allaoui, and My Abdelouahab Salahddine. "Build a Trained Data of Tesseract OCR engine for Tifinagh Script Recognition." Data and Metadata 2 (December 9, 2023): 185. http://dx.doi.org/10.56294/dm2023185.

Abstract:
This article introduces a methodology for constructing a trained dataset to facilitate Tifinagh script recognition using the Tesseract OCR engine. The Tifinagh script, widely used in North Africa, poses a challenge due to the lack of built-in recognition capabilities in Tesseract. To overcome this limitation, our approach focuses on image generation, box generation, manual editing, charset extraction, and dataset compilation. By leveraging Python scripting, specialized software tools, and Tesseract's training utilities, we systematically create a comprehensive dataset for Tifinagh script recog
8

Mubeen, Suraya, Jally Brahmani, Datha Pavan Kalyan, Ayesha Jagirdar, and A. Praveen Kumar. "Optical Character Recognition Using Tesseract." International Journal for Research in Applied Science and Engineering Technology 10, no. 11 (2022): 672–75. http://dx.doi.org/10.22214/ijraset.2022.47414.

Abstract:
Optical Character Recognition (OCR) is a process or technology in which text within a digital image is recognized. With the rapid pace of technology, people want quicker, handier and more reliable tools that can fulfil their daily needs. With this motto, we analyzed the existing tools and built this Android app, which provides a seamless experience (no ads and easy to use) and great accuracy. The main objective of this project is to allow automatic extraction of the information that a user wants from the paper document and using it wherever it is needed. In this project, O
9

Akhsa, Alvian Tri Putra Darti, Muhammad Agus, Rosmiati Rosmiati, and Andi Muhammad Bahrul Ulum. "Perancangan E-Office Pelayanan Dan Pengarsipan Digital Menggunakan Metode OCR Berbasis Web." INTECOMS: Journal of Information Technology and Computer Science 7, no. 1 (2024): 218–26. http://dx.doi.org/10.31539/intecoms.v7i1.8367.

Abstract:
This research aims to support the government and the public in optimizing document services and archiving in line with the goals of E-Government. Its main focus is improving efficiency and organization in public service and document archiving processes through the use of the Optical Character Recognition (OCR) method. The archive-management automation method implemented in this platform is OCR, which plays an important role in converting document images into processable text. We use the tesseract library as a database of characte
10

Darpito, Muhammad Noko, Kartika Firdausy, and Abdul Fadlil. "Perbandingan Unjuk Kerja Library Optical Character Recognition (OCR) dalam Pengenalan Teks pada Dokumen Digital." Jurnal Informatika Polinema 11, no. 3 (2025): 273–82. https://doi.org/10.33795/jip.v11i3.7025.

Abstract:
Optical Character Recognition (OCR) is a technology used to convert text in digital documents into machine-readable text. Choosing the right OCR method depends heavily on processing efficiency and text recognition accuracy, especially in applications that require high speed and minimal error rates. This study compares the performance of Tesseract and EasyOCR through a research method comprising data collection, text extraction, OCR implementation using both libraries, and evaluation of the extraction results
11

Sengar, Abhishek Singh. "Multilingual Handwritten OCR using CLIP and Tesseract." International Journal for Research in Applied Science and Engineering Technology 13, no. 4 (2025): 2168–72. https://doi.org/10.22214/ijraset.2025.68700.

Abstract:
Optical Character Recognition (OCR) of handwritten text is an extremely challenging problem, particularly in multilingual and low-resource environments. Conventional OCR engines like Tesseract work well for printed text but not for handwriting because of extreme variations in style, language, and noise. The breakthroughs in multimodal models, especially CLIP (Contrastive Language–Image Pretraining), provide new avenues agnostic knowledge This paper discusses the possibility of combining CLIP with Tesseract to improve multilingual handwritten OCR, covering current methods, limitations, and futu
12

Priyankush, Kaushik Baruah. "Development and Implementation of a Custom License Plate Detection and Recognition System Using YOLOv10 and Tesseract OCR: A Comprehensive Study in Computer Vision and Optical Character Recognition Technologies." International Journal of Innovative Technology and Exploring Engineering (IJITEE) 14, no. 6 (2025): 20–26. https://doi.org/10.35940/ijitee.E1083.14060525.

Abstract:
This study presents an automated license plate detection and recognition system, combining YOLOv10 for real-time object detection and Tesseract OCR for robust text extraction. The methodology involves training a customised YOLOv10 model on annotated vehicle datasets to localize license plates, followed by region-of-interest (ROI) filtering to enhance accuracy. Detected plates are processed with Tesseract OCR to convert visual data into machine-readable text. Evaluated using precision, recall, and inference speed metrics, the system achieves 97% detection accuracy and
14

Prakisya, Nurcahya Pradana Taufik, Bintang Timur Kusmanto, and Puspanda Hatta. "Comparative Analysis of Google Vision OCR with Tesseract on Newspaper Text Recognition." Media of Computer Science 1, no. 1 (2024): 31–46. http://dx.doi.org/10.69616/mcs.v1i1.178.

Abstract:
Optical Character Recognition (OCR) is a technique used to convert image files into machine-readable text. There are two Optical Character Recognition (OCR) algorithms that are currently well known and widely used, namely Google Vision's Optical Character Recognition (OCR) and Tesseract. The purpose of this study is to compare the Optical Character Recognition (OCR) algorithms of Google Vision and Tesseract so that people can more easily find out which algorithm is the right one to implement on the system they are going to build. The method used in this research is Research and Development (R&
15

Rozi, Imam Fahrur, Ahmadi Yuli Ananta, Endah Septa Sintiya, Astrifidha Rahma Amalia, Yuri Ariyanto, and Arin Kistia Nugraeni. "Analyzing the Application of Optical Character Recognition: A Case Study in International Standard Book Number Detection." MATRIK : Jurnal Manajemen, Teknik Informatika dan Rekayasa Komputer 24, no. 2 (2025): 195–206. https://doi.org/10.30812/matrik.v24i2.4367.

Abstract:
In the era of advanced education, assessing lecturer performance is crucial to maintaining educational quality. One aspect of this assessment involves evaluating the textbooks authored by lecturers. This study addresses the problem of efficiently detecting International Standard Book Numbers (ISBNs) within these textbooks using optical character recognition (OCR) as a potential solution. The objective is to determine the effectiveness of OCR, specifically the Tesseract platform, in facilitating ISBN detection to support lecturer performance assessments. The research method involves automated d
16

Akshya, Kandula, Karri Pranush, Kavuloori Sai Praghnesh Kumar, and Koppisetti N. V. Satya Sai Rohit. "OCRXBot: Optimizing Image-to-Text Conversion with Tesseract." INTERNATIONAL JOURNAL OF SCIENTIFIC RESEARCH IN ENGINEERING AND MANAGEMENT 09, no. 03 (2025): 1–9. https://doi.org/10.55041/ijsrem42408.

Abstract:
Text extraction from images is essential for document digitization, automated data entry, and assistive technologies. Traditional OCR systems often struggle with low-quality images and noise, reducing accuracy. To overcome these limitations, a deep learning-based system enhances Tesseract OCR using advanced preprocessing techniques. The method applies grayscale conversion, Gaussian Blur, Otsu’s thresholding, and CLAHE to reduce noise and improve contrast. These preprocessing techniques refine text regions, minimize distortions, and enhance OCR. Instead, a deep CNN was utilized for training the
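Of the preprocessing steps this abstract lists, Otsu's thresholding is the easiest to show in isolation. The sketch below is not the authors' code. It is a minimal pure-Python illustration that assumes the image has already been reduced to a flat list of 8-bit grayscale values; the names `otsu_threshold` and `binarize` are hypothetical.

```python
def otsu_threshold(pixels):
    """Return the gray level that maximizes between-class variance.

    `pixels` is assumed to be a flat list of ints in the range 0..255.
    Pixels <= threshold form one class, pixels > threshold the other.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    weight_low = 0      # number of pixels at or below the candidate level
    sum_low = 0         # sum of gray values at or below the candidate level
    best_level = 0
    best_variance = -1.0

    for level in range(256):
        weight_low += hist[level]
        sum_low += level * hist[level]
        if weight_low == 0:
            continue    # no pixels in the low class yet
        weight_high = total - weight_low
        if weight_high == 0:
            break       # every pixel is already in the low class
        mean_low = sum_low / weight_low
        mean_high = (sum_all - sum_low) / weight_high
        # Between-class variance, up to a constant factor of 1 / total**2.
        variance = weight_low * weight_high * (mean_low - mean_high) ** 2
        if variance > best_variance:
            best_variance = variance
            best_level = level
    return best_level


def binarize(pixels, threshold):
    """Map each pixel to 0 (at or below threshold) or 255 (above it)."""
    return [255 if p > threshold else 0 for p in pixels]
```

In practice a library routine would normally be used; the point here is only that the threshold is chosen from the image histogram rather than fixed in advance.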
17

Jais, Ron. "Description VEHICLE PLATE DETECTION USING RASPBERRY PI." INTERNATIONAL JOURNAL OF SCIENTIFIC RESEARCH IN ENGINEERING AND MANAGEMENT 09, no. 05 (2025): 1–9. https://doi.org/10.55041/ijsrem47460.

Abstract:
This project introduces a cost-effective, portable, and adaptable license plate detection and recognition system using Raspberry Pi, leveraging computer vision and machine learning technologies. Unlike traditional stationary systems relying on high-end hardware and specialized software, this solution addresses cost, flexibility, and environmental adaptability challenges. Featuring five modules (image acquisition, preprocessing, license plate detection, OCR, and result output), the system utilizes OpenCV, Tesseract OCR, and Python with TensorFlow/Keras
18

Thammarak, Karanrat, Prateep Kongkla, Yaowarat Sirisathitkul, and Sarun Intakosum. "Comparative analysis of Tesseract and Google Cloud Vision for Thai vehicle registration certificate." International Journal of Electrical and Computer Engineering (IJECE) 12, no. 2 (2022): 1849. http://dx.doi.org/10.11591/ijece.v12i2.pp1849-1858.

Abstract:
Optical character recognition (OCR) is a technology to digitize a paper-based document to digital form. This research studies the extraction of the characters from a Thai vehicle registration certificate via a Google Cloud Vision API and a Tesseract OCR. The recognition performance of both OCR APIs is also examined. The 84 color image files comprised three image sizes/resolutions and five image characteristics. For suitable image type comparison, the greyscale and binary image are converted from color images. Furthermore, the three pre-processing techniques, sharpening, contrast adjustment, an
19

Karanrat, Thammarak, Kongkla Prateep, Sirisathitkul Yaowarat, and Intakosum Sarun. "Comparative analysis of Tesseract and Google Cloud Vision for Thai vehicle registration certificate." International Journal of Electrical and Computer Engineering (IJECE) 12, no. 2 (2022): 1849–58. https://doi.org/10.11591/ijece.v12i2.pp1849-1858.

Abstract:
Optical character recognition (OCR) is a technology to digitize a paper-based document to digital form. This research studies the extraction of the characters from a Thai vehicle registration certificate via a Google Cloud Vision API and a Tesseract OCR. The recognition performance of both OCR APIs is also examined. The 84 color image files comprised three image sizes/resolutions and five image characteristics. For suitable image type comparison, the greyscale and binary image are converted from color images. Furthermore, the three pre-processing techniques, sharpening, contrast adjustment, and
20

Tsimpiris, Alkiviadis, Dimitrios Varsamis, and George Pavlidis. "Tesseract OCR Evaluation on Greek Food Menus Datasets." International Journal of Computing and Optimization 9, no. 1 (2022): 13–32. https://doi.org/10.12988/ijco.2022.9829.

Abstract:
This article presents a procedure for optical character recognition (OCR) improvement, after image preprocessing of Greek food menus images. To achieve this goal, many well-known and other more sophisticated techniques for image preprocessing have been used. The performance of the Tesseract OCR engine has been studied for selected binarization, thresholding, noise and morphological filtering methods that were applied to menu images before OCR feeding. The output text is compared to the reference text of each image (ground text) and the values of evaluation indices indicate the appropriate prepr
21

Had, Iqbaluddin Syam, Wiga Maulana Baihaqi, and Dwi Putriana Nuramanah Kinding. "Improving Tesseract OCR Accuracy Using SymSpell Algorithm on Passport Data." sinkron 9, no. 1 (2025): 374–81. https://doi.org/10.33395/sinkron.v9i1.14395.

Abstract:
Optical Character Recognition (OCR) is a technology used to recognize text from images or digital documents, such as passports. One popular OCR tool is Tesseract as it offers high accuracy. However, OCR accuracy is often affected by various factors, including image noise and/or non-text elements. This article discusses the application of the SymSpell algorithm for post processing to improve OCR accuracy on standard Indonesian passports. OCR will be focused on the Visual Inspection Zone, specifically the Place of Birth and Issuing Office values. Unlike the Machine Readable Zone which is compose
22

Joshi, Kalpesh. "Handwritten Text Recognition from Image." International Journal for Research in Applied Science and Engineering Technology 11, no. 6 (2023): 1528–30. http://dx.doi.org/10.22214/ijraset.2023.53364.

Abstract:
A computer vision program called Handwritten Text Recognition (HTR) attempts to recognize and translate handwritten text from scanned or photographed images. In this project, we suggest implementing an HTR system using Tesseract and OpenCV. English, Chinese, and Arabic are all supported by the popular open-source optical character recognition (OCR) engine known as Tesseract. It is employed to find and identify printed text within photographs. On the other hand, OpenCV is a well-liked computer vision library that offers several tools for processing and analyzing images. The pre-proces
23

Sun, Yueyue, and Xuechen Zhao. "Research and implementation of license plate recognition based on android platform." MATEC Web of Conferences 309 (2020): 03034. http://dx.doi.org/10.1051/matecconf/202030903034.

Abstract:
This paper studies and optimizes license plate location and recognition in license plate recognition. A license plate recognition system based on the Android platform is designed and implemented. OpenCV and Tesseract OCR are integrated in the Android Studio environment. The license plate number is located by combining the Laplace algorithm and the HSV model. On the basis of fully understanding the principle of Tesseract OCR recognition, a large number of training pictures are generated by a license plate number simulation generator, and a license plate character library is generated by using the jTessBoxEditor tool,
24

Drobac, Senka, and Krister Lindén. "Optical character recognition with neural networks and post-correction with finite state methods." International Journal on Document Analysis and Recognition (IJDAR) 23, no. 4 (2020): 279–95. http://dx.doi.org/10.1007/s10032-020-00359-9.

Abstract:
The optical character recognition (OCR) quality of the historical part of the Finnish newspaper and journal corpus is rather low for reliable search and scientific research on the OCRed data. The estimated character error rate (CER) of the corpus, achieved with commercial software, is between 8 and 13%. There have been earlier attempts to train high-quality OCR models with open-source software, like Ocropy (https://github.com/tmbdev/ocropy) and Tesseract (https://github.com/tesseract-ocr/tesseract), but so far, none of the methods have managed to successfully train a mixed model that
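The character error rate (CER) quoted in this abstract is conventionally defined as the edit distance between the OCR output and the ground-truth text, divided by the length of the ground truth. The sketch below illustrates that common definition only; it is not the authors' evaluation code, and the function names `edit_distance` and `character_error_rate` are hypothetical.

```python
def edit_distance(reference, hypothesis):
    """Levenshtein distance: minimum number of single-character
    insertions, deletions and substitutions turning one string
    into the other."""
    # prev[j] holds the distance between the reference prefix processed
    # so far and the first j characters of the hypothesis.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            curr.append(min(
                prev[j] + 1,         # drop a reference character
                curr[j - 1] + 1,     # add a hypothesis character
                prev[j - 1] + cost,  # match or substitute
            ))
        prev = curr
    return prev[-1]


def character_error_rate(reference, hypothesis):
    """CER = edit distance / number of characters in the reference."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)
```

Under this definition, a CER between 8 and 13% means roughly 8 to 13 character-level edits per 100 characters of ground truth.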
25

Alan Jiju, Shaun Tuscano, and Chetana Badgujar. "OCR Text Extraction." International Journal of Engineering and Management Research 11, no. 2 (2021): 83–86. http://dx.doi.org/10.31033/ijemr.11.2.11.

Abstract:
This research tries to find out a methodology through which any data from the daily-use printed bills and invoices can be extracted. The data from these bills or invoices can be used extensively later on – such as machine learning or statistical analysis. This research focuses on extraction of final bill-amount, itinerary, date and similar data from bills and invoices as they encapsulate an ample amount of information about the users purchases, likes or dislikes etc. Optical Character Recognition (OCR) technology is a system that provides a full alphanumeric recognition of printed or handwritt
26

Кравец, Алла Григорьевна, Дмитрий Олегович Семёночкин, and Андрей Константинович Марков. "Разработка нового экспериментального метода оценки OCR инструментов для задачи классификации цифровых документов." Вестник ВГУ. Серия: Системный анализ и информационные технологии, no. 3 (November 14, 2024): 114–26. https://doi.org/10.17308/sait/1995-5499/2024/3/114-126.

Abstract:
The article describes a newly developed experimental method for evaluating existing OCR tools, aimed at the problem of scanned documents occurring in datasets used for text classification tasks. To classify documents, scanned documents and documents from which text cannot be obtained with software text-extraction tools must be converted into machine-readable text, and optical character recognition (OCR) technology is used for this task. The aim of this article is to experimentally compare existing OCR
27

TV, Keerthana. "OCR Based Facilitator for the Visually Challenged." INTERNATIONAL JOURNAL OF SCIENTIFIC RESEARCH IN ENGINEERING AND MANAGEMENT 08, no. 04 (2024): 1–5. http://dx.doi.org/10.55041/ijsrem31564.

Abstract:
Your proposal for an OCR-based smart book reader catering to the visually challenged fills a crucial need for affordable accessibility solutions, particularly given the global population of approximately 285 million visually impaired individuals, with a majority residing in developing countries. Leveraging a Raspberry Pi 3 board and Tesseract OCR technology, your device converts printed or handwritten text into machine-encoded text, enhanced by Computer Vision libraries like OpenCV for image preprocessing. Beyond text conversion, features such as obstacle detection via a smart stick and an eme
28

Padmaja, G., Swetha Pesaru, Desidi Narsimha Reddy, D. Anitha Kumari, and Shiva Prasad Maram. "Robust Vehicle Number Plate Text Recognition and Data Analysis Using Tesseract Ocr." ITM Web of Conferences 74 (2025): 01009. https://doi.org/10.1051/itmconf/20257401009.

Abstract:
To detect the vehicle number plate the system should understand the character and integers determined on vehicles. The proposed methodology holds three phases: pre-process, extraction of features and recognition of text. These phases include some operations like grey scale, adaptive threshold, morphological for extraction of characters and numbers from different quality of the images in pre-processing stage. By transforming the images to grey scale, this can remove the extraneous colours and extracts the appropriate values. In morphological removes the borders and removes the background noises
29

Wydyanto, Wydyanto, Norshita Mat Nayan, Riza Sulaiman, Deshinta Arrova Dewi, and Tri Basuki Kurniawan. "A Hybrid Approach to Detect and Identify Text in Picture." Emerging Science Journal 8, no. 1 (2024): 218–38. http://dx.doi.org/10.28991/esj-2024-08-01-016.

Abstract:
In order to create computer systems that can automatically read text from images or pictures, researchers focus on detecting and recognizing text in images. This issue is particularly difficult because images often have complicated backgrounds and a wide range of properties, including color, size, shape, orientation, and texture. Our proposed approach is based on morphology, which consists of a dilation and erosion process to extract text and recognize black-and-white text areas that contain document text or images. This suggested approach has been investigated for its ability to automatically
30

Indrawan, Gede, Ahmad Asroni, Luh Joni Erawati Dewi, I. Gede Aris Gunadi, and I. Ketut Paramarta. "Balinese Script Recognition Using Tesseract Mobile Framework." Lontar Komputer : Jurnal Ilmiah Teknologi Informasi 13, no. 3 (2022): 160. http://dx.doi.org/10.24843/lkjiti.2022.v13.i03.p03.

Abstract:
One of the main factors causing the decline in the use of Balinese Script is that Balinese people are less interested in reading Balinese Script because of their reluctance to learn Balinese Script, which is relatively complicated in the recognition process. The development of computer technology has now been used to help by performing character recognition or known as Optical Character Recognition (OCR). Developing the OCR application for Balinese Script is an effort to help preserve, from the technology side, as a means of education related to Balinese Script. In this study, that development
31

Sporici, Dan, Elena Cușnir, and Costin-Anton Boiangiu. "Improving the Accuracy of Tesseract 4.0 OCR Engine Using Convolution-Based Preprocessing." Symmetry 12, no. 5 (2020): 715. http://dx.doi.org/10.3390/sym12050715.

Abstract:
Optical Character Recognition (OCR) is the process of identifying and converting texts rendered in images using pixels to a more computer-friendly representation. The presented work aims to prove that the accuracy of the Tesseract 4.0 OCR engine can be further enhanced by employing convolution-based preprocessing using specific kernels. As Tesseract 4.0 has proven great performance when evaluated against a favorable input, its capability of properly detecting and identifying characters in more realistic, unfriendly images is questioned. The article proposes an adaptive image preprocessing step
32

Cahyani, Trisiwi Indra, Mochammad Zakiyamani, Dwiza Riana, and Sri Hardianti. "Perbandingan Akurasi Pengenalan Karakter Plat Nomor Menggunakan Tesseract Dan Data Latih Emnist." INTECOMS: Journal of Information Technology and Computer Science 5, no. 2 (2022): 18–27. http://dx.doi.org/10.31539/intecoms.v5i2.4463.

Abstract:
A license plate is a mandatory identifier, consisting of letters and numbers, displayed on a vehicle. License plates can be used for various purposes such as parking systems, traffic monitoring, and identity checks when accidents occur. Character recognition can use Optical Character Recognition (OCR), which applies a template matching method to letters and numbers. A Convolutional Neural Network trained on EMNIST data is used to perform character recognition. The aim of this study is to compare the use of the OCR method with Tesseract and a CNN in performing
33

Rawat, Sukhbindra Singh, Ashutosh Sharma, and Rachana Gusain. "ANALYSIS OF IMAGE PREPROCESSING TECHNIQUES TO IMPROVE OCR OF GARHWALI TEXT OBTAINED USING THE HINDI TESSERACT MODEL." ICTACT Journal on Image and Video Processing 12, no. 2 (2021): 2588–94. http://dx.doi.org/10.21917/ijivp.2021.0366.

Abstract:
A huge amount of information exists in the form of textbooks, paper documents, newspapers, and other physical forms, that is required to be digitized for its effective access and long-time availability. Optical Character Recognition (OCR) is an effective way to digitize the text. In this study, we have used Google’s Tesseract as the OCR tool. The focus of our study is to improve Tesseract’s accuracy on machine-printed Garhwali documents by using image pre-processing techniques including Super-Resolution (SR), different binarization methods (Otsu and adaptive thresholding), skew correction, mor
34

Muthusundari, Muthusundari, A. Velpoorani, S. Venkata Kusuma, Trisha L, and Om k. Rohini. "Optical character recognition system using artificial intelligence." LatIA 2 (August 13, 2024): 98. http://dx.doi.org/10.62486/latia202498.

Abstract:
A technique termed optical character recognition, or OCR, is used to extract text from images. An OCR system's primary goal is to transform existing paper-based paperwork or picture data into usable documents. Character and word detection are the two main phases of an OCR system, which is designed using many algorithms. An OCR system also maintains a document's structure by focusing on sentence identification, which is a more sophisticated approach. Research has demonstrated that despite the efforts of numerous scholars, no error-free Bengali OCR has been produced. This issue is addr
35

Manggau, Fransiskus Xaverius, Sumarni Hamid Aly, Muhammad Isran Ramli, and Muhammad Niswar. "A YOLO-Tesseract Module Recognizing System for an Android-based Smart Parking App in Urban On-Street Parking." Engineering, Technology & Applied Science Research 15, no. 3 (2025): 22969–75. https://doi.org/10.48084/etasr.10819.

Abstract:
This study describes an advanced recognition system embedded in an Android smart parking software application for Makassar City. The system augments the recognition of the parking space and the navigation as well as the payment of the parking fee using a Tesseract OCR module in conjunction with YOLO object detection. The ability of Tesseract OCR to recognize parking spaces, road signs, and vehicle registration plates in real time improves the accuracy of availability updates and assists drivers in finding parking spaces quickly. The application was developed using multiple programming language
36

Haji, Chiai Mohammed. "Linguistic Analysis on Cursive Characters." Journal of duhok university 25, no. 2 (2022): 33–40. http://dx.doi.org/10.26682/sjuod.2022.25.2.3.

Abstract:
Document analysis is of major importance in information retrieval systems. Dredged with vaults of paper and material documents, to protect very important information and the summaries, without losing their meaning and importance, each document needs to be properly curated and processed. Ancient written documents possess many types of cursive language character sets, which makes it very tedious to discriminate the characters and, subsequently, the right meaning. To overcome the difficulties of reading the cursive language characters and prevent misunderstanding the meaning and the importance of documents
37

Kleimenkin, D. V., and N. A. Dmitrienko. "Using OCR for Russian texts." ТЕНДЕНЦИИ РАЗВИТИЯ НАУКИ И ОБРАЗОВАНИЯ 92, no. 10 (2022): 9–12. http://dx.doi.org/10.18411/trnio-12-2022-456.

Abstract:
The article discusses optical character recognition. Traditional approaches to text recognition and optical recognition techniques include computer vision-based approaches. To obtain the most appropriate result, a comparison is made in terms of speed and accuracy of the commonly used PaddleOCR and Tesseract libs. The result is a measurement of values on the selected database of Russian-language sentences.
38

Bhanu, Mohammad Shinaz, Durgam Varshini, Poosala Srikanth, and Payyavula Lokesh. "Exploiting Vulnerabilities in Weak CAPTCHA Mechanisms within DVWA." Journal of Information Technology and Digital World 7, no. 2 (2025): 119–29. https://doi.org/10.36548/jitdw.2025.2.003.

Abstract:
This research focuses on identifying vulnerabilities in the CAPTCHA implementation of the Damn Vulnerable Web Application (DVWA). We utilize Optical Character Recognition (OCR) with Tesseract, capture internet traffic using OWASP ZAP, and develop Python-based automated scripts to bypass substandard CAPTCHA implementations. Throughout the study, we uncover critical vulnerabilities, including the lack of CAPTCHA verification for sensitive actions such as password changes. We provide a detailed step-by-step analysis of how attackers can exploit these vulnerabilities. We conclude by comparing thes
39

Ibrahim, Ahmed. "Dhivehi OCR: Character Recognition of Thaana Script using Machine-Generated Text and Tesseract OCR Engine." International Journal of Social Research and Innovation 1, no. 1 (2018): 83–94. http://dx.doi.org/10.55712/ijsri.v1i1.23.

Abstract:
This paper provides technical aspects and the context of recognising Dhivehi characters using Tesseract OCR Engine, which is a freely available OCR engine with remarkable accuracy and support for multiple languages. The experiments that were conducted showed promising results with 69.46% accuracy and, more importantly, highlighted limitations that are unique to Dhivehi. These issues have been discussed in detail and possible directions for future research are presented.
40

Sharmin, Sabrina, Tasauf Mim, and Mohammad Rahman. "Bangla Optical Character Recognition for Mobile Platforms: A Comprehensive Cross-Platform Approach." American Journal of Electrical and Computer Engineering 8, no. 2 (2024): 31–42. http://dx.doi.org/10.11648/j.ajece.20240802.12.

Abstract:
The development of Optical Character Recognition (OCR) systems for Bangla script has been an area of active research since the 1980s. This study presents a comprehensive analysis and development of a cross-platform mobile application for Bangla OCR, leveraging the Tesseract OCR engine. The primary objective is to enhance the recognition accuracy of Bangla characters, achieving rates between 90% and 99%. The application is designed to facilitate the automatic extraction of text from images selected from the device's photo library, promoting the preservation and accessibility of Bangla
41

Oudah, Nabeel, Maher Faik Esmaile, and Estabraq Abdulredaa. "Optical Character Recognition Using Active Contour Segmentation." Journal of Engineering 24, no. 1 (2018): 146–58. http://dx.doi.org/10.31026/j.eng.2018.01.10.

Abstract:
Document analysis of images snapped by camera is a growing challenge. These photos are often poor-quality compound images, composed of various objects and text; this makes automatic analysis complicated. OCR is one of the image processing techniques used to perform automatic identification of texts. Existing image processing techniques need to manage many parameters in order to clearly recognize the text in such pictures. Segmentation is regarded as one of these essential parameters. This paper discusses the accuracy of the segmentation process and its effect on the recognition process. Ac
42

Asroni, Ahmad, Gede Indrawan, and Luh Joni Erawati Dewi. "Implementasi Hirarki Dataset Dalam Membangun Model Language Aksara Bali Menggunakan Framework Tesseract OCR." Jurnal RESISTOR (Rekayasa Sistem Komputer) 6, no. 1 (2023): 20–28. http://dx.doi.org/10.31598/jurnalresistor.v6i1.1345.

Abstract:
The current decline in the use of Balinese script (Aksara Bali) is caused by a lack of interest among Balinese people in learning it, because recognizing Balinese script is relatively complicated. Optical Character Recognition (OCR) technology has therefore been developed to help address this problem. This study aims to implement one of the leading OCR engines, Tesseract OCR, to recognize Balinese script characters. The experimental process consists of four stages: compiling the dataset, generating the dataset using Web Scraping, training on the dataset, and implementing the language mode
43

M L, Prof Smitha, Dr Antony P J, and Sachin D N. "Document Image Analysis Using Imagemagick and Tesseract-ocr." IARJSET 3, no. 5 (2016): 108–12. http://dx.doi.org/10.17148/iarjset.2016.3523.

44

More, Swapnil, Rishabh Jain, and Harshil Kanakia. "File Conversion Application using Kivy and Tesseract – OCR." International Journal of Advanced Engineering, Management and Science 7, no. 5 (2021): 65–68. http://dx.doi.org/10.22161/ijaems.75.9.

45

Songa, Akhil, Rahul Bolineni, Harish Reddy, Sohini Korrapolu, and Vani Jayasri Geddada. "Vehicle Number Plate Recognition System Using TESSERACT-OCR." International Journal for Research in Applied Science and Engineering Technology 10, no. 4 (2022): 323–27. http://dx.doi.org/10.22214/ijraset.2022.41198.

Abstract:
With the increase in the number of vehicles, automated systems to store vehicle information are becoming increasingly necessary. Communication is critical for traffic management and crime reduction, and it cannot be overlooked. Automatic vehicle identification using number plate recognition is a reliable method of identifying vehicles. It requires a lengthy time and a lot of practice to develop satisfactory results using present algorithms that are based on the idea of learning. Even so, accuracy is not a significant concern. It has been devised as an efficient approach for recognizi
46

Tsimpiris, Alkiviadis, Dimitrios Varsamis, and Georgios Pavlidis. "Tesseract OCR evaluation on Greek food menus datasets." International Journal of Computing and Optimization 9, no. 1 (2022): 13–32. http://dx.doi.org/10.12988/ijco.2022.9829.

47

Setyadi, Alpha Fausta Ikrar, and Yeremia Alfa Susetyo. "Implementasi Algoritma LSTM pada Aplikasi Optical Character Recognition Berbasis Website Menggunakan Tesseract OCR." Jurnal Teknologi Sistem Informasi dan Aplikasi 6, no. 2 (2023): 63–71. http://dx.doi.org/10.32493/jtsi.v6i2.29235.

Abstract:
More practical digital document processing has led various agencies and organizations to switch from physical to digital documents. However, manually extracting data from physical documents takes considerable effort and is prone to input mistakes caused by human error. Optical Character Recognition (OCR) technology can be a solution to this problem. OCR is used to recognize letters or characters in an image so that they can then be stored as text data on a computer. In this study, OCR technology is implemented in a website-based application
48

Ircham Aji Nugroho, Bety Hayat Susanti, Mareta Wahyu Ardyani, and Nadia Paramita R.A. "The Design of a C1 Document Data Extraction Application Using a Tesseract-Optical Character Recognition Engine." Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi) 8, no. 1 (2024): 42–53. http://dx.doi.org/10.29207/resti.v8i1.5151.

Abstract:
The 2019 election process used the Vote Counting Information System, also known as Sistem Informasi Penghitungan Suara (Situng), to provide transparency in the recapitulation process. The data displayed in Situng is from document C1 for 813,336 voting stations in Indonesia. The data collected from the C1 document is entered and uploaded into Situng by the officers of the Municipal General Election Commission (GEC). Since this process is performed by humans, it is not immune to errors. In the recapitulation process of the 2019 election results, there were 269 data entry errors, and the data ent
49

Silfverberg, Miikka, and Jack Rueter. "Can Morphological Analyzers Improve the Quality of Optical Character Recognition?" Septentrio Conference Series, no. 2 (June 17, 2015): 45. http://dx.doi.org/10.7557/5.3467.

Abstract:
Optical Character Recognition (OCR) can substantially improve the usability of digitized documents. Language modeling using word lists is known to improve OCR quality for English. For morphologically rich languages, however, even large word lists do not reach high coverage on unseen text. Morphological analyzers offer a more sophisticated approach, which is useful in many language processing applications. This paper investigates language modeling in the open-source OCR engine Tesseract using morphological analyzers. We present experiments on two Uralic languages, Finnish and Erzya. According to
50

Rekha M., et al. "Educational Training For Processing Invoice Of Vendor Identification And Payments Using Python-Tesseract." Turkish Journal of Computer and Mathematics Education (TURCOMAT) 12, no. 11 (2021): 224–28. http://dx.doi.org/10.17762/turcomat.v12i11.5864.

Abstract:
The aim of the project is to recognize the invoices of receipts from various vendors through automated invoice processing, using various educational learning tools. This automated invoice processing is far better than manual invoice processing: it saves a serious amount of time and money, creating efficiencies and increasing the accuracy of captured data. Basically, the invoices were calculated from the scanned receipts by using python-tesseract software. Python-tesseract is an optical character recognition (OCR) tool for Python. It will recognize and read the text embedded in images. So, thi