Dissertations / Theses on the topic 'Object Character Recognition (OCR)'


Consult the top 50 dissertations / theses for your research on the topic 'Object Character Recognition (OCR).'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever these are available in the metadata.

Browse dissertations / theses in a wide variety of disciplines and organise your bibliography correctly.

1

Lamberti, Lorenzo. "A deep learning solution for industrial OCR applications." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2019. http://amslaurea.unibo.it/19777/.

Full text
Abstract:
This thesis describes a project developed during a six-month internship in the Machine Vision Laboratory of Datalogic, based in Pasadena, California. The project aims to develop a deep learning system as a possible solution for industrial optical character recognition applications. In particular, the focus falls on a specific algorithm called You Only Look Once (YOLO), a general-purpose object detector based on convolutional neural networks that currently offers state-of-the-art performance in terms of the trade-off between speed and accuracy. This algorithm is well known for reaching impressive processing speeds, but its intrinsic structure makes it struggle to detect small objects clustered together, which unfortunately matches our scenario: we are trying to read alphanumeric codes by detecting each individual character and then reconstructing the final string. The final goal of this thesis is to overcome this drawback and push the accuracy of a general-purpose object-detection convolutional neural network to its limits, in order to meet the demanding requirements of industrial OCR applications. To accomplish this, YOLO's unique detection approach was first mastered in its original framework, Darknet, written in C and CUDA; then all the code was translated into the Python programming language for better flexibility, which also allowed the deployment of a custom architecture. Four datasets of increasing complexity were used as case studies, and the final performance was surprising: the accuracy varies between 99.75% and 99.97%, with a processing time of 15 ms for 1000×1000 images, largely outperforming in speed the current deep learning solution deployed by Datalogic. On the downside, the training phase usually requires a very large amount of data and time, and YOLO also showed some memorisation behaviour when not enough variability was provided at training time.
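The string-reconstruction step mentioned in this abstract (detecting each character, then rebuilding the code) can be illustrated with a short sketch. The detection format, names and thresholds below are assumptions for illustration, not Datalogic's or the author's actual code:

```python
# Minimal sketch: rebuild an alphanumeric code from per-character detections
# produced by an object detector such as YOLO. Detections are assumed to be
# (x_center, y_center, width, height, label, confidence) tuples.

def reconstruct_string(detections, min_confidence=0.5):
    """Sort surviving character boxes left to right and concatenate their labels."""
    kept = [d for d in detections if d[5] >= min_confidence]
    kept.sort(key=lambda d: d[0])          # order by x-coordinate of the box centre
    return "".join(d[4] for d in kept)

# Example: three detected characters spelling "A7B"
boxes = [(120, 40, 18, 30, "7", 0.98),
         (100, 41, 18, 30, "A", 0.95),
         (140, 39, 18, 30, "B", 0.91)]
print(reconstruct_string(boxes))  # -> "A7B"
```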
2

McDonald, Mercedes Terre. "OCR: A STATISTICAL MODEL OF MULTI-ENGINE OCR SYSTEMS." Master's thesis, University of Central Florida, 2004. http://digital.library.ucf.edu/cdm/ref/collection/ETD/id/4459.

Full text
Abstract:
This thesis is a benchmark performed on three commercial Optical Character Recognition (OCR) engines. The purpose of this benchmark is to characterize the performance of the OCR engines, with emphasis on the correlation of errors between the engines. The benchmarks are performed to evaluate the effect of a multi-OCR system employing a voting scheme to increase overall recognition accuracy. This is desirable since current OCR systems are still unable to recognize characters with 100% accuracy. The existing error rates of OCR engines pose a major problem for applications where a single error can affect significant outcomes, such as legal applications. The results obtained from this benchmark are the primary factor in deciding whether to implement a voting scheme. The experiment showed a very high accuracy rate for each of the commercial OCR engines: the average accuracy rate found for each engine was near 99.5%, based on a document of fewer than 6,000 words. While these error rates are very low, the goal is 100% accuracy in legal applications. Based on the work in this thesis, it has been determined that a simple voting scheme will help to improve the accuracy rate. (M.S., Department of Electrical and Computer Engineering, Engineering and Computer Science.)
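A minimal sketch of the kind of word-level voting scheme evaluated here, assuming the engines' outputs have already been aligned word by word (the engine names and strings are invented):

```python
# Illustrative majority voting over the word-level output of several OCR engines.
from collections import Counter

def vote(word_candidates):
    """Return the word proposed by most engines; ties keep the first engine's word."""
    counts = Counter(word_candidates)
    best, best_count = word_candidates[0], counts[word_candidates[0]]
    for word, count in counts.items():
        if count > best_count:
            best, best_count = word, count
    return best

engine_a = ["Optical", "Character", "Recognltion"]
engine_b = ["Optical", "Charaoter", "Recognition"]
engine_c = ["Optical", "Character", "Recognition"]

corrected = [vote(list(words)) for words in zip(engine_a, engine_b, engine_c)]
print(" ".join(corrected))  # -> "Optical Character Recognition"
```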
3

Granlund, Oskar, and Kai Böhrnsen. "Improving character recognition by thresholding natural images." Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-208899.

Full text
Abstract:
The current state-of-the-art optical character recognition (OCR) algorithms are capable of extracting text from images under predefined conditions. OCR is extremely reliable for interpreting machine-written text with minimal distortions, but images taken in a natural scene are still challenging. In recent years, the topic of improving recognition rates in natural images has gained interest as more powerful handheld devices are used. The main problems faced in recognition in natural images are distortions such as uneven illumination, font textures, and complex backgrounds. Different preprocessing approaches for separating text from its background have been researched lately. In our study, we assess the improvement achieved by two of these preprocessing methods, k-means and Otsu, by comparing their results from an OCR algorithm. The study showed that the preprocessing brought some improvement in special cases, but overall yielded worse accuracy compared to the unaltered images.
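As a rough illustration of the Otsu preprocessing step compared in this study (one of the two methods), the following sketch binarises an image before OCR; the file name is a placeholder and pytesseract stands in for whichever OCR engine was actually used:

```python
# Threshold a natural-scene image with Otsu's method before passing it to OCR.
import cv2
import pytesseract

image = cv2.imread("scene_text.jpg")                      # placeholder path
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
# Otsu's method picks the global threshold that minimises intra-class variance
_, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

print(pytesseract.image_to_string(gray))     # OCR on the unaltered grayscale image
print(pytesseract.image_to_string(binary))   # OCR after Otsu thresholding
```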
4

Mishra, Vishal Vijayshankar. "Sequence-to-Sequence Learning using Deep Learning for Optical Character Recognition (OCR)." University of Toledo / OhioLINK, 2017. http://rave.ohiolink.edu/etdc/view?acc_num=toledo1513273051760905.

Full text
5

Rodrigues, Antonio Jose Nunes Navarro. "A robust off-line hand written character recognition system using dynamic features." Thesis, University of Newcastle Upon Tyne, 1996. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.295503.

Full text
6

Sandgren, Frida. "Creation of a customised character recognition application." Thesis, Uppsala University, Department of Linguistics and Philology, 2005. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-4801.

Full text
Abstract:
This master's thesis describes the work in creating a customised optical character recognition (OCR) application, intended for use in the digitisation of theses submitted to Uppsala University in the 18th and 19th centuries. For this purpose, open source software called Gamera has been used for recognition and classification of the characters in the documents. The software provides specific algorithms for the analysis of heritage documents and is designed to be used as a tool for creating domain-specific (i.e. customised) recognition applications.

By using the Gamera classifier training interface, classifier data was created which reflects the characters in the particular theses. The data can then be used in the automatic recognition of 'new' characters, by loading it into one of Gamera's classifiers. The output of Gamera is a set of classified glyphs (i.e. small images of characters), stored in an XML-based format.

However, as OCR typically involves translating images of text into a machine-readable format, a complementary OCR module was needed. For this purpose, an external Gamera module for page segmentation was modified and used. In addition, a script for controlling the OCR process was created, which initiates the page segmentation on Gamera-classified glyphs. The result is written to text files.

Finally, in a test of recognition accuracy, one of the theses was used to create training data and test data. The results show an average accuracy rate of 82% and indicate the need for a better pre-processing module that removes more noise from the images and recognises different character sizes before they are run through the OCR process.
7

Monger, David M. "The human factors aspects of interactive document image description for OCR of handwritten forms." Thesis, University of Essex, 1994. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.238747.

Full text
8

Radvar-Zanganeh, Siasb. "The role of the Elementary Perceiver and Memorizer (EPAM) in optical character recognition (OCR)." Thesis, Connect to online version, 1994. http://0-wwwlib.umi.com.mercury.concordia.ca/cr/concordia/fullcit?pMM10888.

Full text
9

Favish, Ashleigh. "Data Capture Automation in the South African Deeds Registry using Optical Character Recognition (OCR)." Master's thesis, Faculty of Commerce, 2019. http://hdl.handle.net/11427/31389.

Full text
Abstract:
The impact of apartheid on land registration is still evident within South Africa. The Deeds Registry currently faces a backlog in registering an estimated 900,000 title deeds. Providing formal ownership through title is seen as necessary for unlocking the 'dead capital' of unregistered property, fostering access to capital markets and alleviating poverty. Within the current legislative framework, the Deeds Registry only accepts paper documents, which introduces inefficiencies. To increase the number of deeds processed per day, automation of manual data capture is tested using an OCR pipeline. To adapt to the language used in title deeds, text analysis and parsing are done using regular expressions. Uploading the scanned title deeds onto IPFS is included in the pipeline as an additional security measure. Previous research has failed to apply these techniques to formal land registration or other South African government institutions. The preliminary results show that this pipeline has an overall accuracy of 89.6%, based on comparing the expected output to the output extracted using OCR. The results are significantly less accurate when classifying handwritten and stamped information, so further measures are required to increase accuracy for these fields. The OCR accuracy was 98.3% for the fields extracted from typed text characters, which is within the accuracy range of manual data capture. A secondary quality check, which is currently done on manual data capture, would still be necessary to ensure the accuracy of inputs. Overall, it appears that this application would be appropriate for incorporation into the Deeds Registry to streamline its processes while ensuring title deed validity.
10

Serafini, Sara. "Machine Learning applied to OCR tasks." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2019.

Find full text
Abstract:
This thesis describes the work done during a six-month internship at Datalogic, in its research laboratories in Pasadena (CA). The aim of my research was to implement and evaluate a classifier as part of an industrial OCR system, both for learning purposes and to see how well it could work in comparison to Datalogic's current best products; since it might be simpler and faster, it could be a good alternative for implementation on an embedded system, where current Datalogic products may not be able to run fast enough.
11

Shah, Jaimin Nitesh. "Underwater Document Recognition." University of Dayton / OhioLINK, 2021. http://rave.ohiolink.edu/etdc/view?acc_num=dayton1619452066101887.

Full text
12

Kapusta, Ján. "OCR modul pro rozpoznání písmen a číslic." Master's thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2010. http://www.nusl.cz/ntk/nusl-218623.

Full text
Abstract:
This paper describes the basic methods used for optical character recognition. It explains all the stages of recognition, from image adjustment and processing through feature extraction to matching algorithms. It compares methods and algorithms for recognising characters in graphically distorted or otherwise modified images, so-called 'captchas', which are in common use today. It further compares a method based on invariant moments with a neural network as the final classifier, and a method based on the correlation between reference patterns and the recognised characters.
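The correlation-based matching described above can be sketched as normalised cross-correlation against a set of reference character images; the template set, sizes and file names below are illustrative assumptions rather than the thesis's implementation:

```python
# Classify an unknown glyph by correlating it against reference character images.
# Templates and the glyph are assumed to be same-sized grayscale arrays.
import cv2

def classify_glyph(glyph, templates):
    """Return the template label with the highest normalised correlation coefficient."""
    best_label, best_score = None, -1.0
    for label, template in templates.items():
        score = cv2.matchTemplate(glyph, template, cv2.TM_CCOEFF_NORMED)[0, 0]
        if score > best_score:
            best_label, best_score = label, score
    return best_label, best_score

# Hypothetical usage with pre-cut 32x32 character images
templates = {c: cv2.imread(f"templates/{c}.png", cv2.IMREAD_GRAYSCALE)
             for c in "0123456789"}
glyph = cv2.imread("unknown_char.png", cv2.IMREAD_GRAYSCALE)
print(classify_glyph(glyph, templates))
```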
13

Santos, Claudio Filipi Gonçalves dos. "Optical character recognition using deep learning." Universidade Estadual Paulista (UNESP), 2018. http://hdl.handle.net/11449/154100.

Full text
Abstract:
Optical Character Recognition (OCR) is the name given to the technology used to translate image data into a text file. The objective of this project is to use Deep Learning techniques to develop software with the ability to segment images, detect candidate characters and generate the text that is in the picture. Since 2006, Deep Learning, or hierarchical learning, has emerged as a new machine learning area. Over recent years, the techniques developed from deep learning research have influenced and expanded in scope, including key aspects of artificial intelligence and machine learning. A thorough study was carried out in order to develop an OCR system using only Deep Learning architectures. The evolution of these techniques, some past works and how they influenced this framework's development are explained in this text. The thesis demonstrates, with results, how a single character classifier was developed. It is then explained how a neural network can be developed to be an object detector and how to transform this object detector into a text detector. After that, it shows how a set of two Deep Learning techniques can be combined and used in the task of transforming a cropped region of an image into a string of characters. Finally, it demonstrates how the text detector and the image-to-text system were combined in order to develop a full end-to-end OCR system that detects the regions of a given image containing text and what is written in those regions. The study shows that the idea of using only Deep Learning structures can outperform techniques based on other areas, such as image processing. In text detection it reached over 70% precision when a more complex architecture was used, around 69% correct translation of image-to-text areas, and around 50% on the end-to-end task of detecting text areas and translating them into text.
14

Mendonça, Fábio Lúcio Lopes de. "Proposta de arquitetura de um sistema com base em OCR neuronal para resgate e indexação de escritas paleográficas do sec. XVI ao XIX." reponame:Repositório Institucional da UnB, 2008. http://repositorio.unb.br/handle/10482/1157.

Full text
Abstract:
Master's dissertation — Universidade de Brasília, Faculdade de Tecnologia, Departamento de Engenharia Elétrica, 2008. This work proposes a system architecture for the automatic processing and recognition of text in paleographic documents, using Optical Character Recognition (OCR) combined with artificial neural networks. The proposed system operates in the context of transcribing the text of documents written in paleographic scripts from the 16th to the 19th century; these documents, from colonial-era Brazil, were digitised from the original prints held in the Arquivo Ultramarino de Lisboa, one of the achievements of the Projeto Resgate of the Brazilian Ministry of Culture. The architecture of the proposed system includes modules to segment the digitised document images, to analyse the segments with OCR in an attempt to recognise the text, to train the OCR engine and build a dictionary of recognised words, and to store the text transcribed from the document images. To evaluate this architecture, a software prototype was developed that allows the user to manually segment a document image, train a simple OCR engine and use it to extract some text information from a digitised paleographic document. The conclusion is that the proposed architecture is functional, although deeper development is still needed in the modules for document segmentation and for recognition of the paleographic scripts of the 16th to 19th centuries.
15

Noghe, Petr. "Vyhodnocení testových formulářů pomocí OCR." Master's thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2013. http://www.nusl.cz/ntk/nusl-219986.

Full text
Abstract:
This thesis deals with the evaluation of test forms using optical character recognition. Image processing and the methods used for OCR are described in the first part of the thesis. In the practical part, a database of sample characters is created. The chosen method is based on the correlation between patterns and the recognised characters. The program is designed in the MATLAB graphical environment. Finally, several forms are evaluated and the success rate of the proposed program is determined.
16

Zhai, Xiaojun. "Automatic number plate recognition on FPGA." Thesis, University of Hertfordshire, 2013. http://hdl.handle.net/2299/14231.

Full text
Abstract:
Intelligent Transportation Systems (ITSs) play an important role in modern traffic management, which can be divided into intelligent infrastructure systems and intelligent vehicle systems. Automatic Number Plate Recognition systems (ANPRs) are one of infrastructure systems that allow users to track, identify and monitor moving vehicles by automatically extracting their number plates. ANPR is a well proven technology that is widely used throughout the world by both public and commercial organisations. There are a wide variety of commercial uses for the technology that include automatic congestion charge systems, access control and tracing of stolen cars. The fundamental requirements of an ANPR system are image capture using an ANPR camera and processing of the captured image. The image processing part, which is a computationally intensive task, includes three stages: Number Plate Localisation (NPL), Character Segmentation (CS) and Optical Character Recognition (OCR). The common hardware choice for its implementation is often high performance workstations. However, the cost, compactness and power issues that come with these solutions motivate the search for other platforms. Recent improvements in low-power high-performance Field Programmable Gate Arrays (FPGAs) and Digital Signal Processors (DSPs) for image processing have motivated researchers to consider them as a low cost solution for accelerating such computationally intensive tasks. Current ANPR systems generally use a separate camera and a stand-alone computer for processing. By optimising the ANPR algorithms to take specific advantages of technical features and innovations available within new FPGAs, such as low power consumption, development time, and vast on-chip resources, it will be possible to replace the high performance roadside computers with small in-camera dedicated platforms. In spite of this, costs associated with the computational resources required for complex algorithms together with limited memory have hindered the development of embedded vision platforms. The work described in this thesis is concerned with the development of a range of image processing algorithms for NPL, CS and OCR and corresponding FPGA architectures. MATLAB implementations have been used as a proof of concept for the proposed algorithms prior to the hardware implementation. The proposed architectures are speed/area efficient architectures, which have been implemented and verified using the Mentor Graphics RC240 FPGA development board equipped with a 4M Gates Xilinx Virtex-4 LX40. The proposed NPL architecture can localise a number plate in 4.7 ms whilst achieving a 97.8% localisation rate and consuming only 33% of the available area of the Virtex-4 FPGA. The proposed CS architecture can segment the characters within a NP image in 0.2-1.4 ms with 97.7% successful segmentation rate and consumes only 11% of the Virtex-4 FPGA on-chip resources. The proposed OCR architecture can recognise a character in 0.7 ms with 97.3% successful recognition rate and consumes only 23% of the Virtex-4 FPGA available area. In addition to the three main stages, two pre-processing stages which consist of image binarisation, rotation and resizing are also proposed to link these stages together. These stages consume 9% of the available FPGA on-chip resources. The overall results achieved show that the entire ANPR system can be implemented on a single FPGA that can be placed within an ANPR camera housing to create a stand-alone unit. 
The benefits of this are drastically improved energy efficiency and the removal of the installation and cabling costs associated with bulky PCs situated in expensive, cooled, waterproof roadside cabinets.
17

Nederhof, Mark-Jan. "OCR of hand-written transcriptions of hieroglyphic text." Universitätsbibliothek Leipzig, 2016. http://nbn-resolving.de/urn:nbn:de:bsz:15-qucosa-201704.

Full text
Abstract:
Encoding hieroglyphic texts is time-consuming. If a text already exists as hand-written transcription, there is an alternative, namely OCR. Off-the-shelf OCR systems seem difficult to adapt to the peculiarities of Ancient Egyptian. Presented is a proof-of-concept tool that was designed to digitize texts of Urkunden IV in the hand-writing of Kurt Sethe. It automatically recognizes signs and produces a normalized encoding, suitable for storage in a database, or for printing on a screen or on paper, requiring little manual correction. The encoding of hieroglyphic text is RES (Revised Encoding Scheme) rather than (common dialects of) MdC (Manuel de Codage). Earlier papers argued against MdC and in favour of RES for corpus development. Arguments in favour of RES include longevity of the encoding, as its semantics are font-independent. The present study provides evidence that RES is also much preferable to MdC in the context of OCR. With a well-understood parsing technique, relative positioning of scanned signs can be straightforwardly mapped to suitable primitives of the encoding.
18

Kraljevic, Matija. "Character recognition in natural images : Testing the accuracy of OCR and potential improvement by image segmentation." Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-187991.

Full text
Abstract:
In recent years, reading text from natural images has gained renewed research attention. One of the main reasons for this is the rapid growth of camera-based applications on smartphones and other portable devices. With the increasing availability of high-performance, low-priced image-capturing devices, the application of scene text recognition is rapidly expanding and becoming increasingly popular. Despite many efforts, character recognition in natural images is still considered a challenging and unresolved problem. The difficulties stem from the fact that natural images suffer from a wide variety of obstacles such as complex backgrounds, font variation, uneven illumination, resolution problems, occlusions and perspective effects, to mention only a few. This paper aims to test the accuracy of OCR in character recognition of natural images, as well as testing the possible improvement in accuracy after implementing three different segmentation methods. The results showed that the accuracy of OCR was very poor and that no improvements in accuracy were found after implementing the chosen segmentation methods.
19

Onak, Onder Nazim. "Comparison Of Ocr Algorithms Using Fourier And Wavelet Based Feature Extraction." Master's thesis, METU, 2011. http://etd.lib.metu.edu.tr/upload/12612928/index.pdf.

Full text
Abstract:
A lot of research has been carried out in the field of optical character recognition. The selection of a feature extraction scheme is probably the most important factor in achieving high recognition performance. Fourier and wavelet transforms are among the popular feature extraction techniques that allow rotation-invariant recognition. The performance of a particular feature extraction technique depends on the dataset used and the classifier, and different feature types may need different types of classifiers. In this thesis, Fourier- and wavelet-based features are compared in terms of classification accuracy. The influence of noise with different intensities is also analysed. The character recognition system is implemented in MATLAB. An isolated grayscale character image is first transformed into a one-dimensional function, and a set of features is then extracted. The feature set is fed to a classifier; two types of classifier were used, Nearest Neighbour and a Linear Discriminant Function. The performance of each feature extraction and classification method was tested on various rotated and scaled character images.
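A minimal sketch of rotation-invariant Fourier features of the kind compared here: the character boundary is reduced to a one-dimensional centroid-distance signature and the magnitudes of its low-order Fourier coefficients form the feature vector, classified with a nearest-neighbour rule. Function names and the number of coefficients are assumptions, not the thesis's exact choices:

```python
# Fourier-descriptor features from a character contour, plus a 1-NN classifier.
import numpy as np

def fourier_features(boundary_points, n_coeffs=16):
    """boundary_points: (N, 2) array of contour coordinates of one character."""
    centroid = boundary_points.mean(axis=0)
    signature = np.linalg.norm(boundary_points - centroid, axis=1)   # 1-D function
    spectrum = np.abs(np.fft.fft(signature))                         # rotation shifts phase only
    return spectrum[1:n_coeffs + 1] / (spectrum[0] + 1e-9)           # scale-normalised magnitudes

def nearest_neighbour(query, train_features, train_labels):
    """Return the label of the training feature vector closest to the query."""
    distances = np.linalg.norm(train_features - query, axis=1)
    return train_labels[int(np.argmin(distances))]
```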
20

Sprague, Christopher. "AUTONOMOUS REPAIR OF OPTICAL CHARACTER RECOGNITION DATA THROUGH SIMPLE VOTING AND MULTI-DIMENSIONAL INDEXING TECHNIQUES." Master's thesis, University of Central Florida, 2005. http://digital.library.ucf.edu/cdm/ref/collection/ETD/id/2642.

Full text
Abstract:
The three major optical character recognition (OCR) engines in use today (ExperVision, ScanSoft OCR, and ABBYY OCR) are all capable of recognizing text at near-perfect percentages. The remaining errors, however, have proven very difficult to identify within a single engine. Recent research has shown that the errors of the three engines have very little correlation, and thus, when used in conjunction, the engines may be useful for increasing the accuracy of the final result. This document discusses the implementation and results of a simple voting system designed to prove the hypothesis and show a statistical improvement in overall accuracy. Additional aspects of implementing an improved OCR scheme, such as aligning the output data of multiple engines and recognizing application-specific solutions, are also addressed in this research. Although voting systems are currently in use by many major OCR engine developers, this research focuses on the addition of a collaborative system which is able to utilize the various positive aspects of multiple engines while also addressing the immediate need for practical industry applications such as litigation and forms processing. Doculex, a major developer and leader in the document imaging industry, provided the funding for this research. (M.S.Cp.E., Department of Electrical and Computer Engineering, Engineering and Computer Science.)
21

Ilestrand, Maja. "Automatic Eartag Recognition on Dairy Cows in Real Barn Environment." Thesis, Linköpings universitet, Datorseende, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-139245.

Full text
Abstract:
All dairy cows in Europe wear unique identification tags in their ears. These ear tags are standardised and contain the cow's identification number, today used only for visual identification by the farmer. The cow also needs to be identified by an automatic identification system connected to milking machines and other robotics used at the farm. Currently this is solved with a non-standardised radio transmitter, which can be placed in different places on the cow, and different receivers need to be used on different farms. Other drawbacks of the currently used identification system are that it is expensive and unreliable. This thesis explores the possibility of replacing this non-standardised radio-frequency-based identification system with a standardised computer-vision-based system. The method proposed in this thesis uses a colour-threshold approach for detection, a flood-fill approach followed by a Hough transform and a projection method for segmentation, and evaluates template matching, k-nearest neighbour and support vector machines as optical character recognition methods. The results show that the quality of the data used as input to the system is vital. With good data, k-nearest neighbour, which showed the best results of the three OCR approaches, handles 98% of the digits.
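A rough sketch of the k-nearest-neighbour OCR step that performed best in this work; scikit-learn's bundled digits dataset stands in for the segmented ear-tag digits, so the numbers are only indicative:

```python
# k-NN classification of segmented digit images flattened to pixel vectors.
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

digits = load_digits()                                   # 8x8 digit images, flattened
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=0)

knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)
print("digit accuracy:", knn.score(X_test, y_test))      # typically around 0.98
```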
22

Joosep, Henno. "Empirical Evaluation of Approaches for Digit Recognition." Thesis, Linnéuniversitetet, Institutionen för datavetenskap (DV), 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:lnu:diva-46676.

Full text
Abstract:
Optical Character Recognition (OCR) is a well-studied subject involving various application areas. OCR results in various limited problem areas are promising; however, building a highly accurate OCR application is still problematic in practice. This thesis discusses the problem of recognising and confirming Bingo lottery numbers from a real lottery field, and a prototype for an Android phone is implemented and evaluated. The OCR library Tesseract and two Artificial Neural Network (ANN) approaches are compared in an experiment and discussed. The results show that training a neural network for each number gives slightly higher results than Tesseract.
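A minimal sketch of the Tesseract baseline used in such a comparison, restricting recognition to digits on a cropped cell; the flags are standard Tesseract options and the image path is a placeholder, not the thesis prototype:

```python
# Recognise the digits in a cropped Bingo-field cell with Tesseract.
import pytesseract
from PIL import Image

cell = Image.open("bingo_cell.png")                      # placeholder crop
text = pytesseract.image_to_string(
    cell,
    config="--psm 7 -c tessedit_char_whitelist=0123456789",  # one text line, digits only
)
print(text.strip())
```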
23

Dolci, Beatrice. "Development of a Deep Learning system for Optical Character Recognition." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2019.

Find full text
Abstract:
This thesis documents the work resulting from a six-month internship at Datalogic USA Inc. in Pasadena, California. The project had the purpose of designing a system for detecting text lines within high-resolution images in an industrial setting, making use of Deep Learning and convolutional neural networks and focusing on the use of a general object detection system in the context of optical character recognition. The chosen general-purpose object detector was YOLO, which currently provides state-of-the-art performance in terms of the trade-off between speed and accuracy. The goal of the thesis work was to configure and specialise a general object detection convolutional neural network in such a way as to optimise its performance for the purpose of optical character recognition. After laying down the theoretical bases, the specific object detection system (YOLO) was mastered, from the architecture of the network to the structure of the output and the loss function. The same neural network framework as in the original implementation of YOLO was used, called Darknet. Darknet is a system for building, training and testing neural networks, written in C and CUDA and featuring OpenCV libraries. Part of the thesis work consisted in gaining deep knowledge of the code and enhancing it with additional features. New solutions were proposed to maximise accuracy on the given datasets and to solve technology-related problems that were impairing performance in some instances. The results showed that YOLO is impressively fast, providing a very large speedup with respect to the current OCR solution used by Datalogic, and that it is very accurate as long as its training set features enough variability. On the other hand, it struggles to generalise to unknown patterns.
24

Nell, Henrik. "Quantifying the noise tolerance of the OCR engine Tesseract using a simulated environment." Thesis, Blekinge Tekniska Högskola, Institutionen för kreativa teknologier, 2014. http://urn.kb.se/resolve?urn=urn:nbn:se:bth-4028.

Full text
Abstract:
Context: Optical Character Recognition (OCR), having a computer recognize text from an image, is not as intuitive as human recognition. Even small (to human eyes) degradations can thwart the OCR result. The problem is that random, unknown degradations are unavoidable in a real-world setting. Objectives: The noise tolerance of Tesseract, a state-of-the-art OCR engine, is evaluated in relation to how well it handles salt-and-pepper noise, a type of image degradation. Noise tolerance is measured as the percentage of aberrant pixels when comparing two images (one with noise and the other without). Methods: A novel, systematic approach for finding the noise tolerance of an OCR engine is presented. A simulated environment is developed in which the test parameters, called test cases (font, font size, text string), can be modified. The simulation program creates a text string image (white background, black text), degrades it iteratively using salt-and-pepper noise, and lets Tesseract perform OCR on it in each iteration. The iteration process stops when the text string in the image no longer matches Tesseract's OCR result. Results: Simulation results are given as the percentage of changed pixels (noise tolerance) between the clean text string image and the image one degradation iteration before Tesseract failed to recognize all characters. The results cover 14,400 test cases: 4 fonts (Arial, Calibri, Courier and Georgia), 100 font sizes (1-100) and 36 different strings (4*100*36=14,400), resulting in about 1.8 million OCR attempts performed by Tesseract. Conclusions: The noise tolerance depended on the test parameters. Font sizes smaller than 7 were not recognized at all, even without noise applied. The font size interval 13-22 was the peak performance interval, i.e. the interval with the highest noise tolerance, except for the only monospaced font tested, Courier, which had lower noise tolerance in that interval. For the font size interval 22-100, the trend was that noise tolerance decreased for larger font sizes. The noise tolerance of Tesseract as a whole, given the experimental results, was circa 6.21%, i.e. if 6.21% of the pixels in the image have changed, Tesseract can still recognize all text in the image.
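A simplified version of the degradation loop described above, assuming a white-background, black-text image and pytesseract as the interface to Tesseract; the step size and file names are illustrative:

```python
# Add salt-and-pepper noise in small increments until Tesseract stops returning
# the ground-truth string; the last working level approximates the noise tolerance.
import numpy as np
import pytesseract
from PIL import Image

def noise_tolerance(image_path, ground_truth, step=0.005, seed=0):
    rng = np.random.default_rng(seed)
    clean = np.array(Image.open(image_path).convert("L"))
    level = 0.0
    while level < 1.0:
        level += step
        noisy = clean.copy()
        mask = rng.random(clean.shape) < level
        salt_pepper = rng.choice(np.array([0, 255], dtype=np.uint8), size=int(mask.sum()))
        noisy[mask] = salt_pepper
        text = pytesseract.image_to_string(Image.fromarray(noisy)).strip()
        if text != ground_truth:
            return level - step      # last level at which all characters still matched
    return 1.0

print(noise_tolerance("text_string.png", "HELLO"))   # placeholder inputs
```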
25

Kozlovski, Nikolai. "TEXT-IMAGE RESTORATION AND TEXT ALIGNMENT FOR MULTI-ENGINE OPTICAL CHARACTER RECOGNITION SYSTEMS." Master's thesis, University of Central Florida, 2006. http://digital.library.ucf.edu/cdm/ref/collection/ETD/id/3607.

Full text
Abstract:
Previous research showed that combining the results of three different optical character recognition (OCR) engines (ExperVision® OCR, ScanSoft OCR, and ABBYY® OCR) using voting algorithms yields a higher accuracy rate than any of the engines individually. While a voting algorithm has been realized, several aspects needed further research to automate the process and improve the accuracy rate. This thesis focuses on morphological image preprocessing and morphological restoration of the text that goes to the OCR engines; the method is similar to the one used in restoring partial fingerprints. Series of morphological dilating and eroding filters with various mask shapes and sizes were applied to text of different font sizes and types with various noises added. These images were then processed by the OCR engines, and based on these results successful combinations of text, noise, and filters were chosen. The thesis also deals with the problem of text alignment. Each OCR engine has its own way of dealing with noise and corrupted characters; as a result, the output texts of the OCR engines have different lengths and numbers of words. This, in turn, makes it impossible to use spaces as delimiters to separate the words for processing by the voting part of the system. Text alignment determines, using various techniques, what is an extra word, what is supposed to be two or more words instead of one, which words are missing in one document compared to the other, etc. The alignment algorithm is made up of a series of shifts in the two texts to determine which parts are similar and which are not. Since errors made by OCR engines are due to visual misrecognition, in addition to simple character comparison (equal or not), a technique was developed that allows comparison of characters based on how they look. (M.S.E.E., Department of Electrical and Computer Engineering, Engineering and Computer Science.)
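The morphological restoration pass described above can be sketched as a small dilation followed by erosion (a closing) applied before the image is sent to the OCR engines; the kernel shape, size and order below are example choices, not the thesis's tuned filter series:

```python
# Morphological closing to reconnect broken strokes before OCR.
import cv2

page = cv2.imread("noisy_page.png", cv2.IMREAD_GRAYSCALE)    # placeholder path
_, binary = cv2.threshold(page, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (3, 3))
dilated = cv2.dilate(binary, kernel, iterations=1)
restored = cv2.erode(dilated, kernel, iterations=1)
# Note: whether to dilate or erode first depends on whether the text ends up
# bright or dark in the binarised image.
cv2.imwrite("restored_page.png", restored)                   # input for the OCR engines
```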
26

Lund, William B. "Ensemble Methods for Historical Machine-Printed Document Recognition." BYU ScholarsArchive, 2014. https://scholarsarchive.byu.edu/etd/4024.

Full text
Abstract:
The usefulness of digitized documents is directly related to the quality of the extracted text. Optical Character Recognition (OCR) has reached a point where well-formatted and clean machine- printed documents are easily recognizable by current commercial OCR products; however, older or degraded machine-printed documents present problems to OCR engines resulting in word error rates (WER) that severely limit either automated or manual use of the extracted text. Major archives of historical machine-printed documents are being assembled around the globe, requiring an accurate transcription of the text for the automated creation of descriptive metadata, full-text searching, and information extraction. Given document images to be transcribed, ensemble recognition methods with multiple sources of evidence from the original document image and information sources external to the document have been shown in this and related work to improve output. This research introduces new methods of evidence extraction, feature engineering, and evidence combination to correct errors from state-of-the-art OCR engines. This work also investigates the success and failure of ensemble methods in the OCR error correction task, as well as the conditions under which these ensemble recognition methods reduce the Word Error Rate (WER), improving the quality of the OCR transcription, showing that the average document word error rate can be reduced below the WER of a state-of-the-art commercial OCR system by between 7.4% and 28.6% depending on the test corpus and methods. This research on OCR error correction contributes within the larger field of ensemble methods as follows. Four unique corpora for OCR error correction are introduced: The Eisenhower Communiqués, a collection of typewritten documents from 1944 to 1945; The Nineteenth Century Mormon Articles Newspaper Index from 1831 to 1900; and two synthetic corpora based on the Enron (2001) and the Reuters (1997) datasets. The Reverse Dijkstra Heuristic is introduced as a novel admissible heuristic for the A* exact alignment algorithm. The impact of the heuristic is a dramatic reduction in the number of nodes processed during text alignment as compared to the baseline method. From the aligned text, the method developed here creates a lattice of competing hypotheses for word tokens. In contrast to much of the work in this field, the word token lattice is created from a character alignment, preserving split and merged tokens within the hypothesis columns of the lattice. This alignment method more explicitly identifies competing word hypotheses which may otherwise have been split apart by a word alignment. Lastly, this research explores, in order of increasing contribution to word error rate reduction: voting among hypotheses, decision lists based on an in-domain training set, ensemble recognition methods with novel feature sets, multiple binarizations of the same document image, and training on synthetic document images.
27

Odd, Joel, and Emil Theologou. "Utilize OCR text to extract receipt data and classify receipts with common Machine Learning algorithms." Thesis, Linköpings universitet, Institutionen för datavetenskap, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-148350.

Full text
Abstract:
This study investigated whether it is feasible to use machine learning tools on OCR-extracted text data to classify receipts and extract specific data points. Two OCR tools were evaluated: the Azure Computer Vision API and the Google Drive REST API, where the Google Drive REST API was the main OCR tool used in the project because of its impressive performance. The classification task mainly tried to predict which of five given categories a receipt belongs to, with a more challenging additional task of predicting specific subcategories inside those five larger categories. The data points we were trying to extract were the date of purchase on the receipt and the total price of the transaction. The classification was mainly done with the help of scikit-learn, while the extraction of data points was achieved by a simple custom-made N-gram model. The results were promising, with about a 94% cross-validation score for classifying receipts by category with the help of a LinearSVC classifier. Our custom model was successful in 72% of cases for the price data point, while the results for extracting the date were less successful, with an accuracy of 50%, which we still consider very promising given the simplistic nature of the custom model.
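A rough reconstruction of the classification setup described here, with TF-IDF features and a LinearSVC; the receipt strings and labels are invented stand-ins for the OCR output used in the study:

```python
# Classify OCR-extracted receipt text into categories with TF-IDF + LinearSVC.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

receipt_texts = [
    "ICA MAXI Lindhagen  Mjolk 12.90  Brod 24.50  TOTAL 37.40",
    "SHELL 7-Eleven  Diesel 42.1 L  TOTAL 650.50",
    "Apoteket Hjartat  Alvedon 500mg  TOTAL 59.00",
]                                                   # hypothetical OCR output strings
categories = ["groceries", "fuel", "pharmacy"]      # example labels

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
model.fit(receipt_texts, categories)
print(model.predict(["PREEM  Bensin 95  TOTAL 712.30"]))   # predicted category for a new receipt
```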
28

Bristow, Kelly H. "Freeform Cursive Handwriting Recognition Using a Clustered Neural Network." Thesis, University of North Texas, 2015. https://digital.library.unt.edu/ark:/67531/metadc804845/.

Full text
Abstract:
Optical character recognition (OCR) software has advanced greatly in recent years. Machine-printed text can be scanned and converted to searchable text with word accuracy rates around 98%. Reasonably neat hand-printed text can be recognized with about 85% word accuracy. However, cursive handwriting still remains a challenge, with state-of-the-art performance still around 75%. Algorithms based on hidden Markov models have been only moderately successful, while recurrent neural networks have delivered the best results to date. This thesis explored the feasibility of using a special type of feedforward neural network to convert freeform cursive handwriting to searchable text. The hidden nodes in this network were grouped into clusters, with each cluster being trained to recognize a unique character bigram. The network was trained on writing samples that were pre-segmented and annotated. Post-processing was facilitated in part by using the network to identify overlapping bigrams that were then linked together to form words and sentences. With dictionary assisted post-processing, the network achieved word accuracy of 66.5% on a small, proprietary corpus. The contributions in this thesis are threefold: 1) the novel clustered architecture of the feed-forward neural network, 2) the development of an expanded set of observers combining image masks, modifiers, and feature characterizations, and 3) the use of overlapping bigrams as the textual working unit to assist in context analysis and reconstruction.
29

Girjotas, Andrius. "Transporto priemonių numerių atpažinimo algoritmų analizė bei universalios atpažinimo sistemos teorija." Master's thesis, Lithuanian Academic Libraries Network (LABT), 2014. http://vddb.library.lt/obj/LT-eLABa-0001:E.02~2006~D_20140702_193526-52512.

Full text
Abstract:
Automatic license plate recognition plays an important role in numerous applications, and a number of techniques have been proposed for public institutions and private companies. However, even now it is impossible to design a perfect, operational recognition system, which leaves space for creativity and for the search for the most effective algorithms. The main objective of this thesis is to analyse alternative algorithms for license plate localisation and the other stages of recognition, together with their efficiency and adaptability. The chosen research methods are the implementation of the algorithms and the analysis of test data and test results. Every stage of the recognition process is extremely sensitive to various factors, whose variation determines the variation of intermediate and final results; this was proven by the analysis of the functionality of the alternative algorithms.
30

Kazlauskas, Tomas. "Transporto priemonių numerių atpažinimo algoritmų analizė bei universalios atpažinimo sistemos teorija." Master's thesis, Lithuanian Academic Libraries Network (LABT), 2014. http://vddb.library.lt/obj/LT-eLABa-0001:E.02~2006~D_20140702_193534-66751.

Full text
Abstract:
Automatic license plate recognition plays an important role in numerous applications, and a number of techniques have been proposed for public institutions and private companies. However, even now it is impossible to design a perfect, operational recognition system, which leaves space for creativity and for the search for the most effective algorithms. The main objective of this thesis is to analyse alternative algorithms for license plate localisation and the other stages of recognition, together with their efficiency and adaptability. The chosen research methods are the implementation of the algorithms and the analysis of test data and test results. Every stage of the recognition process is extremely sensitive to various factors, whose variation determines the variation of intermediate and final results; this was proven by the analysis of the functionality of the alternative algorithms.
31

Edvartsen, Hannes. "OCR of dot peen markings : with deep learning and image analysis." Thesis, Luleå tekniska universitet, Institutionen för system- och rymdteknik, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:ltu:diva-71013.

Full text
Abstract:
A way to follow products through the chain of production is important in the process industry, and it is often solved by marking them with serial numbers. In some cases permanent markings such as dot peen markings are required. To ensure profitability in the industry and reduce errors, these markings must be read automatically. Automatic reading of dot peen markings using a camera can be hard, since there is low contrast between the background and the numbers, the background can be uneven, and different illuminations can affect the visibility. In this work, two different systems are implemented and evaluated to assess the possibility of developing a robust system. One system uses image analysis to segment the numbers before classifying them. The other system uses recent advances in deep learning for object detection. Both implementations are shown to work in near real time on a CPU. The deep learning object detection approach was able to classify all numbers in an image correctly 60% of the time, while the other approach succeeded only 20% of the time.
APA, Harvard, Vancouver, ISO, and other styles
32

Johansson, Elias. "Separation and Extraction of Valuable Information From Digital Receipts Using Google Cloud Vision OCR." Thesis, Linnéuniversitetet, Institutionen för datavetenskap och medieteknik (DM), 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:lnu:diva-88602.

Full text
Abstract:
Automatization is a desirable feature in many business areas. Manually extracting information from a physical object such as a receipt is something that can be automated to save resources for a company or a private person. This paper describes the process of combining an already existing OCR engine with a Python script developed to extract valuable information from a digital image of a receipt. Values such as VAT, VAT%, date, and total, gross and net cost are considered valuable information. This is a feature that has already been implemented in existing applications; however, the company for which I have done this project is interested in creating its own version. This project is an experiment to see whether it is possible to implement such an application using restricted resources, by developing a program that can extract the information mentioned above. The paper walks through the development of the program, as well as the mindset, findings and steps taken to overcome the problems encountered along the way. The program achieved a success rate of 86.6% in extracting the most valuable information (total cost, VAT% and date) from a set of 53 receipts originating from 34 separate establishments.
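As a rough illustration of the kind of field extraction such a script performs (not the thesis' actual code), the sketch below pulls a date, total and VAT line out of raw OCR text with regular expressions; the keyword patterns and field names are assumptions chosen for Swedish-style receipts.

```python
import re

# Hypothetical patterns for Swedish-style receipts; real receipts vary widely,
# so production code would need many more variants and validation rules.
DATE_RE = re.compile(r"\b(\d{4}-\d{2}-\d{2}|\d{2}[./]\d{2}[./]\d{4})\b")
TOTAL_RE = re.compile(r"(?:TOTAL|ATT BETALA|SUMMA)\D*(\d+[.,]\d{2})", re.IGNORECASE)
VAT_RE = re.compile(r"(?:MOMS|VAT)\s*(\d{1,2})\s*%\D*(\d+[.,]\d{2})", re.IGNORECASE)

def extract_fields(ocr_text: str) -> dict:
    """Pull a few 'valuable' fields out of the raw text returned by an OCR engine."""
    fields = {}
    if m := DATE_RE.search(ocr_text):
        fields["date"] = m.group(1)
    if m := TOTAL_RE.search(ocr_text):
        fields["total"] = float(m.group(1).replace(",", "."))
    if m := VAT_RE.search(ocr_text):
        fields["vat_percent"] = int(m.group(1))
        fields["vat_amount"] = float(m.group(2).replace(",", "."))
    return fields

print(extract_fields("ICA Kvantum 2019-03-14\nMOMS 12% 10,71\nATT BETALA 100,00"))
```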
APA, Harvard, Vancouver, ISO, and other styles
33

Di, Pasquale Elia. "CardManager: applicazione per la gestione di documenti e carte, e l'acquisizione di dati tramite OCR." Bachelor's thesis, Alma Mater Studiorum - Università di Bologna, 2016. http://amslaurea.unibo.it/12244/.

Full text
Abstract:
The aim of this thesis is to implement a software system for scanning and analysing a wide range of documents, automatically extracting (through OCR technologies), processing and storing in structured form all the data that can be obtained from them; the extracted data are then available in digital form and usable for a variety of operations. The application is not aimed at one specific scenario but can be used in all contexts where collecting and recording data is an inconvenient part of the production process. It can also be used in everyday life by anyone who wants to create a "virtual wallet" and keep their documents at hand in digital form, conveniently accessible from a smartphone. The main objective of this thesis is to create a system that combines OCR technologies and image-processing steps to obtain the most accurate extraction results possible, in order to provide a range of data-management services, so that the data are not limited to simple consultation but can be used and shared in different ways. The software was designed and implemented with maximum flexibility and particular attention to extensibility, in order to ease future expansion. A modular approach was adopted to make each part of the project as independent as possible from the others, so that individual parts can be modified without affecting the rest. During evaluation, the developed system proved able to handle all the functionality required by the different application scenarios and to maintain a level of accuracy that guarantees correct operation even in non-optimal conditions.
APA, Harvard, Vancouver, ISO, and other styles
34

RUBIO, VILLALBA IGNACIO. "Analysis of the OCR System Application in Intermodal Terminals : Malmö Intermodal Terminal." Thesis, KTH, Transportplanering, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-278856.

Full text
Abstract:
The analysis carried out in this thesis is made from two different points of view, qualitative and quantitative, using the case study of the Malmö intermodal terminal. The qualitative analysis focuses on how the intermodal terminal works, which of its elements interact and how, in order to achieve the purpose of the terminal, and in which ways the Intelligent Video Gate can affect this functioning, mainly positively, allowing the terminal to operate better. From the quantitative point of view, a timing and economic analysis of the Malmö Intermodal Terminal is carried out, based on the information obtained from the qualitative analysis and on data provided by the terminal operators. This allows different simulations to be run to compare the effect of implementing the Intelligent Video Gate in this specific terminal, and the conclusions could be extended to similar intermodal terminals located in regions with similar labour conditions and with, as in the European Union, a highly standardized freight system. Finally, although the available data did not allow the most complex and representative simulation, the aim of the Intelligent Video Gate is reached successfully, with a large improvement in efficiency that makes it possible to state with reasonable certainty that implementing the system is recommended in this kind of terminal.
APA, Harvard, Vancouver, ISO, and other styles
35

Saracoglu, Ahmet. "Localization And Recognition Of Text In Digital Media." Master's thesis, METU, 2007. http://etd.lib.metu.edu.tr/upload/2/12609028/index.pdf.

Full text
Abstract:
Textual information within digital media can be used in many areas, such as indexing and structuring of media databases, aiding the visually impaired, translation of foreign signs, and many more. Text in digital media can mainly be separated into two categories: overlay text and scene text. In this thesis, localization and recognition of video text in digital media, regardless of its category, is investigated. As a necessary first step, the framework of a complete system is discussed. Next, a comparative analysis of feature vector and classification method pairs is presented. Furthermore, the multi-part nature of text is exploited by proposing a novel Markov Random Field approach for the classification of text/non-text regions. Additionally, better localization of text is achieved by introducing a bounding-box extraction method. For the recognition of text regions, a handprint-based Optical Character Recognition system is thoroughly investigated. During the investigation of text recognition, a multi-hypothesis approach for the segmentation of the background is proposed by incorporating k-means clustering. Furthermore, a novel dictionary-based ranking mechanism is proposed for spelling correction of the recognition output. The overall system is simulated on a challenging data set. A thorough survey on scene-text localization and recognition is also presented; challenges are identified and discussed along with related work addressing them. Scene-text localization simulations on a public competition data set are also provided. Lastly, in order to improve the recognition performance of scene text on signs affected by perspective projection distortion, a rectification method is proposed and simulated.
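The k-means background segmentation idea mentioned in this abstract can be sketched as follows; this is a minimal illustration, not the thesis' implementation, and the cluster count and the darkest-cluster heuristic are simplifying assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

def segment_text_pixels(gray_region: np.ndarray, k: int = 3) -> np.ndarray:
    """Cluster pixel intensities and keep the darkest cluster as a crude text mask."""
    pixels = gray_region.reshape(-1, 1).astype(np.float32)
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(pixels)
    # Assume the cluster with the lowest mean intensity corresponds to text strokes.
    means = [pixels[labels == c].mean() for c in range(k)]
    text_cluster = int(np.argmin(means))
    return (labels == text_cluster).reshape(gray_region.shape)

# Usage (assuming an OpenCV BGR crop of a detected text region):
# mask = segment_text_pixels(cv2.cvtColor(region, cv2.COLOR_BGR2GRAY))
```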
APA, Harvard, Vancouver, ISO, and other styles
36

Liaqat, Ahmad Gull. "Mobile Real-Time License Plate Recognition." Thesis, Linnéuniversitetet, Institutionen för datavetenskap, fysik och matematik, DFM, 2011. http://urn.kb.se/resolve?urn=urn:nbn:se:lnu:diva-15944.

Full text
Abstract:
License plate recognition (LPR) systems play an important role in numerous applications, such as parking accounting systems, traffic law enforcement, road monitoring, expressway toll systems, electronic-police systems, and security systems. In recent years there has been a lot of research in license plate recognition, and many recognition systems have been proposed and used, but these systems have been developed for computers. In this project we developed a mobile LPR system for the Android operating system (OS). LPR involves three main components: license plate detection, character segmentation and Optical Character Recognition (OCR). For license plate detection and character segmentation we used the JavaCV and OpenCV libraries, and for OCR we used tesseract-ocr. We obtained very good results using these libraries. We also store records of license numbers in a database, using SQLite for that purpose.
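A minimal desktop sketch of the same three-component pipeline (plate detection, cropping and binarization, OCR) is shown below; it uses Python with OpenCV and pytesseract rather than the thesis' Android/JavaCV/tesseract-ocr stack, and the contour and aspect-ratio heuristics are illustrative assumptions.

```python
import cv2
import pytesseract

def read_plate(image_path: str) -> str:
    """Locate a plate-like region, binarize it and hand it to Tesseract."""
    img = cv2.imread(image_path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(cv2.bilateralFilter(gray, 11, 17, 17), 30, 200)
    contours, _ = cv2.findContours(edges, cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)
    for cnt in sorted(contours, key=cv2.contourArea, reverse=True)[:10]:
        x, y, w, h = cv2.boundingRect(cnt)
        if 2.0 < w / float(h) < 6.0:          # plate-like aspect ratio (assumption)
            plate = gray[y:y + h, x:x + w]
            _, plate = cv2.threshold(plate, 0, 255,
                                     cv2.THRESH_BINARY + cv2.THRESH_OTSU)
            # --psm 7: treat the crop as a single line of text
            return pytesseract.image_to_string(plate, config="--psm 7").strip()
    return ""
```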
APA, Harvard, Vancouver, ISO, and other styles
37

Albertazzi, Riccardo. "A study on the application of generative adversarial networks to industrial OCR." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2018.

Find full text
Abstract:
High performance and nearly perfect accuracy are the standards required of OCR algorithms for industrial applications. In recent years, research on Deep Learning has proven that Convolutional Neural Networks (CNNs) are a very powerful and robust tool for image analysis and classification; when applied to OCR tasks, CNNs perform much better than previously adopted techniques and easily reach 99% accuracy. However, the effectiveness of Deep Learning models relies on the quality of the data used to train them; this can become a problem since OCR tools can run for months without interruption, and during this period unpredictable variations (printer errors, background modifications, light conditions) can affect the accuracy of the trained system. We cannot expect the final user who trains the tool to take thousands of training pictures under different conditions until all imaginable variations have been captured; we therefore have to be able to generate these variations programmatically. Generative Adversarial Networks (GANs) are a recent breakthrough in machine learning; these networks are able to learn the distribution of the input data and therefore generate realistic samples belonging to that distribution. The objective of this thesis is to learn in detail how GANs work and to perform experiments on generative models that can create unseen variations of OCR training characters, thus making the whole OCR system more robust to future character variations.
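To make the adversarial training idea concrete, the sketch below shows a toy fully connected GAN training step on character images; the 28x28 image size, architecture and hyperparameters are assumptions for illustration and do not reflect the models studied in the thesis.

```python
import torch
import torch.nn as nn

# Toy fully connected GAN for 28x28 grayscale character images; a real system
# would use a convolutional architecture and far more careful training.
LATENT = 64

G = nn.Sequential(nn.Linear(LATENT, 256), nn.ReLU(),
                  nn.Linear(256, 28 * 28), nn.Tanh())
D = nn.Sequential(nn.Linear(28 * 28, 256), nn.LeakyReLU(0.2),
                  nn.Linear(256, 1), nn.Sigmoid())

opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCELoss()

def train_step(real_batch: torch.Tensor) -> None:
    """One adversarial update on a batch of real character images scaled to [-1, 1]."""
    n = real_batch.size(0)
    real = real_batch.view(n, -1)
    fake = G(torch.randn(n, LATENT))

    # Discriminator: push real images toward 1 and generated images toward 0.
    d_loss = bce(D(real), torch.ones(n, 1)) + bce(D(fake.detach()), torch.zeros(n, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator: try to make the discriminator label generated images as real.
    g_loss = bce(D(fake), torch.ones(n, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```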
APA, Harvard, Vancouver, ISO, and other styles
38

Sanfer, Jonathan. "API för att tolka och ta fram information från kvitton." Thesis, Örebro universitet, Institutionen för naturvetenskap och teknik, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:oru:diva-68741.

Full text
Abstract:
This report describes the creation of an API that can extract information from pictures of receipts. The information the API should be able to deliver is the organisation (registration) number, date, time, total sum and VAT. The report also includes a closer look at OCR (optical character recognition), the technology that transforms pictures and documents into text. The thesis project was carried out for Flex Applications AB.
APA, Harvard, Vancouver, ISO, and other styles
39

Voils, Danny. "Scale Invariant Object Recognition Using Cortical Computational Models and a Robotic Platform." PDXScholar, 2012. https://pdxscholar.library.pdx.edu/open_access_etds/632.

Full text
Abstract:
This paper proposes an end-to-end, scale invariant, visual object recognition system, composed of computational components that mimic the cortex in the brain. The system uses a two stage process. The first stage is a filter that extracts scale invariant features from the visual field. The second stage uses inference-based spatio-temporal analysis of these features to identify objects in the visual field. The proposed model combines Numenta's Hierarchical Temporal Memory (HTM) with HMAX, developed by MIT's Brain and Cognitive Science Department. While these two biologically inspired paradigms are based on what is known about the visual cortex, HTM and HMAX tackle the overall object recognition problem from different directions. Image pyramid based methods like HMAX make explicit use of scale, but have no sense of time. HTM, on the other hand, only indirectly tackles scale, but makes explicit use of time. By combining HTM and HMAX, both scale and time are addressed. In this paper, I show that HTM and HMAX can be combined to make a complete cortex-inspired object recognition model that explicitly uses both scale and time to recognize objects in temporal sequences of images. Additionally, through experimentation, I examine several variations of HMAX and its
APA, Harvard, Vancouver, ISO, and other styles
40

Al-Muhtaseb, Husni A. "Arabic text recognition of printed manuscripts. Efficient recognition of off-line printed Arabic text using Hidden Markov Models, Bigram Statistical Language Model, and post-processing." Thesis, University of Bradford, 2010. http://hdl.handle.net/10454/4426.

Full text
Abstract:
Arabic text recognition was not researched as thoroughly as other natural languages. The need for automatic Arabic text recognition is clear. In addition to traditional applications like postal address reading, check verification in banks, and office automation, there is a large interest in searching scanned documents that are available on the internet and in searching handwritten manuscripts. Other possible applications are building digital libraries, recognizing text on digitized maps, recognizing vehicle license plates, using it as a first phase in text readers for visually impaired people, and understanding filled forms. This research work aims to contribute to the current research in the field of optical character recognition (OCR) of printed Arabic text by developing novel techniques and schemes to advance the performance of state-of-the-art Arabic OCR systems. Statistical and analytical analysis of Arabic text was carried out to estimate the probabilities of occurrence of Arabic characters for use with Hidden Markov Models (HMMs) and other techniques. Since there is no publicly available dataset of printed Arabic text for recognition purposes, it was decided to create one. In addition, a minimal Arabic script is proposed. The proposed script contains all basic shapes of Arabic letters and provides an efficient representation for Arabic text in terms of effort and time. Based on the success of using HMMs for speech and text recognition, the use of HMMs for the automatic recognition of Arabic text was investigated. The HMM technique adapts to noise and font variations and does not require word or character segmentation of Arabic line images. In the feature extraction phase, experiments were conducted with a number of different features to investigate their suitability for HMMs. Finally, a novel set of features, which resulted in high recognition rates for different fonts, was selected. The developed techniques do not need word or character segmentation before the classification phase, as segmentation is a byproduct of recognition. This seems to be the most advantageous feature of using HMMs for Arabic text, as segmentation tends to produce errors which are usually propagated to the classification phase. Eight different Arabic fonts were used in the classification phase. The recognition rates were in the range from 98% to 99.9% depending on the fonts used. As far as we know, these are new results in their context. Moreover, the proposed technique could be used for other languages: a proof-of-concept experiment was conducted on English characters with a recognition rate of 98.9% using the same HMM setup, and the same techniques were applied to Bangla characters with a recognition rate above 95%. Moreover, the recognition of printed Arabic text with multiple fonts was also conducted using the same technique; fonts were categorized into different groups and new high recognition results were achieved. To enhance the recognition rate further, a post-processing module was developed to correct the OCR output through character-level and word-level post-processing. The use of this module increased the accuracy of the recognition rate by more than 1%.
King Fahd University of Petroleum and Minerals (KFUPM)
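The segmentation-free, sliding-window idea behind HMM text-line recognition can be sketched roughly as follows; the window width, the two simple frame features and the single GaussianHMM are simplifications (a full recognizer trains one model per character and decodes lines with Viterbi over a language model), and the hmmlearn-based setup is an assumption, not the thesis' implementation.

```python
import numpy as np
from hmmlearn import hmm

def frame_features(line_img: np.ndarray, win: int = 4) -> np.ndarray:
    """Slide a narrow window over a binarized text-line image (text pixels = 1)
    and compute simple per-frame features: ink density and vertical centre of mass."""
    h, w = line_img.shape
    feats = []
    for x in range(0, w - win, win):
        col = line_img[:, x:x + win]
        ink = col.sum()
        centre = (col * np.arange(h)[:, None]).sum() / ink if ink else h / 2.0
        feats.append([ink / (h * win), centre / h])
    return np.asarray(feats)

# Toy illustration: fit one HMM to the frame sequence of a single line image.
# A real recognizer trains one HMM per character shape and concatenates them.
line = (np.random.rand(32, 400) > 0.7).astype(float)   # stand-in for a binarized line
X = frame_features(line)
model = hmm.GaussianHMM(n_components=8, covariance_type="diag", n_iter=20)
model.fit(X)
```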
APA, Harvard, Vancouver, ISO, and other styles
41

Osorio, Fernando Santos. "Um estudo sobre reconhecimento visual de caracteres através de redes neurais." reponame:Biblioteca Digital de Teses e Dissertações da UFRGS, 1991. http://hdl.handle.net/10183/24184.

Full text
Abstract:
This work presents a study of visual character recognition using neural networks. It covers aspects of Digital Image Processing, character recognition systems and neural networks. An implementation proposal for an OCR system oriented to printed character recognition is also presented; this system uses a neural network developed specifically for this purpose. The OCR system, named N2OCR, has a prototype implementation, which is also described in this work. Several topics related to Digital Image Processing are presented, covering image acquisition, image processing and pattern recognition. Regarding image acquisition, acquisition devices and the kinds of image data obtained from them are discussed. The following aspects of text image processing are covered: halftoning, histogram generation and modification, thresholding and filtering operations. A brief analysis of pattern recognition techniques related to this theme is given. Different kinds of character recognition systems are described, as well as the techniques and algorithms they use.
Besides, a discussion about performance evaluation of these OCR systems is given, including a description and analysis of typical OCR problems. Neural networks are then presented, describing their characteristics, historical aspects and the evolution of research in this field. The main neural network models in current use are described: Perceptron, Adaline, Madaline, multilevel networks, ART, Hopfield's model, the Boltzmann machine, BAM and Kohonen's model. From the analysis of these different neural network models, we arrive at the proposal of a new neural network model to be used by the N2OCR system, describing items related to learning, recognition and possible model extensions. A possible hardware implementation of this model is also presented. A global view of the N2OCR system is provided at the end of this work, describing each of its modules, along with a description of the prototype implementation and its functions.
APA, Harvard, Vancouver, ISO, and other styles
42

Al-Muhtaseb, Husni Abdulghani. "Arabic text recognition of printed manuscripts : efficient recognition of off-line printed Arabic text using Hidden Markov Models, Bigram Statistical Language Model, and post-processing." Thesis, University of Bradford, 2010. http://hdl.handle.net/10454/4426.

Full text
Abstract:
Arabic text recognition was not researched as thoroughly as other natural languages. The need for automatic Arabic text recognition is clear. In addition to traditional applications like postal address reading, check verification in banks, and office automation, there is a large interest in searching scanned documents that are available on the internet and in searching handwritten manuscripts. Other possible applications are building digital libraries, recognizing text on digitized maps, recognizing vehicle license plates, using it as a first phase in text readers for visually impaired people, and understanding filled forms. This research work aims to contribute to the current research in the field of optical character recognition (OCR) of printed Arabic text by developing novel techniques and schemes to advance the performance of state-of-the-art Arabic OCR systems. Statistical and analytical analysis of Arabic text was carried out to estimate the probabilities of occurrence of Arabic characters for use with Hidden Markov Models (HMMs) and other techniques. Since there is no publicly available dataset of printed Arabic text for recognition purposes, it was decided to create one. In addition, a minimal Arabic script is proposed. The proposed script contains all basic shapes of Arabic letters and provides an efficient representation for Arabic text in terms of effort and time. Based on the success of using HMMs for speech and text recognition, the use of HMMs for the automatic recognition of Arabic text was investigated. The HMM technique adapts to noise and font variations and does not require word or character segmentation of Arabic line images. In the feature extraction phase, experiments were conducted with a number of different features to investigate their suitability for HMMs. Finally, a novel set of features, which resulted in high recognition rates for different fonts, was selected. The developed techniques do not need word or character segmentation before the classification phase, as segmentation is a byproduct of recognition. This seems to be the most advantageous feature of using HMMs for Arabic text, as segmentation tends to produce errors which are usually propagated to the classification phase. Eight different Arabic fonts were used in the classification phase. The recognition rates were in the range from 98% to 99.9% depending on the fonts used. As far as we know, these are new results in their context. Moreover, the proposed technique could be used for other languages: a proof-of-concept experiment was conducted on English characters with a recognition rate of 98.9% using the same HMM setup, and the same techniques were applied to Bangla characters with a recognition rate above 95%. Moreover, the recognition of printed Arabic text with multiple fonts was also conducted using the same technique; fonts were categorized into different groups and new high recognition results were achieved. To enhance the recognition rate further, a post-processing module was developed to correct the OCR output through character-level and word-level post-processing. The use of this module increased the accuracy of the recognition rate by more than 1%.
APA, Harvard, Vancouver, ISO, and other styles
43

Fiala, Petr. "Rozpoznávání znaků z realných scén pomocí neuronových sítí." Master's thesis, Vysoká škola ekonomická v Praze, 2014. http://www.nusl.cz/ntk/nusl-198610.

Full text
Abstract:
This thesis focuses on the problem of character recognition from real scenes, which has gained a significant amount of attention with the development of modern technology. The aim of the work is to take an algorithm with state-of-the-art performance on standard data sets and apply it to this recognition task. The chosen algorithm is a deep convolutional network, whose application to this specific task has not yet been published. The implemented solution builds on the theoretical background, which is provided in a comprehensive overview. Two types of neural network are used in the practical part: a multilayer perceptron and the convolutional model. Since the more complex convolutional network gives a much better classification error than the MLP on the first data set, only the convolutional structure is used in the further experiments. The model is validated on two public data sets that correspond to the specification of the task. In order to obtain an optimal solution based on the data structure, several tests were made with modifications of the network and various adjustments of the input data. The presented solution achieved a prediction rate comparable to the best results of other studies while using artificially generated training patterns. In conclusion, the thesis describes possible extensions and improvements of the model, which should further decrease the classification error.
APA, Harvard, Vancouver, ISO, and other styles
44

Duba, Nikolas. "Jednoduché rozpoznávání písma." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2011. http://www.nusl.cz/ntk/nusl-236979.

Full text
Abstract:
This thesis is focused on optical character recognition and its processing. The goal of the application is to make it possible to easily track daily expenses; it can be used by an individual or by a company as a monitoring tool. The main principle is to make the tool as user-friendly as possible. The application gets its input from hardware, such as a scanner or camera, and analyzes the content of the cash voucher for further processing. To analyze the voucher, the application employs different optical character recognition methods, and the result is subsequently parsed. Detailed explanations of the methods used are given in the document. The application's output is a database filled with cash voucher details. Another part of the work is an information system whose main purpose is to display the collected data.
APA, Harvard, Vancouver, ISO, and other styles
45

Šebela, Miroslav. "Detekce objektu ve videosekvencích." Master's thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2010. http://www.nusl.cz/ntk/nusl-218732.

Full text
Abstract:
The thesis consists of three parts: a theoretical description of digital image processing, optical character recognition, and the design of a system for car licence plate recognition (LPR) in images or video sequences. The theoretical part describes image representation, smoothing and methods used for blob segmentation, and proposes two methods for optical character recognition (OCR). The practical part is concerned with finding a solution and design procedure for an LPR system including OCR. The design contains image pre-processing, blob segmentation, object detection based on region properties, and OCR. The proposed solution uses grayscale transformation, histogram processing, thresholding, connected components, and region recognition based on patterns and properties. An optical recognition method for licence plates is also implemented, in which the recognized values are compared with a database used to manage the entry of vehicles into a site.
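A minimal sketch of the thresholding and connected-component stage described here is given below, assuming a grayscale plate crop as input; the size and aspect-ratio limits used to keep character-like blobs are illustrative assumptions, not the thesis' parameters.

```python
import cv2
import numpy as np

def segment_characters(plate_gray: np.ndarray) -> list:
    """Threshold a plate crop and keep connected components whose size and
    aspect ratio look like characters, returned left-to-right."""
    _, binary = cv2.threshold(plate_gray, 0, 255,
                              cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    n, _, stats, _ = cv2.connectedComponentsWithStats(binary)
    h_img = plate_gray.shape[0]
    boxes = []
    for i in range(1, n):                      # label 0 is the background
        x, y, w, h, area = stats[i]
        if 0.3 * h_img < h < 0.95 * h_img and 0.1 < w / float(h) < 1.2:
            boxes.append((x, binary[y:y + h, x:x + w]))
    return [crop for _, crop in sorted(boxes, key=lambda b: b[0])]
```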
APA, Harvard, Vancouver, ISO, and other styles
46

Galarza, Luis E. "A Book Reader Design for Persons with Visual Impairment and Blindness." FIU Digital Commons, 2017. https://digitalcommons.fiu.edu/etd/3541.

Full text
Abstract:
The objective of this dissertation is to provide a new design approach to a fully automated book reader for individuals with visual impairment and blindness that is portable and cost effective. This approach relies on the geometry of the design setup and provides the mathematical foundation for integrating, in a unique way, a 3-D space surface map from a low-resolution time of flight (ToF) device with a high-resolution image, as a means to enhance the reading accuracy of images warped by the page curvature of bound books and other magazines. The merits of this low cost but effective automated book reader design include: (1) a seamless registration process of the two imaging modalities, so that the low resolution (160 x 120 pixels) height map, acquired by an Argos3D-P100 camera, accurately covers the entire book spread as captured by the high resolution image (3072 x 2304 pixels) of a Canon G6 camera; (2) a mathematical framework for overcoming the difficulties associated with the curvature of open bound books, a process referred to as dewarping of the book spread images; and (3) an image correction performance comparison between uniform and full height maps to determine which map provides the highest possible Optical Character Recognition (OCR) reading accuracy. The design concept could also be applied to address the challenging process of book digitization. This method is dependent on the geometry of the book reader setup for acquiring a 3-D map that yields high reading accuracy once appropriately fused with the high-resolution image. The experiments were performed on a dataset consisting of 200 pages with their corresponding computed and co-registered height maps, which are made available to the research community (cate-book3dmaps.fiu.edu). Improvements to the character reading accuracy due to the correction steps were quantified and measured by introducing the corrected images to an OCR engine and tabulating the number of misrecognized characters. Furthermore, the resilience of the book reader was tested by introducing a rotational misalignment to the book spreads and comparing the OCR accuracy to that obtained with the standard alignment. The standard alignment yielded an average reading accuracy of 95.55% with the uniform height map (i.e., the height values of the central row of the 3-D map are replicated to approximate all other rows), and 96.11% with the full height maps (i.e., each row has its own height values as obtained from the 3D camera). When the rotational misalignments were taken into account, the results obtained produced average accuracies of 90.63% and 94.75% for the same respective height maps, proving the added resilience of the full height map method to potential misalignments.
APA, Harvard, Vancouver, ISO, and other styles
47

Hříbek, David. "Active Learning pro zpracování archivních pramenů." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2021. http://www.nusl.cz/ntk/nusl-445535.

Full text
Abstract:
This work deals with the creation of a system that allows uploading and annotating scans of historical documents and subsequent active learning of character recognition (OCR) models on the available annotations (marked lines and their transcripts). The work describes the OCR process, classifies the techniques and presents an existing system for character recognition, with emphasis placed on machine learning methods. Furthermore, methods of active learning are explained and a method for active learning of the available OCR models from annotated scans is proposed. The rest of the work deals with the system design, implementation, available datasets, evaluation of the self-created OCR model and testing of the entire system.
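As a hedged sketch of how such active learning typically selects lines for annotation (the thesis' exact strategy may differ), the snippet below ranks unlabeled line images by the current model's confidence; the model.predict_proba interface is an assumption made for illustration.

```python
import numpy as np

def uncertainty_sampling(model, unlabeled_lines, budget: int = 50) -> list:
    """Pick the text lines the current OCR model is least confident about,
    so annotators transcribe the most informative scans first."""
    scores = []
    for line_img in unlabeled_lines:
        # Assumed interface: the model returns per-character probabilities
        # for its best hypothesis on a line image.
        char_probs = model.predict_proba(line_img)
        scores.append(float(np.mean(np.log(np.clip(char_probs, 1e-9, 1.0)))))
    ranked = np.argsort(scores)                 # lowest mean log-probability first
    return [int(i) for i in ranked[:budget]]

# Loop: train on labelled lines -> select `budget` uncertain lines -> annotate ->
# add them to the training set -> retrain, and repeat until accuracy plateaus.
```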
APA, Harvard, Vancouver, ISO, and other styles
48

Le, Berre Guillaume. "Vers la mitigation des biais en traitement neuronal des langues." Electronic Thesis or Diss., Université de Lorraine, 2023. http://www.theses.fr/2023LORR0074.

Full text
Abstract:
It is well known that deep learning models are sensitive to biases that may be present in the data used for training. These biases, which can be defined as information that is useless or detrimental to the task in question, can be of different kinds: one can, for example, find biases in the writing styles used, but also much more problematic biases relating to the sex or ethnic origin of individuals. These biases can come from different sources, such as the annotators who created the datasets, or from the annotation process itself. My thesis studies these biases and, in particular, is organized around mitigating the effects of biases on the training of Natural Language Processing (NLP) models. I have worked extensively with pre-trained models such as BERT, RoBERTa and UnifiedQA, which have become essential in recent years in all areas of NLP and which, despite their extensive pre-training, are very sensitive to these bias problems. My thesis is organized in three parts, each presenting a different way of managing the biases present in the data. The first part presents a method for using the biases present in an automatic summarization dataset in order to increase the variability and controllability of the generated summaries. Then, in the second part, I am interested in the automatic generation of a training dataset for the multiple-choice question-answering task.
The advantage of such a generation method is that it avoids relying on annotators and therefore eliminates the biases that would come from them in the data. Finally, I am interested in training a multitask model for optical text recognition. I show in this last part that it is possible to increase the performance of our models by using different types of data (handwritten and typed) during their training.
APA, Harvard, Vancouver, ISO, and other styles
49

Sharma, Anand. "Devanagari Online Handwritten Character Recognition." Thesis, 2019. https://etd.iisc.ac.in/handle/2005/4633.

Full text
Abstract:
In this thesis, a classifier based on local sub-unit-level and global character-level representations of a character, using features independent of stroke direction and order variations, is developed for the recognition of Devanagari online handwritten characters. It is shown that an online character corresponding to a Devanagari ideal character can be analyzed and uniquely represented in terms of homogeneous sub-structures called sub-units. These sub-units can be extracted using the direction property of online strokes in an ideal character. A method for extracting sub-units from a handwritten character is developed, such that the extracted sub-units are similar to the sub-units of the corresponding ideal character. Features are developed that are independent of variations in the order and direction of strokes in characters. The features are called histograms of points, orientations, and dynamics of orientations (HPOD) features. The method for extracting these features spatially maps the coordinates of points and the orientations and dynamics of orientations of strokes at these points. Histograms of these mapped features are computed in different regions into which the spatial map is divided. HPOD features extracted from the sub-units represent the character locally, and those extracted from the character as a whole represent it globally. A classifier is developed that models handwritten characters in terms of the joint distribution of the local and global HPOD features of the characters and the number of sub-units in the characters. The classifier uses latent variables to model the structure of the sub-units. The parameters of the model are estimated using the maximum likelihood method. The use of HPOD features and the assumption of independent generation of the sub-units, given the number of sub-units, make the classifier independent of variations in the direction and order of strokes in characters. This sub-unit based classifier is called the SUB classifier. Datasets for training and testing the classifiers consist of handwritten samples of Devanagari vowels, consonants, half consonants, the nasalization sign, the vowel omission sign, vowel signs, consonants with vowel signs, conjuncts, consonant clusters, and three more short strokes with different shapes. In all, 96 different characters or symbols have been considered for recognition. The average numbers of samples per character class in the training and test sets are, respectively, 133 and 29. The smallest and largest dimensions of the extracted feature vectors are, respectively, 258 and 786. Since the size of the training set per class is not large compared to the dimension of the extracted feature vectors, the training set is small from the perspective of training any classifier. Classifiers that can be trained on a small data set are therefore considered for performance comparison with the developed classifier. Second order statistics (SOS), sub-space (SS), Fisher discriminant (FD), feedforward neural network (FNN), and support vector machine (SVM) classifiers are the other classifiers considered; they are trained with other features, such as spatio-temporal (ST), discrete Fourier transform (DFT), discrete cosine transform (DCT), discrete wavelet transform (DWT), spatial (SP), and histograms of oriented gradients (HOG) features, extracted from the samples of the training set, and tested with the same features extracted from the samples of the test set.
The SVM classifier trained with DFT features has the highest accuracy, 90.2%, among the other classifiers trained with the other features on the test set. The accuracy of the SUB classifier trained with HPOD features is 92.9% on the test set, which is the highest among the accuracies of all the classifiers. The accuracies of the SOS, SS, FD, FNN, and SVM classifiers increase when trained with HPOD features; the accuracy of the SVM classifier trained with HPOD features is 92.9%, the highest among the other classifiers trained with HPOD features. The SUB classifier using HPOD features thus has the highest accuracy among the considered classifiers trained with the considered features on the same training set and tested on the same test set. The better character discriminative capability of the designed HPOD features is reflected by the increase in the accuracies of the other classifiers when trained with these features.
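The flavour of these features can be sketched roughly as follows: normalize the points of an online character, compute unsigned stroke orientations (which removes dependence on stroke direction), and histogram them over a spatial grid. This is only a simplified illustration in the spirit of HPOD, not the thesis' exact feature definition; the grid size and bin count are assumptions.

```python
import numpy as np

def orientation_histograms(points: np.ndarray, grid: int = 4, bins: int = 8) -> np.ndarray:
    """Rough sketch of a direction/order independent descriptor: normalize an
    online character's (x, y) points to [0, 1]^2 and histogram unsigned stroke
    orientations in each cell of a grid over the character."""
    pts = (points - points.min(0)) / (np.ptp(points, axis=0) + 1e-9)   # (N, 2)
    d = np.diff(pts, axis=0)
    theta = np.arctan2(d[:, 1], d[:, 0]) % np.pi        # unsigned orientation in [0, pi)
    mids = (pts[:-1] + pts[1:]) / 2.0                    # segment midpoints
    hist = np.zeros((grid, grid, bins))
    for (x, y), t in zip(mids, theta):
        gx = min(int(x * grid), grid - 1)
        gy = min(int(y * grid), grid - 1)
        hist[gy, gx, int(t / np.pi * bins) % bins] += 1
    return hist.ravel() / max(hist.sum(), 1.0)
```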
APA, Harvard, Vancouver, ISO, and other styles
50

Das, Dibyasundar. "Learning Centric Feature Extraction and Classification Models for OCR." Thesis, 2021. http://ethesis.nitrkl.ac.in/10253/1/2021_PhD_DDas_514CS1014_Learning.pdf.

Full text
Abstract:
Character recognition is the process of enabling computers to classify characters from their image representation. Practical applications include bank-check reading, book reading for the blind, postal address reading, and many more. With emerging trends in technology, mobile devices have been equipped with high-definition cameras and powerful processors. This trend enlarges the scope of mobile applications to reading business cards, street signboards and medical prescriptions, local-language translation for travellers, and so on. The essential step in all these applications is the recognition of characters. Traditionally, such applications use handcrafted features, but in recent years non-handcrafted feature extraction methods have gained increasing popularity for solving pattern classification tasks due to their inherent ability to extract robust features and handle outliers. This dissertation focuses on the design of non-handcrafted learning models for character recognition applications. The research primarily involves the proposal of hyperparameter-less learning models that can be used for the image classification task. The dissertation proposes three newly developed learning methodologies for character classification from raw images. The first contribution overcomes limitations of classification with the Single Layer Feed-Forward Network, a widely validated classification model whose limitations stem from the need for hyperparameter tuning. The Extreme Learning Machine was developed as a hyperparameter-less alternative to overcome this limitation; however, the random input weights in the Extreme Learning Machine make it suffer from an ill-posed problem. Hence, a hyperparameter-less algorithm, the Backward-Forward Extreme Learning Machine (BFELM), is developed that learns the input and output weights in one backward and one forward pass, respectively. The second contribution extends the BFELM framework to learn the weights of a convolutional neural network; the newly developed model is named Convolutional Network with Backward-Forward Extreme Learning Machine (CNBFELM). In-depth analysis of the proposed models on various publicly available datasets proves their efficiency in hyperparameter-less, non-handcrafted learning. The third contribution involves the development of feature learning by optimization: feature learning is modeled as an optimization problem that does not depend on classification error or accuracy, and it is worth mentioning that the optimization model can generalize even with a smaller number of training samples. The final contribution is the design of a convolutional neural network model for handwritten word recognition in an Indic language, studied in depth on the CMATERdb2.1.2 dataset.
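For context, a plain Extreme Learning Machine, the baseline that BFELM improves on, can be sketched in a few lines: random input weights followed by a closed-form least-squares solution for the output weights. The BFELM backward pass that replaces the random input weights is not reproduced here, since the abstract does not specify it; the hidden-layer size and activation below are assumptions.

```python
import numpy as np

class ELM:
    """Plain Extreme Learning Machine: random hidden weights, closed-form output
    weights. This is the standard baseline, not the thesis' BFELM."""

    def __init__(self, n_hidden: int = 512, seed: int = 0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def fit(self, X: np.ndarray, Y: np.ndarray) -> "ELM":
        # X: (n_samples, n_features) raw image vectors; Y: (n_samples, n_classes) one-hot.
        n_features = X.shape[1]
        self.W = self.rng.standard_normal((n_features, self.n_hidden))
        self.b = self.rng.standard_normal(self.n_hidden)
        H = np.tanh(X @ self.W + self.b)                 # hidden activations
        self.beta = np.linalg.pinv(H) @ Y                # least-squares output weights
        return self

    def predict(self, X: np.ndarray) -> np.ndarray:
        H = np.tanh(X @ self.W + self.b)
        return np.argmax(H @ self.beta, axis=1)

# Usage: labels = ELM().fit(X_train, one_hot_labels).predict(X_test)
```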
APA, Harvard, Vancouver, ISO, and other styles
