Dissertations / Theses on the topic 'Text detection and recognition'

Consult the top 50 dissertations / theses for your research on the topic 'Text detection and recognition.'


1

Brifkany, Jan, and Yasini Anass El. "Text Recognition in Natural Images : A study in Text Detection." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-282935.

Full text
Abstract:
In recent years, a surge of computer vision methods and solutions has been developed to solve computer vision problems. By combining methods from different areas of computer vision, computer scientists have been able to develop more advanced and sophisticated models. This report covers two categories, text detection and text recognition, which are defined, described, and analyzed in the results and discussion chapter. It addresses an exciting and challenging topic, text recognition in natural images, and sets out to assess the improvement in OCR accuracy after three image segmentation methods have been applied to the images. The methods used are maximally stable extremal regions (MSER) and geometric filtering based on geometric properties. The results showed that OCR with segmentation methods achieved overall better accuracy than OCR without them. It was also shown that images with horizontal text orientation yielded better accuracy under OCR with segmentation methods than images with multi-oriented text.
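A minimal sketch of the pipeline this abstract describes (MSER region detection, geometric filtering, then OCR), assuming OpenCV and pytesseract are available; the filter bounds are illustrative guesses rather than values from the thesis:

```python
import cv2
import numpy as np
import pytesseract

def ocr_with_mser_filtering(image_path):
    """Detect MSER regions, keep character-like ones, then OCR the segmented image."""
    gray = cv2.cvtColor(cv2.imread(image_path), cv2.COLOR_BGR2GRAY)

    mser = cv2.MSER_create()
    regions, _ = mser.detectRegions(gray)

    # Geometric filtering: keep regions whose bounding boxes look like characters
    # (the aspect-ratio and size bounds are illustrative, not taken from the thesis).
    boxes = []
    for pts in regions:
        x, y, w, h = cv2.boundingRect(pts.reshape(-1, 1, 2))
        if h > 8 and 0.1 < w / float(h) < 2.5:
            boxes.append((x, y, w, h))

    # Copy only the retained regions onto a white canvas and run OCR on it.
    segmented = np.full_like(gray, 255)
    for x, y, w, h in boxes:
        segmented[y:y + h, x:x + w] = gray[y:y + h, x:x + w]
    return pytesseract.image_to_string(segmented)
```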
APA, Harvard, Vancouver, ISO, and other styles
2

Khiari, El Hebri. "Text Detection and Recognition in the Automotive Context." Thesis, Université d'Ottawa / University of Ottawa, 2015. http://hdl.handle.net/10393/32458.

Full text
Abstract:
This thesis achieved the goal of obtaining high accuracy rates (precision and recall) in a real-time system that detects and recognizes text in the automotive context. For the sake of simplicity, this work targets two Objects of Interest (OOIs): North American (NA) traffic boards (TBs) and license plates (LPs). The proposed approach adopts a hybrid detection module consisting of a Connected Component Analysis (CCA) step followed by a Texture Analysis (TA) step. An initial set of candidates is extracted by highlighting the Maximally Stable Extremal Regions (MSERs). Each subsequent step in the CCA and TA stages attempts to reduce the size of the set by filtering out false positives and retaining the true positives. The final set of candidates is fed into a recognition stage that integrates an open source Optical Character Reader (OCR) into the framework by using two additional steps that serve the purpose of minimizing false readings as well as the incurred delays. A set of manually taken videos from various regions of Ottawa was used to evaluate the performance of the system, using precision, recall and latency as metrics. The high precision and recall values reflect the proposed approach's ability to remove false positives and retain the true positives, respectively, while the low latency values deem it suitable for the automotive context. Moreover, the ability to detect two OOIs of varying appearances demonstrates the flexibility of the hybrid detection module.
APA, Harvard, Vancouver, ISO, and other styles
3

Yousfi, Sonia. "Embedded Arabic text detection and recognition in videos." Thesis, Lyon, 2016. http://www.theses.fr/2016LYSEI069/document.

Full text
Abstract:
This thesis focuses on Arabic embedded text detection and recognition in videos. Different approaches robust to Arabic text variability (fonts, scales, sizes, etc.) as well as to environmental and acquisition condition challenges (contrasts, degradation, complex background, etc.) are proposed. We introduce different machine learning-based solutions for robust text detection without relying on any pre-processing. The first method is based on Convolutional Neural Networks (ConvNets), while the others use a specific boosting cascade to select relevant hand-crafted text features. For text recognition, our methodology is segmentation-free. Text images are transformed into sequences of features using a multi-scale scanning scheme. Standing out from the dominant methodology of hand-crafted features, we propose to learn relevant text representations from data using different deep learning methods, namely Deep Auto-Encoders, ConvNets and unsupervised learning models. Each one leads to a specific OCR (Optical Character Recognition) solution. Sequence labeling is performed without any prior segmentation using a recurrent connectionist learning model. The proposed solutions are compared to other methods based on non-connectionist and hand-crafted features. In addition, we propose to enhance the recognition results using Recurrent Neural Network-based language models that are able to capture long-range linguistic dependencies. Both OCR and language model probabilities are incorporated in a joint decoding scheme where additional hyper-parameters are introduced to boost recognition results and reduce the response time. Given the lack of public multimedia Arabic datasets, we propose novel annotated datasets issued from Arabic videos. The OCR dataset, called ALIF, is publicly available for research purposes. To the best of our knowledge, it is the first public dataset dedicated to Arabic video OCR. Our proposed solutions were extensively evaluated. The obtained results highlight the genericity and efficiency of our approaches, reaching a word recognition rate of 88.63% on the ALIF dataset and outperforming a well-known commercial OCR engine by more than 36%.
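The segmentation-free recognition idea described above (a text image treated as a sequence of feature frames, labeled by a recurrent network without prior character segmentation) can be sketched generically with a CTC-trained model. The PyTorch snippet below is an illustration under stated assumptions, not the thesis models; the layer sizes, alphabet size, and data are placeholders.

```python
import torch
import torch.nn as nn

class SeqLabeler(nn.Module):
    """Bidirectional LSTM that labels a sequence of image-column features (CTC-style)."""
    def __init__(self, feat_dim=32, hidden=128, n_classes=60):   # sizes are placeholders
        super().__init__()
        self.rnn = nn.LSTM(feat_dim, hidden, bidirectional=True)
        self.fc = nn.Linear(2 * hidden, n_classes + 1)           # +1 for the CTC blank label

    def forward(self, x):                                        # x: (T, N, feat_dim)
        out, _ = self.rnn(x)
        return self.fc(out).log_softmax(dim=2)                   # (T, N, n_classes + 1)

model, ctc = SeqLabeler(), nn.CTCLoss(blank=0)
x = torch.randn(100, 4, 32)                          # 4 dummy text images, 100 frames each
targets = torch.randint(1, 61, (4, 15))              # dummy label sequences (no blanks)
loss = ctc(model(x), targets,
           torch.full((4,), 100, dtype=torch.long),  # input (frame) lengths
           torch.full((4,), 15, dtype=torch.long))   # target (label) lengths
loss.backward()
```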
APA, Harvard, Vancouver, ISO, and other styles
4

Olsson, Oskar, and Moa Eriksson. "Automated system tests with image recognition : focused on text detection and recognition." Thesis, Linköpings universitet, Institutionen för datavetenskap, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-160249.

Full text
Abstract:
Today's airplanes and modern cars are equipped with displays to communicate important information to the pilot or driver. These displays need to be tested for safety reasons; displays that fail can be a huge safety risk and lead to catastrophic events. Today, displays are tested by checking the output signals or with the help of a person who validates the physical display manually. However, this technique is very inefficient and can leave important errors unnoticed. MindRoad AB is searching for a solution where validation of the display is performed with a camera pointed at it; text and numbers are then recognized using a computer vision algorithm and validated in a time-efficient and accurate way. This thesis compares three text detection algorithms, EAST, SWT and Tesseract, to determine the most suitable one for continued work. The chosen algorithm is then optimized, and the possibility of developing a program that meets MindRoad AB's expectations is investigated. As a result, several algorithms were combined into a fully working program that detects and recognizes text in industrial displays.
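One of the three candidates compared in this thesis, Tesseract, can locate and read text in a single call. The sketch below (an illustration, not the thesis implementation) uses pytesseract's image_to_data to obtain words with bounding boxes and confidences, which a display-validation step could then check against expected values:

```python
import cv2
import pytesseract
from pytesseract import Output

def read_display(image_path, min_conf=60):
    """Return (word, bounding box) pairs that Tesseract reads with enough confidence."""
    image = cv2.imread(image_path)
    data = pytesseract.image_to_data(image, output_type=Output.DICT)
    results = []
    for i, word in enumerate(data["text"]):
        if word.strip() and float(data["conf"][i]) >= min_conf:
            box = (data["left"][i], data["top"][i], data["width"][i], data["height"][i])
            results.append((word, box))
    return results   # e.g. [("120", (34, 10, 28, 16)), ...]
```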
APA, Harvard, Vancouver, ISO, and other styles
5

Chen, Datong. "Text detection and recognition in images and video sequences /." [S.l.] : [s.n.], 2003. http://library.epfl.ch/theses/?display=detail&nr=2863.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

Mešár, Marek. "Svět kolem nás jako hyperlink." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2013. http://www.nusl.cz/ntk/nusl-236204.

Full text
Abstract:
This document describes selected techniques and approaches to the problem of text detection, extraction and recognition on modern mobile devices. It also describes their proper presentation in the user interface and their conversion to hyperlinks as a source of information about the surrounding world. The paper outlines a text detection and recognition technique based on MSER detection and also describes the use of an image feature tracking method for text motion estimation.
APA, Harvard, Vancouver, ISO, and other styles
7

Fraz, Muhammad. "Video content analysis for intelligent forensics." Thesis, Loughborough University, 2014. https://dspace.lboro.ac.uk/2134/18065.

Full text
Abstract:
The networks of surveillance cameras installed in public places and private territories continuously record video data with the aim of detecting and preventing unlawful activities. This enhances the importance of video content analysis applications, either for real time (i.e. analytic) or post-event (i.e. forensic) analysis. In this thesis, the primary focus is on four key aspects of video content analysis, namely; 1. Moving object detection and recognition, 2. Correction of colours in the video frames and recognition of colours of moving objects, 3. Make and model recognition of vehicles and identification of their type, 4. Detection and recognition of text information in outdoor scenes. To address the first issue, a framework is presented in the first part of the thesis that efficiently detects and recognizes moving objects in videos. The framework targets the problem of object detection in the presence of complex background. The object detection part of the framework relies on background modelling technique and a novel post processing step where the contours of the foreground regions (i.e. moving object) are refined by the classification of edge segments as belonging either to the background or to the foreground region. Further, a novel feature descriptor is devised for the classification of moving objects into humans, vehicles and background. The proposed feature descriptor captures the texture information present in the silhouette of foreground objects. To address the second issue, a framework for the correction and recognition of true colours of objects in videos is presented with novel noise reduction, colour enhancement and colour recognition stages. The colour recognition stage makes use of temporal information to reliably recognize the true colours of moving objects in multiple frames. The proposed framework is specifically designed to perform robustly on videos that have poor quality because of surrounding illumination, camera sensor imperfection and artefacts due to high compression. In the third part of the thesis, a framework for vehicle make and model recognition and type identification is presented. As a part of this work, a novel feature representation technique for distinctive representation of vehicle images has emerged. The feature representation technique uses dense feature description and mid-level feature encoding scheme to capture the texture in the frontal view of the vehicles. The proposed method is insensitive to minor in-plane rotation and skew within the image. The capability of the proposed framework can be enhanced to any number of vehicle classes without re-training. Another important contribution of this work is the publication of a comprehensive up to date dataset of vehicle images to support future research in this domain. The problem of text detection and recognition in images is addressed in the last part of the thesis. A novel technique is proposed that exploits the colour information in the image for the identification of text regions. Apart from detection, the colour information is also used to segment characters from the words. The recognition of identified characters is performed using shape features and supervised learning. Finally, a lexicon based alignment procedure is adopted to finalize the recognition of strings present in word images. Extensive experiments have been conducted on benchmark datasets to analyse the performance of proposed algorithms. 
The results show that the proposed moving object detection and recognition technique outperformed well-known baseline techniques. The proposed framework for the correction and recognition of object colours in video frames achieved all the aforementioned goals. The performance analysis of the vehicle make and model recognition framework on multiple datasets has shown the strength and reliability of the technique when used within various scenarios. Finally, the experimental results for the text detection and recognition framework on benchmark datasets have revealed the potential of the proposed scheme for accurate detection and recognition of text in the wild.
APA, Harvard, Vancouver, ISO, and other styles
8

Wigington, Curtis Michael. "End-to-End Full-Page Handwriting Recognition." BYU ScholarsArchive, 2018. https://scholarsarchive.byu.edu/etd/7099.

Full text
Abstract:
Despite decades of research, offline handwriting recognition (HWR) of historical documents remains a challenging problem, which if solved could greatly improve the searchability of online cultural heritage archives. Historical documents are plagued with noise, degradation, ink bleed-through, overlapping strokes, variation in slope and slant of the writing, and inconsistent layouts. Often the documents in a collection have been written by thousands of authors, all of whom have significantly different writing styles. In order to better capture the variations in writing styles, we introduce a novel data augmentation technique. This method achieves state-of-the-art results on modern datasets written in English and French and a historical dataset written in German. HWR models are often limited by the accuracy of the preceding steps of text detection and segmentation. Motivated by this, we present a deep learning model that jointly learns text detection, segmentation, and recognition using mostly images without detection or segmentation annotations. Our Start, Follow, Read (SFR) model is composed of a Region Proposal Network to find the start position of handwriting lines, a novel line follower network that incrementally follows and preprocesses lines of (perhaps curved) handwriting into dewarped images, and a CNN-LSTM network to read the characters. SFR exceeds the performance of the winner of the ICDAR 2017 handwriting recognition competition, even when not using the provided competition region annotations.
APA, Harvard, Vancouver, ISO, and other styles
9

Jaderberg, Maxwell. "Deep learning for text spotting." Thesis, University of Oxford, 2015. http://ora.ox.ac.uk/objects/uuid:e893c11e-6b6b-4d11-bb25-846bcef9b13e.

Full text
Abstract:
This thesis addresses the problem of text spotting - being able to automatically detect and recognise text in natural images. Developing text spotting systems, systems capable of reading and therefore better interpreting the visual world, is a challenging but wildly useful task to solve. We approach this problem by drawing on the successful developments in machine learning, in particular deep learning and neural networks, to present advancements using these data-driven methods. Deep learning based models, consisting of millions of trainable parameters, require a lot of data to train effectively. To meet the requirements of these data hungry algorithms, we present two methods of automatically generating extra training data without any additional human interaction. The first crawls a photo sharing website and uses a weakly-supervised existing text spotting system to harvest new data. The second is a synthetic data generation engine, capable of generating unlimited amounts of realistic looking text images, that can be solely relied upon for training text recognition models. While we define these new datasets, all our methods are also evaluated on standard public benchmark datasets. We develop two approaches to text spotting: character-centric and word-centric. In the character-centric approach, multiple character classifier models are developed, reinforcing each other through a feature sharing framework. These character models are used to generate text saliency maps to drive detection, and convolved with detection regions to enable text recognition, producing an end-to-end system with state-of-the-art performance. For the second, higher-level, word-centric approach to text spotting, weak detection models are constructed to find potential instances of words in images, which are subsequently refined and adjusted with a classifier and deep coordinate regressor. A whole word image recognition model recognises words from a huge dictionary of 90k words using classification, resulting in previously unattainable levels of accuracy. The resulting end-to-end text spotting pipeline advances the state of the art significantly and is applied to large scale video search. While dictionary based text recognition is useful and powerful, the need for unconstrained text recognition still prevails. We develop a two-part model for text recognition, with the complementary parts combined in a graphical model and trained using a structured output learning framework adapted to deep learning. The trained recognition model is capable of accurately recognising unseen and completely random text. Finally, we make a general contribution to improve the efficiency of convolutional neural networks. Our low-rank approximation schemes can be utilised to greatly reduce the number of computations required for inference. These are applied to various existing models, resulting in real-world speedups with negligible loss in predictive power.
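As a toy illustration of the synthetic-data idea mentioned above (the actual engine renders fonts, borders, colours, and distortions and blends the result with natural images), the following Pillow sketch produces one labelled word image; the font path and parameter ranges are assumptions:

```python
import random
from PIL import Image, ImageDraw, ImageFont

def render_word(word, font_path="DejaVuSans.ttf", size=32):
    """Render a single word as a noisy grey-level image (the label is the word itself)."""
    font = ImageFont.truetype(font_path, size)         # font_path is an assumption
    left, top, right, bottom = font.getbbox(word)      # text extent in pixels
    img = Image.new("L", (right + 20, bottom + 20), color=random.randint(180, 255))
    ImageDraw.Draw(img).text((10, 10), word, fill=random.randint(0, 80), font=font)
    return img.rotate(random.uniform(-3, 3), expand=True, fillcolor=255)

sample = render_word("recognition")   # one synthetic training image and its label
```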
APA, Harvard, Vancouver, ISO, and other styles
10

Lu, Hsin-Min. "SURVEILLANCE IN THE INFORMATION AGE: TEXT QUANTIFICATION, ANOMALY DETECTION, AND EMPIRICAL EVALUATION." Diss., The University of Arizona, 2010. http://hdl.handle.net/10150/193893.

Full text
Abstract:
Deep penetration of personal computers, data communication networks, and the Internet has created a massive platform for data collection, dissemination, storage, and retrieval. Large amounts of textual data are now available at a very low cost. Valuable information, such as consumer preferences, new product developments, trends, and opportunities, can be found in this large collection of textual data. Growing worldwide competition, new technology development, and the Internet contribute to an increasingly turbulent business environment. Conducting surveillance on this growing collection of textual data could help a business avoid surprises, identify threats and opportunities, and gain competitive advantages. Current text mining approaches, nonetheless, provide limited support for conducting surveillance using textual data. In this dissertation, I develop novel text quantification approaches to identify useful information in textual data, effective anomaly detection approaches to monitor time series data aggregated based on the text quantification approaches, and empirical evaluation approaches that verify the effectiveness of text mining approaches using external numerical data sources. In Chapter 2, I present free-text chief complaint classification studies that aim to classify incoming emergency department free-text chief complaints into syndromic categories, a higher level of representation that facilitates syndromic surveillance. Chapter 3 presents a novel detection algorithm based on Markov switching with jumps models. This surveillance model aims at detecting different types of disease outbreaks based on the time series generated from the chief complaint classification system. In Chapters 4 and 5, I studied the surveillance issue under the context of business decision making. Chapter 4 presents a novel text-based risk recognition design framework that can be used to monitor the changing business environment. Chapter 5 presents an empirical evaluation study that looks at the interaction between news sentiment and numerical accounting earnings information. Chapter 6 concludes this dissertation by highlighting major research contributions and the relevance to MIS research.
APA, Harvard, Vancouver, ISO, and other styles
11

Zhu, Winstead Xingran. "Hotspot Detection for Automatic Podcast Trailer Generation." Thesis, Uppsala universitet, Institutionen för lingvistik och filologi, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-444887.

Full text
Abstract:
With podcasts being a fast-growing audio-only form of media, an effective way of promoting different podcast shows becomes more and more vital to all the stakeholders concerned, including the podcast creators, the podcast streaming platforms, and the podcast listeners. This thesis investigates the relatively little studied topic of automatic podcast trailer generation, with the purpose of enhancing the overall visibility and publicity of different podcast contents and generating more user engagement in podcast listening. This thesis takes a hotspot-based approach, by specifically defining the vague concept of “hotspot” and designing appropriate methods for hotspot detection. Different methods are analyzed and compared, and the best methods are selected. The selected methods are then used to construct an automatic podcast trailer generation system, which consists of four major components and one schema to coordinate the components. The system can take a random podcast episode's audio as input and generate an approximately one-minute-long trailer for it. This thesis also proposes two human-based podcast trailer evaluation approaches, and the evaluation results show that the proposed system outperforms the baseline by a large margin and achieves promising results in terms of both aesthetics and functionality.
APA, Harvard, Vancouver, ISO, and other styles
12

Minetto, Rodrigo 1983. "Reconhecimento de texto e rastreamento de objetos 2D/3D." [s.n.], 2012. http://repositorio.unicamp.br/jspui/handle/REPOSIP/275708.

Full text
Abstract:
Advisors: Jorge Stolfi, Neucimar Jerônimo Leite
Doctoral thesis - Universidade Estadual de Campinas, Instituto de Computação
In this thesis we address three computer vision problems: (1) the detection and recognition of flat text objects in images of real scenes; (2) the tracking of such text objects in a digital video; and (3) the tracking of an arbitrary three-dimensional rigid object with known markings in a digital video. For each problem we developed innovative algorithms, which are at least as accurate and robust as other state-of-the-art algorithms. Specifically, for text classification we developed (and extensively evaluated) a new HOG-based descriptor specialized for Roman script, which we call T-HOG, and showed its value as a post-filter for an existing text detector (SNOOPERTEXT). We also improved the SNOOPERTEXT algorithm by using the multi-scale technique to handle widely different letter sizes while limiting the sensitivity of the algorithm to various artifacts. For text tracking, we describe four basic ways of combining a text detector and a text tracker, and we developed a specific tracker based on a particle filter which exploits the T-HOG recognizer. For rigid object tracking we developed a new accurate and robust algorithm (AFFTRACK) that combines the KLT feature tracker with an improved camera calibration procedure. We extensively tested our algorithms on several benchmarks well known in the literature. We also created benchmarks (publicly available) for the evaluation of text detection and tracking and rigid object tracking algorithms.
Doctorate in Computer Science
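For illustration, a candidate text region can be turned into a fixed-length gradient-orientation descriptor with a standard HOG implementation, as sketched below. This uses the generic scikit-image HOG rather than the T-HOG variant proposed in the thesis, whose cell layout is specialised for Roman script; the region size and HOG parameters are assumptions.

```python
import cv2
from skimage.feature import hog

def text_region_descriptor(region_bgr, size=(96, 32)):
    """Fixed-length HOG descriptor for a candidate text region (size is width x height)."""
    gray = cv2.cvtColor(region_bgr, cv2.COLOR_BGR2GRAY)
    gray = cv2.resize(gray, size)
    return hog(gray, orientations=9, pixels_per_cell=(8, 8),
               cells_per_block=(2, 2), block_norm="L2-Hys")

# The resulting vectors can feed any text/non-text classifier (e.g. an SVM)
# acting as a post-filter after the detection stage.
```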
APA, Harvard, Vancouver, ISO, and other styles
13

Day, Adam C. "Designing a face detection CAPTCHA." Morgantown, W. Va. : [West Virginia University Libraries], 2010. http://hdl.handle.net/10450/11036.

Full text
Abstract:
Thesis (M.S.)--West Virginia University, 2010.
Title from document title page. Document formatted into pages; contains viii, 80 p. : ill. Includes abstract. Includes bibliographical references (p. 78-80).
APA, Harvard, Vancouver, ISO, and other styles
14

Moysset, Bastien. "Détection, localisation et typage de texte dans des images de documents hétérogènes par Réseaux de Neurones Profonds." Thesis, Lyon, 2018. http://www.theses.fr/2018LYSEI044/document.

Full text
Abstract:
Being able to automatically read the text written in documents, both printed and handwritten, makes it possible to access the information they convey. In order to realize full-page text transcription, the detection and localization of the text lines is a crucial step. Traditional methods tend to use image-processing-based approaches, but they hardly generalize to very heterogeneous datasets. In this thesis, we propose to use a deep neural network-based approach. We first propose a mono-dimensional segmentation of text paragraphs into lines that uses a technique inspired by text recognition models. The connectionist temporal classification (CTC) method is used to implicitly align the sequences. Then, we propose a neural network that directly predicts the coordinates of the boxes bounding the text lines. Adding a confidence prediction to these hypothesis boxes makes it possible to locate a varying number of objects. We propose to predict the objects locally in order to share the network parameters between the locations and to increase the number of different objects that each single box predictor sees during training. This compensates for the rather small size of the available datasets. In order to recover the contextual information that carries knowledge of the document layout, we add multi-dimensional LSTM recurrent layers between the convolutional layers of our networks. We propose three full-page text recognition strategies that tackle the need for highly precise text line position predictions. We show on the heterogeneous Maurdor dataset how our methods perform on documents that can be printed or handwritten, in French, English or Arabic, and we compare favourably to other state-of-the-art methods. Visualizing the concepts learned by our neurons underlines the ability of the recurrent layers to convey contextual information.
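The idea of predicting, at each location of a convolutional feature map, a bounding box together with a confidence score (so that a variable number of text lines can be kept by thresholding) can be sketched as a small prediction head. The PyTorch snippet below is a schematic assumption, not the thesis architecture, and omits the convolutional and MDLSTM layers that would precede it:

```python
import torch
import torch.nn as nn

class LocalBoxHead(nn.Module):
    """Predicts a box (x, y, w, h) and a confidence at every feature-map location."""
    def __init__(self, in_channels=64):
        super().__init__()
        # 4 box coordinates + 1 confidence per location, with weights shared across locations
        self.pred = nn.Conv2d(in_channels, 5, kernel_size=1)

    def forward(self, feature_map):                 # (N, C, H, W)
        out = self.pred(feature_map)
        boxes = out[:, :4]                          # relative box parameters per location
        conf = torch.sigmoid(out[:, 4:5])           # confidence in [0, 1]
        return boxes, conf

head = LocalBoxHead()
boxes, conf = head(torch.randn(1, 64, 16, 32))      # keep boxes where conf > threshold
```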
APA, Harvard, Vancouver, ISO, and other styles
15

Karagol, Yusuf. "Event Ordering In Turkish Texts." Master's thesis, METU, 2010. http://etd.lib.metu.edu.tr/upload/12612623/index.pdf.

Full text
Abstract:
In this thesis, we present an event ordering application that works on Turkish texts. Events are words denoting occurrences or happenings in natural language texts. Anchoring an event on a timeline, or ordering events relative to other events, by using the features of the events in a sentence or with the help of the temporal expressions in the sentence, is called event ordering. The application presented in this thesis is one of the earliest studies in this domain for Turkish, and it implements all the sub-modules needed for event ordering. It performs event recognition and event feature detection in Turkish texts. In addition, the application performs temporal expression recognition and temporal signal recognition.
APA, Harvard, Vancouver, ISO, and other styles
16

Westberg, Michael. "Time of Flight Based Teat Detection." Thesis, Linköping University, Department of Electrical Engineering, 2009. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-19292.

Full text
Abstract:

Time of flight (ToF) is an imaging technique which uses depth information to capture 3D information in a scene. Recent developments in the technology have made ToF cameras more widely available and practical to work with. The cameras now enable real-time 3D imaging and positioning in a compact unit, making the technology suitable for a variety of object recognition tasks.

An object recognition system for locating teats is at the center of the DeLaval VMS, which is a fully automated system for milking cows. By implementing ToF technology as part of the visual detection procedure, it would be possible to locate and track all four teats' positions in real time and potentially provide an improvement over the current system.

The developed algorithm for teat detection is able to locate teat-shaped objects in scenes and extract information about their position, width and orientation. These parameters are determined with millimeter accuracy. The algorithm also shows promising results when tested on real cows. Although it detects many false positives, the algorithm correctly detected 171 out of 232 visible teats in a test set of real cow images. This result is a satisfying proof of concept and shows the potential of ToF technology in the field of automated milking.

APA, Harvard, Vancouver, ISO, and other styles
17

Raymondi, Luis Guillermo Antezana, Fabricio Eduardo Aguirre Guzman, Jimmy Armas-Aguirre, and Paola Agonzalez. "Technological solution for the identification and reduction of stress level using wearables." IEEE Computer Society, 2020. http://hdl.handle.net/10757/656578.

Full text
Abstract:
The full text of this work is not available in the UPC Academic Repository due to restrictions imposed by the publisher where it has been published.
In this article, a technological solution is proposed to identify and reduce a person's level of mental stress through a wearable device. The proposal identifies a physiological variable, heart rate, through the integration of a wearable and a mobile application via text recognition using the back camera of a smartphone. As part of the process, the technological solution shows a list of guidelines depending on the level of stress obtained at a given time. Once completed, the stress level can be measured again in order to confirm its evolution. This proposal allows patients to keep their stress level under control in an effective and accessible way in real time. The proposal consists of four phases: 1. Collection of parameters through the wearable; 2. Data reception by the mobile application; 3. Data storage in a cloud environment; and 4. Data collection and processing; this last phase is divided into four sub-phases: 4.1. Stress level analysis, 4.2. Recommendations to decrease the level obtained, 4.3. Comparison between measurements, and 4.4. Measurement history per day. The proposal was validated in a workplace with people aged 20 to 35 in Lima, Peru. Preliminary results showed that 80% of patients managed to reduce their stress level with the proposed solution.
Peer reviewed
APA, Harvard, Vancouver, ISO, and other styles
18

Packer, Thomas L. "Scalable Detection and Extraction of Data in Lists in OCRed Text for Ontology Population Using Semi-Supervised and Unsupervised Active Wrapper Induction." BYU ScholarsArchive, 2014. https://scholarsarchive.byu.edu/etd/4258.

Full text
Abstract:
Lists of records in machine-printed documents contain much useful information. As one example, the thousands of family history books scanned, OCRed, and placed on-line by FamilySearch.org probably contain hundreds of millions of fact assertions about people, places, family relationships, and life events. Data like this cannot be fully utilized until a person or process locates the data in the document text, extracts it, and structures it with respect to an ontology or database schema. Yet, in the family history industry and other industries, data in lists goes largely unused because no known approach adequately addresses all of the costs, challenges, and requirements of a complete end-to-end solution to this task. The diverse information is costly to extract because many kinds of lists appear even within a single document, differing from each other in both structure and content. The lists' records and component data fields are usually not set apart explicitly from the rest of the text, especially in a corpus of OCRed historical documents. OCR errors and the lack of document structure (e.g. HMTL tags) make list content hard to recognize by a software tool developed without a substantial amount of highly specialized, hand-coded knowledge or machine learning supervision. Making an approach that is not only accurate but also sufficiently scalable in terms of time and space complexity to process a large corpus efficiently is especially challenging. In this dissertation, we introduce a novel family of scalable approaches to list discovery and ontology population. Its contributions include the following. We introduce the first general-purpose methods of which we are aware for both list detection and wrapper induction for lists in OCRed or other plain text. We formally outline a mapping between in-line labeled text and populated ontologies, effectively reducing the ontology population problem to a sequence labeling problem, opening the door to applying sequence labelers and other common text tools to the goal of populating a richly structured ontology from text. We provide a novel admissible heuristic for inducing regular expression wrappers using an A* search. We introduce two ways of modeling list-structured text with a hidden Markov model. We present two query strategies for active learning in a list-wrapper induction setting. Our primary contributions are two complete and scalable wrapper-induction-based solutions to the end-to-end challenge of finding lists, extracting data, and populating an ontology. The first has linear time and space complexity and extracts highly accurate information at a low cost in terms of user involvement. The second has time and space complexity that are linear in the size of the input text and quadratic in the length of an output record and achieves higher F1-measures for extracted information as a function of supervision cost. We measure the performance of each of these approaches and show that they perform better than strong baselines, including variations of our own approaches and a conditional random field-based approach.
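For a sense of what a wrapper over one OCRed list record looks like, the toy example below applies a hand-written regular-expression wrapper to a made-up record; the dissertation's contribution is to induce such wrappers automatically, so the record text, field names, and pattern here are purely illustrative:

```python
import re

# One made-up OCRed list record and a hand-written wrapper for it.
record = "Smith, John  b. 1832 Ohio  d. 1901 Kansas"
wrapper = re.compile(
    r"(?P<surname>[A-Z][a-z]+),\s+(?P<given>[A-Z][a-z]+)\s+"
    r"b\.\s+(?P<birth_year>\d{4})\s+(?P<birth_place>\w+)\s+"
    r"d\.\s+(?P<death_year>\d{4})\s+(?P<death_place>\w+)"
)
match = wrapper.search(record)
if match:
    print(match.groupdict())
    # {'surname': 'Smith', 'given': 'John', 'birth_year': '1832', ...}
```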
APA, Harvard, Vancouver, ISO, and other styles
19

Oscanoa, Julio, Marcelo Mena, and Guillermo Kemper. "A Detection Method of Ectocervical Cell Nuclei for Pap test Images, Based on Adaptive Thresholds and Local Derivatives." Science and Engineering Research Support Society, 2015. http://hdl.handle.net/10757/624843.

Full text
Abstract:
Cervical cancer is one of the main causes of death by disease worldwide. In Peru, it holds first place in frequency and represents 8% of deaths caused by disease. To detect the disease in its early stages, one of the most widely used screening tests is the cervical Papanicolaou (Pap) test. Currently, digital images are increasingly being used to improve Pap test efficiency. This work develops an algorithm based on adaptive thresholds, which will be used in Pap-smear-assisted quality control software. The first stage of the method is a pre-processing step, in which noise and background are removed. Next, a block is segmented around each of the points selected as non-background, and a local threshold per block is calculated to search for cell nuclei. If a nucleus is detected, an artifact rejection step follows, where only cell nuclei and inflammatory cells are left for the doctors to interpret. The method was validated with a set of 55 images containing 2317 cells. The algorithm successfully recognized 92.3% of the total nuclei in all images collected.
Peer reviewed
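A simplified sketch of the kind of processing the abstract describes, i.e. noise removal, per-block adaptive thresholding, and size-based candidate filtering, is given below using OpenCV; the blur kernel, block size, and area bounds are illustrative guesses, not the published parameters:

```python
import cv2

def detect_nuclei(image_path, min_area=80, max_area=2000):
    """Return candidate nucleus centres found by adaptive thresholding + size filtering."""
    gray = cv2.cvtColor(cv2.imread(image_path), cv2.COLOR_BGR2GRAY)
    gray = cv2.medianBlur(gray, 5)                     # simple noise removal
    binary = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                                   cv2.THRESH_BINARY_INV, 51, 10)  # local thresholds
    n, labels, stats, centroids = cv2.connectedComponentsWithStats(binary)
    return [tuple(centroids[i]) for i in range(1, n)
            if min_area <= stats[i, cv2.CC_STAT_AREA] <= max_area]
```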
APA, Harvard, Vancouver, ISO, and other styles
20

Karvir, Hrishikesh. "Design and Validation of a Sensor Integration and Feature Fusion Test-Bed for Image-Based Pattern Recognition Applications." Wright State University / OhioLINK, 2010. http://rave.ohiolink.edu/etdc/view?acc_num=wright1291753291.

Full text
APA, Harvard, Vancouver, ISO, and other styles
21

Akra, Mohamad A. (Mohamad Ahmad). "Automated text recognition." Thesis, Massachusetts Institute of Technology, 1993. http://hdl.handle.net/1721.1/11109.

Full text
Abstract:
Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1993.
Includes bibliographical references (leaves 92-96).
by Mohamad A. Akra.
Ph.D.
APA, Harvard, Vancouver, ISO, and other styles
22

Wachenfeld, Steffen. "Recognition of screen-rendered text /." Münster, 2009. http://opac.nebis.ch/cgi-bin/showAbstract.pl?sys=000252284.

Full text
APA, Harvard, Vancouver, ISO, and other styles
23

Ben-Haim, Nadav. "Task specific image text recognition." Diss., Connect to a 24 p. preview or request complete full text in PDF format. Access restricted to UC campuses, 2008. http://wwwlib.umi.com/cr/ucsd/fullcit?p1450595.

Full text
Abstract:
Thesis (M.S.)--University of California, San Diego, 2008.
Title from first page of PDF file (viewed June 16, 2008). Available via ProQuest Digital Dissertations. Includes bibliographical references (p. 37-39).
APA, Harvard, Vancouver, ISO, and other styles
24

Goraine, Habib. "Machine recognition of Arabic text." Thesis, University of Reading, 1990. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.278135.

Full text
APA, Harvard, Vancouver, ISO, and other styles
25

Savkov, Aleksandar Dimitrov. "Deciphering clinical text : concept recognition in primary care text notes." Thesis, University of Sussex, 2017. http://sro.sussex.ac.uk/id/eprint/68232/.

Full text
Abstract:
Electronic patient records, containing data about the health and care of a patient, are a valuable source of information for longitudinal clinical studies. The General Practice Research Database (GPRD) has collected patient records from UK primary care practices since the late 1980s. These records contain both structured data (in the form of codes and numeric values) and free text notes. While the structured data have been used extensively in clinical studies, there are significant practical obstacles in extracting information from the free text notes. The main obstacles are data access restrictions, due to the presence of sensitive information, and the specific language of medical practitioners, which renders standard language processing tools ineffective. The aim of this research is to investigate approaches for computer analysis of free text notes. The research involved designing a primary care text corpus (the Harvey Corpus) annotated with syntactic chunks and clinically-relevant semantic entities, developing a statistical chunking model, and devising a novel method for applying machine learning for entity recognition based on chunk annotation. The tools produced would facilitate reliable information extraction from primary care patient records, needed for the development of clinically-related research. The three medical concept types targeted in this thesis could contribute to epidemiological studies by enhancing the detection of co-morbidities, and better analysing the descriptions of patient experiences and treatments. The main contributions of the research reported in this thesis are: guidelines for chunk and concept annotation of clinical text, an approach to maximising agreement between human annotators, the Harvey Corpus, a method for using a standard part-of-speech tagging model in clinical text chunking, and a novel approach to recognising clinically relevant medical concepts.
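As a generic illustration of the shallow chunking step referred to above (not the thesis's own statistical chunker, which is trained on the Harvey Corpus), standard NLTK tools can POS-tag a note and group tokens into noun-phrase chunks on which concept recognition could later operate; the grammar and example sentence are assumptions:

```python
import nltk

# Requires the tokenizer and tagger models:
#   nltk.download("punkt"); nltk.download("averaged_perceptron_tagger")

def chunk_note(text):
    tokens = nltk.word_tokenize(text)          # split the note into tokens
    tagged = nltk.pos_tag(tokens)              # standard part-of-speech tags
    grammar = "NP: {<DT>?<JJ>*<NN.*>+}"        # a simple noun-phrase pattern
    return nltk.RegexpParser(grammar).parse(tagged)

print(chunk_note("patient reports severe chest pain since yesterday"))
```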
APA, Harvard, Vancouver, ISO, and other styles
26

Giménez, Pastor Adrián. "Bernoulli HMMs for Handwritten Text Recognition." Doctoral thesis, Universitat Politècnica de València, 2014. http://hdl.handle.net/10251/37978.

Full text
Abstract:
In recent years, Hidden Markov Models (HMMs) have received significant attention in the task of off-line handwritten text recognition (HTR). As in automatic speech recognition (ASR), HMMs are used to model the probability of an observation sequence, given its corresponding text transcription. However, in contrast to what happens in ASR, in HTR there is no standard set of local features being used by most of the proposed systems. In this thesis we propose the use of raw binary pixels as features, in conjunction with models that deal more directly with the binary data. In particular, we propose the use of Bernoulli HMMs (BHMMs), that is, conventional HMMs in which Gaussian (mixture) distributions have been replaced by Bernoulli (mixture) probability functions. The objective is twofold: on the one hand, this allows us to better model the binary nature of text images (foreground/background) using BHMMs. On the other hand, this guarantees that no discriminative information is filtered out during feature extraction (most available HTR datasets can be easily binarized without a relevant loss of information). In this thesis, all the HMM theory required to develop an HMM-based HTR toolkit is reviewed and adapted to the case of BHMMs. Specifically, we begin by defining a simple classifier based on BHMMs with Bernoulli probability functions at the states, and we end with an embedded Bernoulli mixture HMM recognizer for continuous HTR. Regarding the binary features, we propose a simple binary feature extraction process without significant loss of information. All input images are scaled and binarized, in order to easily reinterpret them as sequences of binary feature vectors. Two extensions are proposed to this basic feature extraction method: the use of a sliding window in order to better capture the context, and a repositioning method in order to better deal with vertical distortions. Competitive results were obtained when BHMMs and the proposed methods were applied to well-known HTR databases. In particular, we ranked first at the Arabic Handwriting Recognition Competition organized during the 12th International Conference on Frontiers in Handwriting Recognition (ICFHR 2010), and at the Arabic Recognition Competition: Multi-font Multi-size Digitally Represented Text organized during the 11th International Conference on Document Analysis and Recognition (ICDAR 2011). In the last part of this thesis we propose a method for training BHMM classifiers using discriminative training criteria, instead of the conventional Maximum Likelihood Estimation (MLE). Specifically, we propose a log-linear classifier for binary data based on the BHMM classifier. Parameter estimation of this model can be carried out using discriminative training criteria for log-linear models. In particular, we show the formulae for several MMI-based criteria. Finally, we prove the equivalence between both classifiers; hence, discriminative training of a BHMM classifier can be carried out by obtaining its equivalent log-linear classifier. Reported results show that discriminative BHMMs clearly outperform conventional generative BHMMs.
Giménez Pastor, A. (2014). Bernoulli HMMs for Handwritten Text Recognition [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/37978
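A small numerical sketch of the emission model the thesis builds on may help: the log-probability of a binary feature vector under a Bernoulli mixture, which replaces the Gaussian mixtures of conventional HMM states. The code below is a generic NumPy illustration, not part of the described toolkit, and its dimensions and parameters are arbitrary:

```python
import numpy as np

def bernoulli_mixture_logprob(x, priors, prototypes, eps=1e-10):
    """x: binary vector (D,); priors: (K,); prototypes: (K, D) with values in (0, 1)."""
    p = np.clip(prototypes, eps, 1.0 - eps)
    # log p(x | k) = sum_d [ x_d log p_kd + (1 - x_d) log(1 - p_kd) ]
    comp = x @ np.log(p).T + (1.0 - x) @ np.log(1.0 - p).T     # per-component log-likelihood
    return np.logaddexp.reduce(np.log(priors) + comp)          # log sum_k pi_k p(x | k)

x = (np.random.rand(64) > 0.5).astype(float)                   # a random binary feature vector
print(bernoulli_mixture_logprob(x, priors=np.array([0.5, 0.5]),
                                prototypes=np.random.rand(2, 64)))
```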
APA, Harvard, Vancouver, ISO, and other styles
27

Alkhoury, Ihab. "Arabic Text Recognition and Machine Translation." Doctoral thesis, Universitat Politècnica de València, 2015. http://hdl.handle.net/10251/53029.

Full text
Abstract:
Research on Arabic Handwritten Text Recognition (HTR) and Arabic-English Machine Translation (MT) has usually been approached as two independent areas of study. However, the idea of creating one system that combines both areas, in order to generate English translations from images containing Arabic text, is still a very challenging task. This process can be interpreted as the translation of Arabic images. In this thesis, we propose a system that recognizes Arabic handwritten text images and translates the recognized text into English. This system is built from the combination of an HTR system and an MT system. Regarding the HTR system, our work focuses on the use of Bernoulli Hidden Markov Models (BHMMs). BHMMs had proven to work very well with Latin script. Indeed, empirical results based on them were reported on well-known corpora, such as IAM and RIMES. In this thesis, these results are extended to Arabic script, in particular, to the well-known IfN/ENIT and NIST OpenHaRT databases for Arabic handwritten text. The need to transcribe Arabic text is not limited to handwritten text; it also extends to printed text. Arabic printed text might be considered a simplified form of handwritten text. Thus, for this kind of text, we also propose Bernoulli HMMs. In addition, we propose to compare BHMMs with state-of-the-art technology based on neural networks. A key idea that has proven to be very effective in this application of Bernoulli HMMs is the use of a sliding window of adequate width for feature extraction. This idea has allowed us to obtain very competitive results in the recognition of both Arabic handwriting and printed text. Indeed, a system based on it ranked first at the ICDAR 2011 Arabic recognition competition on the Arabic Printed Text Image (APTI) database. Moreover, this idea has been refined by using repositioning techniques for extracted windows, leading to further improvements in Arabic text recognition. In the case of handwritten text, this refinement improved our system, which ranked first at the ICFHR 2010 Arabic handwriting recognition competition on IfN/ENIT. In the case of printed text, this refinement led to an improved system which ranked second at the ICDAR 2013 Competition on Multi-font and Multi-size Digitally Represented Arabic Text on APTI. Furthermore, this refinement was used with neural network-based technology, which led to state-of-the-art results. For machine translation, the system was based on the combination of three state-of-the-art statistical models: the standard phrase-based models, the hierarchical phrase-based models, and the N-gram phrase-based models. This combination was done using the Recognizer Output Voting Error Reduction (ROVER) method. Finally, we propose three methods of combining HTR and MT to develop an Arabic image translation system. The system was evaluated on the NIST OpenHaRT database, where competitive results were obtained.
Alkhoury, I. (2015). Arabic Text Recognition and Machine Translation [Tesis doctoral no publicada]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/53029
TESIS
APA, Harvard, Vancouver, ISO, and other styles
28

Guthrie, David. "Unsupervised detection of anomalous text." Thesis, University of Sheffield, 2008. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.500287.

Full text
Abstract:
This thesis describes work on the detection of anomalous material in text without the use of training data. We use the term anomalous to refer to text that is irregular, or deviates significantly from its surrounding context. In this thesis we show that identifying such abnormalities in text can be viewed as a type of outlier detection, because these anomalies will differ significantly from the writing style of the majority of the document. We consider segments of text that are anomalous with respect to topic (about a different subject), author (written by a different person), or genre (written for a different audience or from a different source), and experiment with whether it is possible to identify these anomalous segments automatically. Five different innovative approaches to this problem are introduced and assessed using many experiments over large document collections, created to contain randomly inserted anomalous segments. In order to identify anomalies in text successfully, we investigate and evaluate 166 stylistic and linguistic features used to characterize writing, some of which are well-established stylistic determiners, but many of which are original. Using these features with each of our methods, we examine the effect of segment size on our ability to detect anomaly, allowing segments of size 100 words, 500 words and 1000 words. We show substantial improvements over a baseline in all cases for all methods, identify a novel method which performs consistently better than the others, and determine the features that contribute most to unsupervised anomaly detection.
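One generic way to operationalize the outlier-detection view described above is to score each segment by its distance from the corpus-wide feature distribution. The toy features below are placeholders, not the 166 features or the five methods of the thesis.

```python
import numpy as np

def stylistic_features(segment):
    """Toy stand-ins for stylistic determiners: average word length,
    type/token ratio and the fraction of long words."""
    words = segment.split()
    if not words:
        return np.zeros(3)
    avg_len = np.mean([len(w) for w in words])
    ttr = len(set(words)) / len(words)
    long_frac = sum(len(w) > 6 for w in words) / len(words)
    return np.array([avg_len, ttr, long_frac])

def outlier_scores(segments):
    """Score each segment by its z-scored distance from the corpus centroid."""
    X = np.vstack([stylistic_features(s) for s in segments])
    mu, sigma = X.mean(axis=0), X.std(axis=0) + 1e-9
    return np.linalg.norm((X - mu) / sigma, axis=1)

segments = ["plain everyday prose about the weather today",
            "similarly plain everyday prose about the traffic",
            "heretofore notwithstanding the aforementioned jurisprudential considerations"]
print(outlier_scores(segments))  # the stylistically deviant last segment scores highest
```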
APA, Harvard, Vancouver, ISO, and other styles
29

Orizu, Udochukwu. "Implicit emotion detection in text." Thesis, Aston University, 2018. http://publications.aston.ac.uk/37693/.

Full text
Abstract:
In text, emotion can be expressed explicitly, using emotion-bearing words (e.g. happy, guilty), or implicitly, without emotion-bearing words. Existing approaches focus on the detection of explicitly expressed emotion in text. However, there are various ways to express and convey emotions without the use of these emotion-bearing words. For example, given the two sentences “The outcome of my exam makes me happy” and “I passed my exam”, both express happiness, with the first expressing it explicitly and the other implying it. In this thesis, we investigate implicit emotion detection in text. We propose a rule-based approach for implicit emotion detection, which can be used without labelled corpora for training. Our results show that our approach consistently outperforms the lexicon-matching method and gives competitive performance in comparison to supervised classifiers. Given that emotions such as guilt and admiration often require the identification of blameworthiness and praiseworthiness, we also propose an approach for the detection of blame and praise in text, using an adapted psychology model, the Path model of blame. The lack of a benchmark dataset led us to construct a corpus containing comments on individuals’ emotional experiences, annotated as blame, praise or other. Since implicit emotion detection might be useful for conflict-of-interest (CoI) detection in Wikipedia articles, we built a CoI corpus and explored various features including linguistic and stylometric, presentation, bias and emotion features. Our results show that emotion features are important when using Naïve Bayes, but the best performance is obtained with SVM on linguistic and stylometric features only. Overall, we show that a rule-based approach can be used to detect implicit emotion in the absence of labelled data; that it is feasible to adopt the psychology Path model of blame for blame/praise detection from text; and that implicit emotion detection is beneficial for CoI detection in Wikipedia articles.
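For context, the lexicon-matching baseline that the proposed rule-based approach is compared against can be sketched in a few lines; the tiny lexicon is purely illustrative.

```python
# Minimal lexicon-matching baseline: an emotion is assigned only when an
# explicit emotion-bearing word appears, so implicit cases fall through.
EMOTION_LEXICON = {
    "happy": "joy", "glad": "joy", "guilty": "guilt",
    "proud": "pride", "afraid": "fear",
}

def lexicon_match(sentence):
    for token in sentence.lower().split():
        if token in EMOTION_LEXICON:
            return EMOTION_LEXICON[token]
    return "none"  # implicit emotion goes undetected by this baseline

print(lexicon_match("The outcome of my exam makes me happy"))  # joy (explicit)
print(lexicon_match("I passed my exam"))                        # none (implicit, missed)
```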
APA, Harvard, Vancouver, ISO, and other styles
30

Gupta, Smita. "Modelling Deception Detection in Text." Thesis, Kingston, Ont. : [s.n.], 2007. http://hdl.handle.net/1974/922.

Full text
APA, Harvard, Vancouver, ISO, and other styles
31

Lu, Su. "DCT coefficient based text detection." Access to citation, abstract and download form provided by ProQuest Information and Learning Company; downloadable PDF file, 57 p, 2008. http://proquest.umi.com/pqdweb?did=1605147371&sid=4&Fmt=2&clientId=8331&RQT=309&VName=PQD.

Full text
APA, Harvard, Vancouver, ISO, and other styles
32

O'Shea, Kieran. "Roadsign detection & recognition /." Leeds : University of Leeds, School of Computer Studies, 2008. http://www.comp.leeds.ac.uk/fyproj/reports/0708/OShea.pdf.

Full text
APA, Harvard, Vancouver, ISO, and other styles
33

Young-Lai, Matthew. "Text structure recognition using a region algebra." Thesis, National Library of Canada = Bibliothèque nationale du Canada, 2001. http://www.collectionscanada.ca/obj/s4/f2/dsk3/ftp04/NQ60576.pdf.

Full text
APA, Harvard, Vancouver, ISO, and other styles
34

Keenan, Francis Gerard. "Large vocabulary syntactic analysis for text recognition." Thesis, Nottingham Trent University, 1992. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.334311.

Full text
APA, Harvard, Vancouver, ISO, and other styles
35

Abuhaiba, Ibrahim S. I. "Recognition of off-line handwritten cursive text." Thesis, Loughborough University, 1996. https://dspace.lboro.ac.uk/2134/7331.

Full text
Abstract:
The author presents novel algorithms for the design of unconstrained handwriting recognition systems, organized in three parts. In Part One, novel algorithms are presented for the processing of Arabic text prior to recognition. Algorithms are described to convert a thinned image of a stroke to a straight-line approximation. Novel heuristic algorithms and novel theorems are presented to determine the start and end vertices of an off-line image of a stroke. A straight-line approximation of an off-line stroke is converted to a one-dimensional representation by a novel algorithm which aims to recover the original sequence of writing. The resulting ordering of the stroke segments is a suitable preprocessed representation for subsequent handwriting recognition algorithms, as it helps to segment the stroke. The algorithm was tested against one data set of isolated handwritten characters and another data set of cursive handwriting, each provided by 20 subjects, and was 91.9% and 91.8% successful for these two data sets, respectively. In Part Two, an entirely novel fuzzy set-sequential machine character recognition system is presented. Fuzzy sequential machines are defined to work as recognizers of handwritten strokes. An algorithm to obtain a deterministic fuzzy sequential machine from a stroke representation, capable of recognizing that stroke and its variants, is presented. An algorithm is developed to merge two fuzzy machines into one machine. The learning algorithm is a combination of many of the described algorithms. The system was tested against isolated handwritten characters provided by 20 subjects, resulting in a 95.8% recognition rate, which is encouraging and shows that the system is highly flexible in dealing with shape and size variations. In Part Three, another entirely novel text recognition system, capable of recognizing off-line handwritten Arabic cursive text having a high variability, is presented. This system is an extension of the above recognition system. Tokens are extracted from a one-dimensional representation of a stroke. Fuzzy sequential machines are defined to work as recognizers of tokens. It is shown how to obtain a deterministic fuzzy sequential machine from a token representation that is capable of recognizing that token and its variants. An algorithm for token learning is presented. The tokens of a stroke are re-combined into meaningful strings of tokens. Algorithms to recognize and learn token strings are described. The recognition stage uses algorithms of the learning stage. The process of extracting the best set of basic shapes, which represent the best set of token strings that constitute an unknown stroke, is described. A method is developed to extract lines from pages of handwritten text, arrange the main strokes of extracted lines in the same order as they were written, and present secondary strokes to main strokes. Presented secondary strokes are combined with basic shapes to obtain the final characters by formulating and solving assignment problems for this purpose. Some secondary strokes which remain unassigned are individually manipulated. The system was tested against the handwriting of 20 subjects, yielding overall subword and character recognition rates of 55.4% and 51.1%, respectively.
APA, Harvard, Vancouver, ISO, and other styles
36

Rose, Tony Gerard. "Large vocabulary semantic analysis for text recognition." Thesis, Nottingham Trent University, 1993. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.333961.

Full text
APA, Harvard, Vancouver, ISO, and other styles
37

Zhang, Yaxi. "Named Entity Recognition for Social Media Text." Thesis, Uppsala universitet, Institutionen för lingvistik och filologi, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-395978.

Full text
Abstract:
This thesis aims to perform named entity recognition for English social media texts. Named Entity Recognition (NER) is applied in many NLP tasks as an important preprocessing procedure. Social media texts contain lots of real-time data and therefore serve as a valuable source for information extraction. Nevertheless, NER for social media texts is a rather challenging task due to the noisy context. Traditional approaches to this task use hand-crafted features but prove to be both time-consuming and very task-specific, and as a result fail to deliver satisfactory performance. The goal of this thesis is to tackle this task by automatically identifying and annotating named entities of multiple types with the help of neural network methods. We experiment with three different word embeddings and character embedding neural network architectures that combine long short-term memory (LSTM), bidirectional LSTM (BI-LSTM) and conditional random fields (CRF) to get the best result. The data and evaluation tool come from the 2017 shared task on Noisy User-generated Text (W-NUT). We achieve the best F1 score of 42.44 using BI-LSTM-CRF with character-level representations extracted by a BI-LSTM and pre-trained word embeddings trained by GloVe. We also find that the results could be improved with larger training data sets.
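A compact sketch of the word-level BI-LSTM backbone described above is shown below in PyTorch; the CRF output layer and the character-level BI-LSTM are omitted for brevity, and the vocabulary size, tag set size and hidden dimensions are illustrative rather than those used in the thesis.

```python
import torch
import torch.nn as nn

class BiLSTMTagger(nn.Module):
    """Word-level BI-LSTM tagger; a CRF layer and a character-level BI-LSTM
    (as in the thesis) would be added on top of these per-token scores."""
    def __init__(self, vocab_size, tagset_size, emb_dim=100, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)   # pre-trained GloVe vectors could be loaded here
        self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True,
                            bidirectional=True)
        self.out = nn.Linear(2 * hidden_dim, tagset_size)

    def forward(self, token_ids):                 # (batch, seq_len)
        emb = self.embed(token_ids)               # (batch, seq_len, emb_dim)
        enc, _ = self.lstm(emb)                   # (batch, seq_len, 2 * hidden_dim)
        return self.out(enc)                      # per-token tag scores

model = BiLSTMTagger(vocab_size=10000, tagset_size=13)
scores = model(torch.randint(0, 10000, (2, 7)))   # 2 sentences of 7 tokens
print(scores.shape)                               # torch.Size([2, 7, 13])
```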
APA, Harvard, Vancouver, ISO, and other styles
38

Dahlstedt, Olle. "Automatic Handwritten Text Detection and Classification." Thesis, Uppsala universitet, Avdelningen för visuell information och interaktion, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-453809.

Full text
Abstract:
As more and more organizations digitize their records, the need for automatic document processing software increases. In particular, the rise of ‘digital humanities’ brings with it a new set of problems concerning how to digitize historical archival material in an efficient and accurate manner. The transcription of archival material, such as handwritten spreadsheets, into formats fit for research purposes is still expensive and plagued by tedious manual labor. Over the decades, research in handwritten text recognition has focused on text line extraction and recognition. In this thesis, we examine document images that contain complex details, contain more categories of text than handwriting, and contain handwritten text that is not easily separated into lines. The thesis examines the sub-problem of handwritten text segmentation in detail. We propose a broad definition of text segmentation that requires both text detection and text classification, since this enables us to detect multiple kinds of text within the same image. The aim is to design a system which can detect and identify both handwriting and machine-text within the same image. Working with photographs of spreadsheet documents from the years 1871-1951, a top-down, layout-agnostic image processing pipeline is developed. Different kinds of preprocessing are examined, to correct illumination and enhance contrast before binarization, and to detect and clear line contours. To achieve text region detection, we evaluate connected components labeling and MSER as region detectors, extracting textual and non-textual sub-images. On detected sub-images, we perform a Bag-of-Visual-Words quantization of k-means clustered feature descriptor vectors and perform categorical classification by training a Naïve Bayes classifier on the feature distances to the cluster centroids. Results include a novel two-stage illumination correction and contrast enhancement algorithm that improves document quality as a precursor to binarization, increasing the mean grayscale values of an image while retaining low grayscale variance. Region detectors are evaluated on images with different types of preprocessing, and the results show that clearing document outlines influences text region detection. Training on a small sample of sub-images, the categorical classification model proves viable for discrimination between machine-text and handwriting, enabling the use of this model for further recognition purposes.
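The Bag-of-Visual-Words classification stage can be sketched roughly as follows. ORB descriptors, the vocabulary size and the plain histogram encoding are stand-ins (the thesis classifies on feature distances to the cluster centroids), and `train_images`/`train_labels` are assumed to be given grayscale sub-images and their labels.

```python
import cv2
import numpy as np
from sklearn.cluster import KMeans
from sklearn.naive_bayes import GaussianNB

def bovw_histogram(image, kmeans, orb):
    """Quantize an image's ORB descriptors against the visual vocabulary."""
    _, desc = orb.detectAndCompute(image, None)
    hist = np.zeros(kmeans.n_clusters)
    if desc is not None:
        for word in kmeans.predict(desc.astype(np.float64)):
            hist[word] += 1
        hist /= hist.sum()
    return hist

# Build the visual vocabulary from training descriptors, then fit the classifier
# on the resulting histograms (train_images / train_labels assumed available).
orb = cv2.ORB_create()
train_desc = [orb.detectAndCompute(img, None)[1] for img in train_images]
kmeans = KMeans(n_clusters=50, n_init=10).fit(
    np.vstack([d for d in train_desc if d is not None]).astype(np.float64))
X = [bovw_histogram(img, kmeans, orb) for img in train_images]
clf = GaussianNB().fit(X, train_labels)   # e.g. labels: handwriting vs machine-text
```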
APA, Harvard, Vancouver, ISO, and other styles
39

Uren, Victoria Susannah. "Combining text categorizers." Thesis, University of Portsmouth, 2000. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.343389.

Full text
APA, Harvard, Vancouver, ISO, and other styles
40

Červenec, Radek. "Rozpoznávání emocí v česky psaných textech." Master's thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2011. http://www.nusl.cz/ntk/nusl-218962.

Full text
Abstract:
With advances in information and communication technologies over the past few years, the amount of information stored in the form of electronic text documents has been growing rapidly. Since human abilities to effectively process and analyze large amounts of information are limited, there is an increasing demand for tools that automatically analyze these documents and benefit from their emotional content. Such systems have extensive applications. The purpose of this work is to design and implement a system for identifying expressions of emotion in Czech texts. The proposed system is based mainly on machine learning methods, and therefore the design and creation of a training set is described as well. The training set is then used to build a classifier model using an SVM. For the purpose of improving classification results, additional components were integrated into the system, such as a lexical database, a lemmatizer and a derived keyword dictionary. The thesis also presents the results of classifying text documents into the defined emotion classes and evaluates various approaches to categorization.
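A minimal scikit-learn version of the SVM classification stage might look like the sketch below; the lemmatizer, lexical database and derived keyword dictionary are omitted, and the toy examples are in English rather than Czech.

```python
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

# Tiny toy training set standing in for the annotated emotion corpus.
texts = ["I am so pleased with this service",
         "this is outrageous and unacceptable",
         "what a wonderful surprise",
         "I am deeply disappointed"]
labels = ["joy", "anger", "joy", "sadness"]

emotion_clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
emotion_clf.fit(texts, labels)
print(emotion_clf.predict(["a wonderful and pleasing surprise"]))
```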
APA, Harvard, Vancouver, ISO, and other styles
41

Kozlovski, Nikolai. "TEXT-IMAGE RESTORATION AND TEXT ALIGNMENT FOR MULTI-ENGINE OPTICAL CHARACTER RECOGNITION SYSTEMS." Master's thesis, University of Central Florida, 2006. http://digital.library.ucf.edu/cdm/ref/collection/ETD/id/3607.

Full text
Abstract:
Previous research showed that combining the results of three different optical character recognition (OCR) engines (ExperVision® OCR, Scansoft OCR, and Abbyy® OCR) using voting algorithms yields a higher accuracy rate than each of the engines individually. While a voting algorithm had been realized, several aspects needed further research to automate the process and improve the accuracy rate. This thesis focuses on morphological image preprocessing and morphological restoration of the text that goes to the OCR engines. The method is similar to the one used in the restoration of partial fingerprints. A series of morphological dilating and eroding filters of various mask shapes and sizes was applied to text of different font sizes and types with various noises added. These images were then processed by the OCR engines, and based on these results successful combinations of text, noise, and filters were chosen. The thesis also deals with the problem of text alignment. Each OCR engine has its own way of dealing with noise and corrupted characters; as a result, the output texts of the OCR engines have different lengths and numbers of words. This, in turn, makes it impossible to use spaces as a delimiter to separate the words for processing by the voting part of the system. Text alignment determines, using various techniques, what is an extra word, what is supposed to be two or more words instead of one, which words are missing in one document compared to the other, and so on. The alignment algorithm is made up of a series of shifts in the two texts to determine which parts are similar and which are not. Since errors made by OCR engines are due to visual misrecognition, in addition to simple character comparison (equal or not), a technique was developed that allows comparison of characters based on how they look.
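The morphological restoration step can be illustrated with a short OpenCV sketch; the kernel shape and size here are arbitrary, whereas choosing them per font, size and noise type is precisely what the thesis studies empirically.

```python
import cv2
import numpy as np

def restore_text(binary_img, ksize=3):
    """Closing (dilate then erode) to reconnect broken character strokes
    before the image is passed to the OCR engines."""
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (ksize, ksize))
    dilated = cv2.dilate(binary_img, kernel, iterations=1)
    return cv2.erode(dilated, kernel, iterations=1)

# Toy degraded stroke: a 3-pixel-thick bar with a one-column gap.
img = np.zeros((20, 20), dtype=np.uint8)
img[9:12, 2:9] = 255
img[9:12, 10:18] = 255            # gap at column 9
restored = restore_text(img)
print(restored[10, 9] > 0)        # True: the gap has been bridged
```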
M.S.E.E.
Department of Electrical and Computer Engineering
Engineering and Computer Science
Electrical Engineering
APA, Harvard, Vancouver, ISO, and other styles
42

Bashir, Sulaimon A. "Change detection for activity recognition." Thesis, Robert Gordon University, 2017. http://hdl.handle.net/10059/3104.

Full text
Abstract:
Activity recognition is concerned with identifying the physical state of a user at a particular point in time. The activity recognition task requires training a classification algorithm using processed sensor data from a representative population of users. The accuracy of the resulting model often degrades when classifying new instances, due to non-stationary sensor data and variations in user characteristics. Thus, there is a need to adapt the classification model to new user characteristics. However, the existing approaches to model adaptation in activity recognition are blind: they continuously adapt a classification model at a regular interval without specific and precise detection of the indicators of the model's degrading performance. This approach can lead to wastage of system resources dedicated to continuous adaptation. This thesis addresses the problem of detecting changes in the accuracy of an activity recognition model. The thesis develops a classifier for activity recognition. The classifier uses three statistical summaries of the data, which can be generated from any dataset, for similarity-based classification of new samples. The weighted ensemble combination of the classification decisions from each statistical summary results in better performance than three existing benchmark classification algorithms. The thesis also presents change detection approaches that can detect changes in the accuracy of the underlying recognition model without having access to the ground-truth label of each activity being recognised. The first approach, called 'UDetect', computes change statistics from a window of classified data and employs a statistical process control method to detect variations between the classified data and the reference data of a class. Evaluation of the approach indicates consistent detection that correlates with the error rate of the model. The second approach is a distance-based change detection technique that relies on the developed statistical summaries to compare newly classified samples and detect any drift from the original class of the activity. The implemented approach uses a distance function and a threshold parameter to detect accuracy changes in the classifier as it classifies new instances. Evaluation of the approach yields above 90% detection accuracy. Finally, a layered framework for activity recognition is proposed to make model adaptation in activity recognition informed, using the techniques developed in this thesis.
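The distance-based change detector can be sketched schematically as follows, assuming newly classified windows arrive as feature vectors; the distance function, reference summaries and threshold are placeholders rather than the ones developed in the thesis.

```python
import numpy as np

def detect_change(reference_stats, window, threshold=2.5):
    """Flag accuracy drift for one activity class without ground-truth labels.

    reference_stats: (mean, std) summaries computed from the training data of
    the class the window was assigned to.
    window: array (n_samples, n_features) of newly classified samples.
    """
    mean, std = reference_stats
    distance = np.linalg.norm((window.mean(axis=0) - mean) / (std + 1e-9))
    return distance > threshold, distance

rng = np.random.default_rng(0)
ref = (np.zeros(5), np.ones(5))
in_class = rng.normal(0.0, 1.0, size=(50, 5))   # behaves like the training data
drifted = rng.normal(3.0, 1.0, size=(50, 5))    # user characteristics have shifted
print(detect_change(ref, in_class)[0], detect_change(ref, drifted)[0])  # False True
```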
APA, Harvard, Vancouver, ISO, and other styles
43

Sabir, Ahmed. "Enhancing scene text recognition with visual context information." Doctoral thesis, Universitat Politècnica de Catalunya, 2020. http://hdl.handle.net/10803/670286.

Full text
Abstract:
This thesis addresses the problem of improving text spotting systems, which aim to detect and recognize text in unrestricted images (e.g. a street sign, an advertisement, a bus destination, etc.). The goal is to improve the performance of off-the-shelf vision systems by exploiting the semantic information derived from the image itself. The rationale is that knowing the content of the image, or the visual context, can help to decide which words are the correct candidate words. For example, the fact that an image shows a coffee shop makes it more likely that a word on a signboard reads as Dunkin and not unkind. We address this problem by drawing on successful developments in natural language processing and machine learning, in particular learning to re-rank and neural networks, to present post-processing frameworks that improve state-of-the-art text spotting systems without the need for costly data-driven re-training or tuning procedures. Discovering the degree of semantic relatedness of candidate words and their image context is a task related to assessing the semantic similarity between words or text fragments. However, semantic relatedness is more general than similarity (e.g. car, road, and traffic light are related but not similar) and requires certain adaptations. To meet the requirements of these broader perspectives of semantic similarity, we develop two approaches to learn the semantic relatedness of the spotted word and its environmental context: word-to-word (object) or word-to-sentence (caption). In the word-to-word approach, word embedding based re-rankers are developed. The re-ranker takes the words from the text spotting baseline and re-ranks them based on the visual context from the object classifier. For the second, an end-to-end neural approach is designed to exploit the image description (caption) at the sentence level as well as the word level (objects) and re-rank the candidate words based not only on the visual context but also on the co-occurrences between them. As an additional contribution, to meet the requirements of data-driven approaches such as neural networks, we propose a visual context dataset for this task, in which the publicly available COCO-text dataset [Veit et al. 2016] has been extended with information about the scene (including the objects and places appearing in the image) to enable researchers to include the semantic relations between texts and scene in their text spotting systems, and to offer a common evaluation baseline for such approaches.
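The word-to-word (object) re-ranking idea can be illustrated with a toy sketch; the embeddings, the maximum-similarity relatedness measure and the linear score combination below are illustrative assumptions, not the learned re-rankers of the thesis.

```python
import numpy as np

# Toy vectors standing in for pre-trained word embeddings (e.g. GloVe).
EMB = {
    "dunkin": np.array([0.9, 0.1, 0.0]),
    "unkind": np.array([0.1, 0.1, 0.9]),
    "coffee": np.array([0.8, 0.2, 0.1]),
    "shop":   np.array([0.7, 0.3, 0.0]),
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

def rerank(candidates, context_objects, alpha=0.5):
    """Combine the spotter's score with semantic relatedness to the visual context."""
    rescored = []
    for word, spotter_score in candidates:
        relatedness = max(cosine(EMB[word], EMB[obj]) for obj in context_objects)
        rescored.append((alpha * spotter_score + (1 - alpha) * relatedness, word))
    return sorted(rescored, reverse=True)

# The baseline spotter slightly prefers "unkind"; the coffee-shop context flips it.
print(rerank([("unkind", 0.55), ("dunkin", 0.50)], ["coffee", "shop"]))
```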
APA, Harvard, Vancouver, ISO, and other styles
44

Saracoglu, Ahmet. "Localization And Recognition Of Text In Digital Media." Master's thesis, METU, 2007. http://etd.lib.metu.edu.tr/upload/2/12609028/index.pdf.

Full text
Abstract:
Textual information within digital media can be used in many areas, such as indexing and structuring media databases, aiding the visually impaired, translating foreign signs, and many more. Broadly, text in digital media can be separated into two categories: overlay text and scene text. In this thesis, the localization and recognition of text in digital media, regardless of its category, is investigated. As a necessary first step, the framework of a complete system is discussed. Next, a comparative analysis of feature vector and classification method pairs is presented. Furthermore, the multi-part nature of text is exploited by proposing a novel Markov Random Field approach for the classification of text/non-text regions. Additionally, better localization of text is achieved by introducing a bounding-box extraction method. For the recognition of text regions, a handprint-based Optical Character Recognition system is thoroughly investigated. During the investigation of text recognition, a multi-hypothesis approach for the segmentation of the background is proposed by incorporating k-means clustering. Furthermore, a novel dictionary-based ranking mechanism is proposed for spelling correction of the recognition output. The overall system is evaluated on a challenging data set. Also, a thorough survey on scene-text localization and recognition is presented; challenges are identified and discussed, with related work provided for each. Scene-text localization results on a public competition data set are also provided. Lastly, in order to improve the recognition performance on scene text in signs affected by perspective distortion, a rectification method is proposed and evaluated.
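The dictionary-based ranking for spelling correction can be illustrated with a small string-similarity sketch; the dictionary and scoring below are placeholders, not the mechanism proposed in the thesis.

```python
import difflib

DICTIONARY = ["exit", "taxi", "hotel", "restaurant", "station"]

def rank_hypotheses(ocr_output, max_candidates=3):
    """Rank dictionary words by string similarity to a noisy OCR result."""
    scored = [(difflib.SequenceMatcher(None, ocr_output.lower(), w).ratio(), w)
              for w in DICTIONARY]
    return sorted(scored, reverse=True)[:max_candidates]

print(rank_hypotheses("ex1t"))   # "exit" should rank first despite the misread digit
```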
APA, Harvard, Vancouver, ISO, and other styles
45

Uzuner, Halil. "Robust text-independent speaker recognition over telecommunications systems." Thesis, University of Surrey, 2006. http://epubs.surrey.ac.uk/843391/.

Full text
Abstract:
Biometric recognition methods, using human features such as voice, face or fingerprints, are increasingly popular for user authentication. Voice is unique in that it is a non-intrusive biometric which can be transmitted over existing telecommunication networks, thereby allowing remote authentication. Current speaker recognition systems can provide high recognition rates on clean speech signals. However, their performance has been shown to degrade in real-life applications such as telephone banking, where speech compression and background noise can affect the speech signal. In this work, three important advancements have been introduced to improve speaker recognition performance where it is affected by coder mismatch, by the aliasing distortion caused by Line Spectral Frequency (LSF) parameter extraction, and by background noise. The first advancement focuses on investigating speaker recognition performance in a multi-coder environment using a Speech Coder Detection (SCD) system, which minimises the mismatch between training and testing data and improves speaker recognition performance. Having reduced the speaker recognition error rates for the multi-coder environment, further investigation of the GSM-EFR speech coder is performed to deal with a particular problem related to the LSF parameter extraction method. It has previously been shown that the classic technique for extraction of LSF parameters in speech coders is prone to aliasing distortion. Low-pass filtering of up-sampled LSF vectors has been shown to alleviate this problem, thereby improving speech quality. In this thesis, as a second advancement, the Non-Aliased LSF (NA-LSF) extraction method is introduced in order to reduce the unwanted effects of the GSM-EFR coder on speaker recognition performance. Another important factor that affects the performance of speaker recognition systems is the presence of background noise. Background noise might severely reduce the performance of the targeted application, such as the quality of the coded speech or the performance of the speaker recognition system. The third advancement was achieved by using a noise canceller to improve speaker recognition performance in mismatched environments with varying background noise conditions. A speaker recognition system with a Minimum Mean Square Error - Log Spectral Amplitudes (MMSE-LSA) noise canceller used as a pre-processor is proposed and investigated to determine the efficiency of noise cancellation on speaker recognition performance, using speech corrupted by different background noise conditions. The effects of noise cancellation on speaker recognition performance using coded noisy speech have also been investigated. Keywords: identification, verification, recognition, Gaussian Mixture Models, speech coding, noise cancellation.
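As background for the GMM framework named in the keywords, a schematic text-independent speaker identification loop might look like the sketch below; the feature dimensionality, mixture count and the synthetic stand-ins for MFCC frames are illustrative only, and the noise-cancellation front end is not shown.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def train_speaker_models(enrollment_features, n_components=8):
    """Fit one GMM per enrolled speaker on that speaker's feature frames."""
    return {spk: GaussianMixture(n_components=n_components,
                                 covariance_type="diag").fit(feats)
            for spk, feats in enrollment_features.items()}

def identify(models, test_features):
    """Return the speaker whose GMM gives the highest average log-likelihood."""
    return max(models, key=lambda spk: models[spk].score(test_features))

rng = np.random.default_rng(1)
enroll = {"alice": rng.normal(0, 1, (500, 13)),   # stand-ins for MFCC frames
          "bob":   rng.normal(2, 1, (500, 13))}
models = train_speaker_models(enroll)
probe = rng.normal(2, 1, (200, 13))               # utterance resembling "bob"
print(identify(models, probe))                     # bob
```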
APA, Harvard, Vancouver, ISO, and other styles
46

Wildermoth, Brett Richard, and n/a. "Text-Independent Speaker Recognition Using Source Based Features." Griffith University. School of Microelectronic Engineering, 2001. http://www4.gu.edu.au:8080/adt-root/public/adt-QGU20040831.115646.

Full text
Abstract:
The speech signal is basically meant to carry information about the linguistic message, but it also contains speaker-specific information. It is generated by acoustically exciting the cavities of the mouth and nose, and can be used to recognize (identify/verify) a person. This thesis deals with the speaker identification task, i.e., finding the identity of a person using his/her speech from a group of persons already enrolled during the training phase. Listeners use many audible cues in identifying speakers. These cues range from high-level cues such as the semantics and linguistics of the speech, to low-level cues relating to the speaker's vocal tract and voice source characteristics. Generally, the vocal tract characteristics are modeled in modern-day speaker identification systems by cepstral coefficients. Although these coefficients are good at representing vocal tract information, they can be supplemented by using both pitch and voicing information. Pitch provides very important and useful information for identifying speakers. In current speaker recognition systems it is very rarely used, as it cannot be reliably extracted and is not always present in the speech signal. In this thesis, an attempt is made to utilize this pitch and voicing information for speaker identification. This thesis illustrates, through the use of a text-independent speaker identification system, the reasonable performance of the cepstral coefficients, achieving an identification error of 6%. Using pitch as a feature in a straightforward manner results in identification errors in the range of 86% to 94%, which is not very helpful. The two main reasons why the direct use of pitch as a feature does not work for speaker recognition are listed below. First, speech is not always periodic; only about half of the frames are voiced. Thus, pitch cannot be estimated for half of the frames (i.e. for unvoiced frames). The problem is how to account for pitch information for the unvoiced frames during the recognition phase. Second, pitch estimation methods are not very reliable. They classify some frames as unvoiced when they are really voiced. Also, they make pitch estimation errors (such as doubling or halving of the pitch value, depending on the method). In order to use pitch information for speaker recognition, we have to overcome these problems. We need a method which does not use the pitch value directly as a feature and which works for voiced as well as unvoiced frames in a reliable manner. We propose here a method which uses the autocorrelation function of the given frame to derive pitch-related features. We call these features the maximum autocorrelation value (MACV) features. These features can be extracted for voiced as well as unvoiced frames and do not suffer from the pitch doubling or halving type of pitch estimation errors. Using these MACV features along with the cepstral features, the speaker identification performance is improved by 45%.
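A small sketch in the spirit of the MACV idea is shown below: the largest normalized autocorrelation values of a frame (short lags excluded) are returned as features, defined for voiced and unvoiced frames alike. The frame length, lag range and number of values kept are illustrative, not the settings of the thesis.

```python
import numpy as np

def macv_features(frame, num_values=3, min_lag=20):
    """Largest normalized autocorrelation values of a speech frame (lag 0 and
    very short lags excluded), usable for voiced and unvoiced frames alike."""
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]   # lags 0..N-1
    ac = ac / (ac[0] + 1e-12)                                       # normalize by frame energy
    return np.sort(ac[min_lag:])[::-1][:num_values]

sr = 8000
t = np.arange(0, 0.032, 1 / sr)                        # one 32 ms frame
voiced = np.sin(2 * np.pi * 120 * t)                   # periodic, pitch around 120 Hz
unvoiced = np.random.default_rng(0).normal(size=t.size)
print(macv_features(voiced))    # clearly larger values: strong periodicity
print(macv_features(unvoiced))  # much smaller values: little periodicity
```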
APA, Harvard, Vancouver, ISO, and other styles
47

Greenhalgh, Jack. "Driver assistance using automated symbol and text recognition." Thesis, University of Bristol, 2015. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.685967.

Full text
Abstract:
This thesis introduces several novel methods for the detection and recognition of both text and symbols in road signs and road markings. Firstly, a method for the automatic detection and recognition of symbol-based road signs is presented. This algorithm detects candidate regions as maximally stable extremal regions (MSER), due to their robustness to lighting variations. Candidate regions are verified and classified during a recognition stage, which uses a cascade of Random Forests trained on histogram of oriented gradients (HOG) features. All training data used in this process is synthetically generated from template images available from an online database, eliminating the need for real footage data. The method retains a high accuracy even at high vehicle speeds, and can operate under a range of weather conditions. The algorithm runs in real time, at a processing rate of 20 frames per second, and recognises all road signs currently in use in the UK. Comparative results are provided to validate the performance. Secondly, a method is proposed for the automatic detection and recognition of text in road signs. Search regions for road sign candidates are defined through exploitation of scene structure. A large number of candidate regions are located through a combination of MSER and hue, saturation, value (HSV) thresholding, which are then reduced through the analysis of temporal and structural features. The recognition stage of the algorithm then aims to interpret the text contained within the candidate regions. Text characters are first detected as MSERs, which are then grouped into lines and interpreted using optical character recognition (OCR). Temporal fusion is applied to the text results across consecutive frames, which vastly improves performance. Comparative analysis is provided to validate the performance of the method, and an overall F-measure of 0.87 is achieved. Finally, a method for the automatic detection and recognition of symbols and text painted on the road surface is presented. Candidates for symbols and text characters are detected in an inverse perspective mapping (IPM) transformed version of the frame, to remove the effect of perspective distortion. Detected candidate regions are then divided into symbols and words, so that they can be recognised using separate classification stages. Temporal fusion is applied to both words and symbols in order to improve performance. The performance of the proposed method is validated using a challenging dataset of videos, and provides overall F-measures of 0.85 and 0.91 for text characters and symbols, respectively.
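A schematic version of the detection/recognition split for symbol-based signs might look like the sketch below, pairing OpenCV's MSER detector with scikit-image HOG features and a scikit-learn Random Forest; the cascade structure and synthetically generated training data of the thesis are omitted, and `train_patches`/`train_labels` are assumed to be available grayscale patches and their classes.

```python
import cv2
import numpy as np
from skimage.feature import hog
from skimage.transform import resize
from sklearn.ensemble import RandomForestClassifier

def hog_vector(patch, size=(32, 32)):
    """Resize a candidate region and describe it with HOG features."""
    return hog(resize(patch, size), orientations=8,
               pixels_per_cell=(8, 8), cells_per_block=(2, 2))

# Training: HOG features of labelled sign patches (assumed available).
clf = RandomForestClassifier(n_estimators=100)
clf.fit([hog_vector(p) for p in train_patches], train_labels)

def detect_signs(gray_frame):
    """Detect MSER candidate regions and classify each cropped region."""
    mser = cv2.MSER_create()
    _, bboxes = mser.detectRegions(gray_frame)
    results = []
    for x, y, w, h in bboxes:
        crop = gray_frame[y:y + h, x:x + w]
        results.append(((x, y, w, h), clf.predict([hog_vector(crop)])[0]))
    return results
```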
APA, Harvard, Vancouver, ISO, and other styles
48

Bertolami, Roman. "Ensemble methods for offline handwritten text line recognition /." [S.l.] : [s.n.], 2008. http://www.zb.unibe.ch/download/eldiss/08bertolami_r.pdf.

Full text
APA, Harvard, Vancouver, ISO, and other styles
49

Calarasanu, Stefania Ana. "Improvement of a text detection chain and the proposition of a new evaluation protocol for text detection algorithms." Thesis, Paris 6, 2015. http://www.theses.fr/2015PA066524/document.

Full text
Abstract:
The growing number of text detection approaches proposed in the literature requires a rigorous performance evaluation and ranking. An evaluation protocol relies on three elements: a reliable text reference, a matching strategy and finally a set of metrics. The few existing evaluation protocols often lack accuracy, either due to inconsistent matching or due to unrepresentative metrics. In this thesis we propose a new evaluation protocol that tackles most of the drawbacks faced by currently used evaluation methods. This work is focused on three main contributions: firstly, we introduce a complex text reference representation that does not constrain text detectors to adopt a specific detection granularity level or annotation representation; secondly, we propose a set of matching rules capable of evaluating any type of scenario that can occur between a text reference and a detection; and finally we show how we can analyze a set of detection results, not only through a set of metrics, but also through an intuitive visual representation. A frequent challenge for many text understanding systems is to tackle the variety of text characteristics in born-digital and natural scene images, for which current OCRs are not well adapted. For example, texts in perspective are frequently present in real-world images because the camera capture angle is not normal to the plane containing the text regions. Despite the ability of some detectors to accurately localize such text objects, the recognition stage fails most of the time. In this thesis we also propose a rectification procedure capable of correcting highly distorted texts, evaluated on a very challenging dataset.
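For contrast with the richer protocol proposed in the thesis, the simple one-to-one IoU matching used by many existing evaluations can be sketched as follows; it is exactly the kind of scheme that cannot handle one-to-many or many-to-one scenarios.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def prf(ground_truth, detections, thr=0.5):
    """Greedy one-to-one matching at an IoU threshold, then precision/recall/F."""
    unmatched = list(ground_truth)
    tp = 0
    for det in detections:
        best = max(unmatched, key=lambda g: iou(g, det), default=None)
        if best is not None and iou(best, det) >= thr:
            unmatched.remove(best)
            tp += 1
    p = tp / max(len(detections), 1)
    r = tp / max(len(ground_truth), 1)
    return p, r, 2 * p * r / max(p + r, 1e-9)

print(prf([(0, 0, 10, 10), (20, 0, 30, 10)], [(1, 0, 10, 10)]))  # (1.0, 0.5, ~0.67)
```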
APA, Harvard, Vancouver, ISO, and other styles
50

Namane, Abderrahmane. "Degraded printed text and handwritten recognition methods : Application to automatic bank check recognition." Université Louis Pasteur (Strasbourg) (1971-2008), 2007. http://www.theses.fr/2007STR13048.

Full text
Abstract:
Character recognition is a significant stage in all document recognition systems. It can be viewed as the problem of assigning and deciding the identity of a given character, and is an active research subject in many disciplines. This thesis is mainly concerned with the recognition of degraded printed and handwritten characters. New solutions were brought to the field of document image analysis (DIA). The first contribution is the development of two recognition methods for handwritten numeral characters, namely a method based on the Fourier-Mellin transform (FMT) and the self-organizing map (SOM), and a parallel combination of HMM-based classifiers using a new projection technique for feature extraction. The second contribution is a new holistic recognition method for handwritten words, applied to the French legal amount. The third contribution presents two recognition methods based on neural networks for degraded printed characters, applied to the Algerian postal check. The first is based on sequential combination, and the second uses a serial combination based mainly on the introduction of a relative distance for measuring the quality of the degraded character. During the development of this thesis, preprocessing methods were also developed, in particular slant correction of handwritten numerals, and detection of the central zone and slope of handwritten words.
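Slant correction of handwritten digits is commonly done with a moment-based shear; the sketch below shows that generic technique purely as an illustration, since the thesis' own slant-correction method may differ.

```python
import cv2
import numpy as np

def deslant(binary_digit):
    """Shear a binarized digit image upright using the skew estimate
    mu11 / mu02 computed from central image moments."""
    m = cv2.moments(binary_digit, binaryImage=True)
    if abs(m["mu02"]) < 1e-2:
        return binary_digit
    skew = m["mu11"] / m["mu02"]
    h, w = binary_digit.shape
    # Shear: x' = x - skew * (y - h/2), y unchanged.
    M = np.float32([[1, -skew, 0.5 * h * skew], [0, 1, 0]])
    return cv2.warpAffine(binary_digit, M, (w, h), flags=cv2.INTER_NEAREST)
```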
APA, Harvard, Vancouver, ISO, and other styles