
Dissertations / Theses on the topic 'Script Recognition'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the top 41 dissertations / theses for your research on the topic 'Script Recognition.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference for the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever these are available in the metadata.

Browse dissertations / theses from a wide variety of disciplines and organise your bibliography correctly.

1

Carroll, Johnny Glen 1953. "Practical Cursive Script Recognition." Thesis, University of North Texas, 1995. https://digital.library.unt.edu/ark:/67531/metadc277710/.

Full text
Abstract:
This research focused on off-line cursive script recognition. The problem is large and difficult, and there is much room for improvement in every aspect of it. Many different aspects of the problem were explored in pursuit of solutions for a more practical and usable off-line cursive script recognizer than is currently available.
APA, Harvard, Vancouver, ISO, and other styles
2

Higgins, C. A. "Automatic recognition of handwritten script." Thesis, University of Brighton, 1985. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.372081.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Poon, C. H. "The recognition of cursive script." Thesis, University of Sussex, 1987. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.381611.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Dehkordi, Mandana Ebadian. "Style classification of cursive script recognition." Thesis, Nottingham Trent University, 2003. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.272442.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Papageorgiu, Dimitrios. "Cursive script recognition in real time." Thesis, University of Sussex, 1990. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.317243.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

Pal, Srikanta. "Multi-Script Off-Line Signature Verification." Thesis, Griffith University, 2014. http://hdl.handle.net/10072/366751.

Full text
Abstract:
Signature identification and verification are of great importance in authentication systems. In the field of signature verification, substantial investigation has been undertaken, mainly involving English signatures. Conversely, very little work has considered non-English signatures, and that work has mainly involved Chinese, Japanese, Arabic and Persian signatures. To the best of my knowledge, no research involving signatures in Indian scripts had been reported before this investigation; this research, which considers signatures in Indian scripts, is the first investigation proposed and presented in this area. Considerable research has previously been undertaken in the area of signature verification, particularly involving single-script signatures, but no attention has been devoted to the task of multi-script signature verification. A multi-lingual country like India has many different scripts that are used for writing as well as for signing, depending on the location or region, and a single official transaction in India sometimes requires signatures in more than one script. The consideration of signatures involving more than one or two scripts is therefore an important task, mainly for multi-lingual and multi-script countries. Developing a general multi-script signature verification system that can verify signatures of all scripts is very complicated, and it is not feasible in the Indian scenario: the verification accuracy in such a multi-script signature environment would not be desirable compared to single-script signature verification. To achieve the necessary accuracy for multi-script signature verification, it is first important to identify the script of a signature and then use an individual single-script signature verification system for the identified script. Based on this observation, in this research the signatures of different scripts are separated and fed into individual signature verification systems.
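The two-stage architecture described above (identify the script of a questioned signature first, then route it to a script-specific verifier) can be sketched as below. This is a minimal illustration, not the author's implementation: the script classifier and the per-script verifiers are assumed to be pre-trained models supplied by the caller.

```python
class MultiScriptVerifier:
    """Two-stage verification: identify the script of a questioned
    signature, then route it to the verifier trained for that script."""

    def __init__(self, script_classifier, verifiers_by_script):
        self.script_classifier = script_classifier      # predicts a script label
        self.verifiers_by_script = verifiers_by_script  # {script name: verifier model}

    def verify(self, feature_vector):
        script = self.script_classifier.predict([feature_vector])[0]   # stage 1
        verifier = self.verifiers_by_script[script]                    # stage 2
        return script, verifier.predict([feature_vector])[0]           # e.g. 'genuine' / 'forged'

# Usage sketch: both models are assumed to be pre-trained scikit-learn-style classifiers.
# system = MultiScriptVerifier(script_svm, {"Bengali": bengali_svm, "Hindi": hindi_svm})
# script, decision = system.verify(extract_features(signature_image))
```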
Thesis (PhD Doctorate)
Doctor of Philosophy (PhD)
School of Information and Communication Technology
Science, Environment, Engineering and Technology
Full Text
APA, Harvard, Vancouver, ISO, and other styles
7

Vajda, Szilárd. "Cursive Bengali Script Recognition for Indian Postal Automation." Phd thesis, Université Henri Poincaré - Nancy I, 2008. http://tel.archives-ouvertes.fr/tel-00579806.

Full text
Abstract:
Large variations in writing styles and difficulties in segmenting cursive words are the main reasons why handwritten cursive word recognition is such a challenging task. An Indian postal document reading system based on a segmentation-free, context-based stochastic model is presented. The originality of the work resides in the combination of high-level perceptual features with the low-level pixel information considered by the stochastic model, and in a pruning strategy in the Viterbi decoding to reduce the recognition time. While the low-level information can be easily extracted from the analysed form, its discriminative power has limits, as it describes the shape with less precision. For that reason, within an analytical approach using implicit segmentation, we inject high-level information reduced to the lower level. This enrichment can be seen as a weight at pixel level, assigning an importance to each analysed pixel based on its perceptual properties. The challenge is to combine the different types of features while accounting for a certain dependence between them. To reduce the decoding time in the Viterbi search, a cumulative threshold mechanism is proposed over a flat lexicon representation. Instead of using a trie representation where common prefixes are shared, we propose a threshold mechanism in the flat lexicon where, based on only a partial Viterbi analysis, we can prune a model and stop further processing. The cumulative thresholds are based on matching scores calculated at each letter, allowing a certain dynamism and elasticity in the model. As we are interested in a complete postal address recognition system, we have also focused on digit recognition, proposing different neural and stochastic solutions. To increase the accuracy and robustness of the classifiers, a combination scheme is also proposed. The results obtained on different datasets written in Latin and Bengali scripts show the effectiveness of the method, and the recognition module developed will be integrated into a generic system for Indian postal automation.
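The cumulative-threshold pruning over a flat lexicon can be illustrated with the simplified sketch below, where per-letter matching scores stand in for the partial Viterbi scores of the thesis; the threshold values and the scoring function are illustrative assumptions only.

```python
def prune_flat_lexicon(lexicon, letter_score, thresholds):
    """Keep only lexicon words whose cumulative matching score stays above
    the cumulative threshold at every letter position; abandon a word (and
    skip the rest of its letters) as soon as the threshold test fails.

    letter_score(position, letter) -> partial matching score (higher is better),
    standing in for the partial Viterbi score of the real system.
    """
    survivors = {}
    for word in lexicon:
        cumulative = 0.0
        alive = True
        for pos, letter in enumerate(word):
            cumulative += letter_score(pos, letter)
            if pos < len(thresholds) and cumulative < thresholds[pos]:
                alive = False          # prune: stop processing this word
                break
        if alive:
            survivors[word] = cumulative
    return survivors

# Toy usage with a hypothetical scoring function and thresholds.
score = lambda pos, letter: 1.0 if letter in "kolkata" else -2.0
print(prune_flat_lexicon(["kolkata", "howrah", "dumdum"], score, [0.5, 1.0, 1.5]))
```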
APA, Harvard, Vancouver, ISO, and other styles
8

Beglou, Masoud M. "Preprocessing and recognition of off-line cursive script." Thesis, Loughborough University, 1994. https://dspace.lboro.ac.uk/2134/27496.

Full text
Abstract:
This thesis is concerned with the design and development of a multi-writer cursive script recognition system. The particular problem addressed is probably the most difficult form of Optical Character Recognition (OCR), where the handwritten data is captured off-line (e.g. via a document scanner).
APA, Harvard, Vancouver, ISO, and other styles
9

Vajda, Szilárd Belaïd Abdelwaheb. "Cursive Bengali Script Recognition for Indian Postal Automation." S. l. : Nancy 1, 2008. http://www.scd.uhp-nancy.fr/docnum/SCD_T_2008_0083_VAJDA.pdf.

Full text
APA, Harvard, Vancouver, ISO, and other styles
10

Bellaby, Gareth John. "The use of word level cues for script recognition." Thesis, Nottingham Trent University, 2000. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.312317.

Full text
APA, Harvard, Vancouver, ISO, and other styles
11

Powalka, Robert Kazimierz. "An algorithm toolbox for on-line cursive script recognition." Thesis, Nottingham Trent University, 1995. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.283031.

Full text
APA, Harvard, Vancouver, ISO, and other styles
12

Toyoda, Etsuko. "Developing script-specific recognition ability : the case of learners of Japanese." Connect to thesis, 2006. http://eprints.unimelb.edu.au/archive/00002971.

Full text
APA, Harvard, Vancouver, ISO, and other styles
13

Kadirkamanathan, Mahapathy. "A scale-space approach to segmentation and recognition of cursive script." Thesis, University of Cambridge, 1989. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.276741.

Full text
APA, Harvard, Vancouver, ISO, and other styles
14

Wright, P. T. "Algorithms for the recognition of handwriting in real-time." Thesis, Open University, 1989. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.234272.

Full text
APA, Harvard, Vancouver, ISO, and other styles
15

Ozaki, Keiko. "Phonological recoding in single word recognition and text comprehension in English and Japanese." Thesis, University of Sussex, 2000. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.310670.

Full text
APA, Harvard, Vancouver, ISO, and other styles
16

Park, Gwang Hoon. "Handwritten digit and script recognition using density based random vector functional link network." Case Western Reserve University School of Graduate Studies / OhioLINK, 1995. http://rave.ohiolink.edu/etdc/view?acc_num=case1061911553.

Full text
APA, Harvard, Vancouver, ISO, and other styles
17

Brammall, Neil Howard. "An investigation into the use of linguistic context in cursive script recognition by computer." Thesis, Loughborough University, 1999. https://dspace.lboro.ac.uk/2134/7177.

Full text
Abstract:
The automatic recognition of hand-written text has been a goal for over thirty-five years. The highly ambiguous nature of cursive writing (with high variability between not only different writers, but even between different samples from the same writer) means that systems based only on visual information are prone to errors. It is suggested that the application of linguistic knowledge to the recognition task may improve recognition accuracy. If a low-level (pattern recognition based) recogniser produces a candidate lattice (i.e. a directed graph giving a number of alternatives at each word position in a sentence), then linguistic knowledge can be used to find the 'best' path through the lattice. There are many forms of linguistic knowledge that may be used to this end. This thesis looks specifically at the use of collocation as a source of linguistic knowledge. Collocation describes the statistical tendency of certain words to co-occur in a language, within a defined range. It is suggested that this tendency may be exploited to aid automatic text recognition. The construction and use of a post-processing system incorporating collocational knowledge is described, as are a number of experiments designed to test the effectiveness of collocation as an aid to text recognition. The results of these experiments suggest that collocational statistics may be a useful form of knowledge for this application and that further research may produce a system of real practical use.
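A collocation-based post-processor of the kind described can be sketched as below: co-occurrence counts within a fixed window are gathered from a corpus, and every path through a small candidate lattice is scored by the summed co-occurrence of its word pairs. The window size, the exhaustive path enumeration and the raw-count association measure are simplifications standing in for the thesis's actual collocation statistic.

```python
from collections import Counter
from itertools import product

def collocation_counts(sentences, window=4):
    """Count how often an ordered word pair co-occurs within a window."""
    pairs = Counter()
    for sent in sentences:
        for i, w in enumerate(sent):
            for v in sent[i + 1:i + 1 + window]:
                pairs[(w, v)] += 1
    return pairs

def best_lattice_path(lattice, pairs, window=4):
    """Exhaustively score every path through a small candidate lattice by the
    summed co-occurrence counts of its word pairs (a simplified stand-in for
    a proper collocation statistic) and return the best-scoring path."""
    best_path, best_score = None, float("-inf")
    for path in product(*lattice):
        score = sum(pairs[(w, v)]
                    for i, w in enumerate(path)
                    for v in path[i + 1:i + 1 + window])
        if score > best_score:
            best_path, best_score = path, score
    return best_path

# Toy usage: the lattice holds recogniser alternatives at each word position.
corpus = [["a", "strong", "cup", "of", "tea"], ["strong", "cup", "of", "tea"]]
pairs = collocation_counts(corpus)
lattice = [["strong", "sting"], ["cup", "cap"], ["of"], ["tea", "ten"]]
print(best_lattice_path(lattice, pairs))   # -> ('strong', 'cup', 'of', 'tea')
```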
APA, Harvard, Vancouver, ISO, and other styles
18

Kunwar, Rituraj. "Incremental / Online Learning and its Application to Handwritten Character Recognition." Thesis, Griffith University, 2017. http://hdl.handle.net/10072/366964.

Full text
Abstract:
In real-world scenarios where we use machine learning algorithms, we often have to deal with cases where the input data changes its nature with time. In order to maintain the accuracy of the learning algorithm, we frequently have to retrain the learning system, making it inconvenient and unreliable. This problem can be solved by using learning algorithms that can learn continuously over time (incremental/online learning). Another common problem of real-world learning scenarios is that acquiring large amounts of data is expensive and time consuming. Semi-supervised learning is the machine learning paradigm concerned with utilizing unlabeled data to improve the precision of a classifier or regressor. Unlabeled data is a powerful and easily available resource and should be utilized to build an accurate learning system. It has often been observed that there is a vast amount of redundancy in any huge, real-time database, and it is not advisable to process every redundant sample to gain the same (already acquired) knowledge. Active learning is the learning setting that can handle this issue. Therefore, in this research we propose an online semi-supervised learning framework that can learn actively. We propose an online semi-supervised Random Naive Bayes (RNB) classifier which, as the name implies, can learn in an online manner and make use of both labeled and unlabeled data. In order to boost accuracy, we improved the network structure of NB (using a Bayes net) to propose an Augmented Naive Bayes (ANB) classifier and achieved a substantial jump in accuracy. In order to reduce the processing of redundant data and achieve faster convergence of learning, we propose to conduct incremental semi-supervised learning in an active manner. We applied the proposed methods to the Tamil script handwritten character recognition problem and obtained favourable results. Experimental results show that our proposed online classifiers do as well as, and sometimes better than, their batch learning counterparts, and that active learning helps to achieve convergence with far fewer samples.
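A minimal sketch of the online, semi-supervised, active scheme is given below, using scikit-learn's GaussianNB with partial_fit as a stand-in for the thesis's RNB/ANB classifiers; the confidence thresholds and the oracle call are illustrative assumptions.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

def online_semisupervised_active(X_seed, y_seed, stream, ask_oracle,
                                 high=0.95, low=0.60):
    """Seed a GaussianNB (stand-in for the thesis's RNB/ANB) with a small
    labeled batch, then learn incrementally from a stream of samples whose
    label may be None. Confident predictions are pseudo-labeled
    (semi-supervised self-training); very uncertain samples are sent to an
    oracle (active learning); the rest are skipped as redundant."""
    classes = np.unique(y_seed)
    model = GaussianNB()
    model.partial_fit(X_seed, y_seed, classes=classes)
    for x, y in stream:
        x = np.asarray(x, dtype=float).reshape(1, -1)
        if y is None:
            proba = model.predict_proba(x)[0]
            if proba.max() >= high:            # confident: accept the pseudo-label
                y = classes[int(proba.argmax())]
            elif proba.max() <= low:           # uncertain: actively query a label
                y = ask_oracle(x)
            else:
                continue                       # middle ground: skip as redundant
        model.partial_fit(x, [y])
    return model
```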
Thesis (PhD Doctorate)
Doctor of Philosophy (PhD)
School of Information and Communication Technology
Science, Environment, Engineering and Technology
Full Text
APA, Harvard, Vancouver, ISO, and other styles
19

Furuhata, Takashi. "Exploring the relationship between English speaking subjects' verbal working memory and foreign word pronunciation and script recognition." Thesis, Connect to this title online; UW restricted, 2007. http://hdl.handle.net/1773/7741.

Full text
APA, Harvard, Vancouver, ISO, and other styles
20

Wahlberg, Fredrik. "Interpreting the Script : Image Analysis and Machine Learning for Quantitative Studies of Pre-modern Manuscripts." Doctoral thesis, Uppsala universitet, Avdelningen för visuell information och interaktion, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-314211.

Full text
Abstract:
The humanities have long been a collection of fields that have not gained from the advancements in computational power predicted by Moore's law. Fields like medicine, biology, physics, chemistry, geology and economics have all developed quantitative tools that take advantage of the exponential increase of processing power over time. Recent advances in computerized pattern recognition, in combination with a rapid digitization of historical document collections around the world, are about to change this. The first part of this dissertation focuses on constructing a full system for finding handwritten words in historical manuscripts. A novel segmentation algorithm is presented, capable of finding and separating text lines in pre-modern manuscripts. Text recognition is performed by translating the image data of the text lines into sequences of numbers, called features. Commonly used features are analysed and evaluated on manuscript sources from the Uppsala University library Carolina Rediviva and the US Library of Congress. Decoding the text in the vast number of photographed manuscripts from our libraries makes computational linguistics and social network analysis directly applicable to historical sources. Hence, text recognition is considered a key technology for the future of computerized research methods in the humanities. The second part of this thesis addresses digital palaeography, using a computer's superior capacity for endlessly performing measurements on ink stroke shapes. Objective criteria based on character shapes only partly capture what a palaeographer uses when assessing similarity: the palaeographer often gets a feel for the scribe's style, which is hard to quantify. A method for identifying the scribal hands of a pre-modern copy of the revelations of Saint Bridget of Sweden, using semi-supervised learning, is presented. Methods for production year estimation are presented and evaluated on a collection of close to 11,000 medieval charters. The production dates are estimated using a Gaussian process, where the uncertainty is inferred together with the most likely production year. In summary, this dissertation presents several novel methods related to image analysis and machine learning. In combination with recent advances in the field, they enable efficient computational analysis of very large collections of historical documents.
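The production-year estimation with a Gaussian process, returning an uncertainty together with the most likely year, can be sketched with scikit-learn as follows; the kernel choice and the use of generic per-charter feature vectors are illustrative assumptions, not the dissertation's exact setup.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

def fit_dating_model(features, years):
    """Regress production year on per-charter feature vectors with a GP,
    so every prediction comes with its own standard deviation."""
    kernel = 1.0 * RBF(length_scale=1.0) + WhiteKernel(noise_level=1.0)
    gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
    gp.fit(np.asarray(features), np.asarray(years))
    return gp

def estimate_year(gp, feature_vector):
    """Return (most likely year, one-sigma uncertainty) for one manuscript."""
    mean, std = gp.predict(np.asarray(feature_vector).reshape(1, -1),
                           return_std=True)
    return float(mean[0]), float(std[0])
```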
APA, Harvard, Vancouver, ISO, and other styles
21

Al-Muhtaseb, Husni A. "Arabic text recognition of printed manuscripts. Efficient recognition of off-line printed Arabic text using Hidden Markov Models, Bigram Statistical Language Model, and post-processing." Thesis, University of Bradford, 2010. http://hdl.handle.net/10454/4426.

Full text
Abstract:
Arabic text recognition has not been researched as thoroughly as that of other natural languages, and the need for automatic Arabic text recognition is clear. In addition to traditional applications like postal address reading, check verification in banks, and office automation, there is large interest in searching scanned documents that are available on the internet and in searching handwritten manuscripts. Other possible applications are building digital libraries, recognizing text on digitized maps, recognizing vehicle license plates, using it as a first phase in text readers for visually impaired people, and understanding filled forms. This research work aims to contribute to the current research in the field of optical character recognition (OCR) of printed Arabic text by developing novel techniques and schemes to advance the performance of state-of-the-art Arabic OCR systems. A statistical and analytical analysis of Arabic text was carried out to estimate the probabilities of occurrence of Arabic characters for use with Hidden Markov Models (HMMs) and other techniques. Since there is no publicly available dataset of printed Arabic text for recognition purposes, it was decided to create one. In addition, a minimal Arabic script is proposed; it contains all basic shapes of Arabic letters and provides an efficient representation of Arabic text in terms of effort and time. Based on the success of HMMs for speech and text recognition, their use for the automatic recognition of Arabic text was investigated. The HMM technique adapts to noise and font variations and does not require word or character segmentation of Arabic line images. In the feature extraction phase, experiments were conducted with a number of different features to investigate their suitability for HMMs, and a novel set of features, which resulted in high recognition rates for different fonts, was finally selected. The developed techniques do not need word or character segmentation before the classification phase, as segmentation is a byproduct of recognition. This seems to be the most advantageous feature of using HMMs for Arabic text, as segmentation tends to produce errors which are usually propagated to the classification phase. Eight different Arabic fonts were used in the classification phase, and the recognition rates were in the range of 98% to 99.9%, depending on the font. As far as we know, these are new results in their context. Moreover, the proposed technique could be used for other languages: a proof-of-concept experiment was conducted on English characters with a recognition rate of 98.9% using the same HMM setup, and the same techniques were applied to Bangla characters with a recognition rate above 95%. The recognition of printed Arabic text with multiple fonts was also conducted using the same technique, with the fonts categorized into different groups, and new high recognition results were achieved. To enhance the recognition rate further, a post-processing module was developed to correct the OCR output through character-level and word-level post-processing; the use of this module increased the recognition rate by more than 1%.
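The segmentation-free recipe (feed an HMM a left-to-right sequence of narrow-frame feature vectors extracted from a text-line image) can be illustrated as below; the frame width, overlap and the simple band-density features are illustrative assumptions, not the thesis's selected feature set.

```python
import numpy as np

def sliding_window_features(line_image, frame_width=3, step=1, bands=8):
    """Turn a binarized text-line image (2-D array, ink = 1) into a sequence
    of feature vectors, one per narrow vertical frame, so that an HMM can be
    trained on whole lines without prior word or character segmentation.
    Each frame is described by the ink density of `bands` horizontal bands."""
    height, width = line_image.shape
    edges = np.linspace(0, height, bands + 1, dtype=int)
    frames = []
    for x in range(0, width - frame_width + 1, step):
        frame = line_image[:, x:x + frame_width]
        density = [frame[edges[b]:edges[b + 1], :].mean() for b in range(bands)]
        frames.append(density)
    return np.array(frames)   # shape: (n_frames, bands), the HMM observation sequence

# Usage sketch: observations = sliding_window_features(binary_line) would then be
# fed to an HMM toolkit (e.g. an HTK- or hmmlearn-style trainer).
```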
King Fahd University of Petroleum and Minerals (KFUPM)
APA, Harvard, Vancouver, ISO, and other styles
22

Al-Muhtaseb, Husni Abdulghani. "Arabic text recognition of printed manuscripts : efficient recognition of off-line printed Arabic text using Hidden Markov Models, Bigram Statistical Language Model, and post-processing." Thesis, University of Bradford, 2010. http://hdl.handle.net/10454/4426.

Full text
Abstract:
Arabic text recognition has not been researched as thoroughly as that of other natural languages, and the need for automatic Arabic text recognition is clear. In addition to traditional applications like postal address reading, check verification in banks, and office automation, there is large interest in searching scanned documents that are available on the internet and in searching handwritten manuscripts. Other possible applications are building digital libraries, recognizing text on digitized maps, recognizing vehicle license plates, using it as a first phase in text readers for visually impaired people, and understanding filled forms. This research work aims to contribute to the current research in the field of optical character recognition (OCR) of printed Arabic text by developing novel techniques and schemes to advance the performance of state-of-the-art Arabic OCR systems. A statistical and analytical analysis of Arabic text was carried out to estimate the probabilities of occurrence of Arabic characters for use with Hidden Markov Models (HMMs) and other techniques. Since there is no publicly available dataset of printed Arabic text for recognition purposes, it was decided to create one. In addition, a minimal Arabic script is proposed; it contains all basic shapes of Arabic letters and provides an efficient representation of Arabic text in terms of effort and time. Based on the success of HMMs for speech and text recognition, their use for the automatic recognition of Arabic text was investigated. The HMM technique adapts to noise and font variations and does not require word or character segmentation of Arabic line images. In the feature extraction phase, experiments were conducted with a number of different features to investigate their suitability for HMMs, and a novel set of features, which resulted in high recognition rates for different fonts, was finally selected. The developed techniques do not need word or character segmentation before the classification phase, as segmentation is a byproduct of recognition. This seems to be the most advantageous feature of using HMMs for Arabic text, as segmentation tends to produce errors which are usually propagated to the classification phase. Eight different Arabic fonts were used in the classification phase, and the recognition rates were in the range of 98% to 99.9%, depending on the font. As far as we know, these are new results in their context. Moreover, the proposed technique could be used for other languages: a proof-of-concept experiment was conducted on English characters with a recognition rate of 98.9% using the same HMM setup, and the same techniques were applied to Bangla characters with a recognition rate above 95%. The recognition of printed Arabic text with multiple fonts was also conducted using the same technique, with the fonts categorized into different groups, and new high recognition results were achieved. To enhance the recognition rate further, a post-processing module was developed to correct the OCR output through character-level and word-level post-processing; the use of this module increased the recognition rate by more than 1%.
APA, Harvard, Vancouver, ISO, and other styles
23

Nguyen, Trung Ky. "Génération d'histoires à partir de données de téléphone intelligentes : une approche de script Dealing with Imbalanced data sets for Human Activity Recognition using Mobile Phone sensors." Thesis, Université Grenoble Alpes (ComUE), 2019. http://www.theses.fr/2019GREAS030.

Full text
Abstract:
A script is a structure that describes a stereotyped sequence of events or actions occurring in our daily life. A story invokes a script with one or more interesting deviations, which allows us to better grasp the everyday situations being reported and the salient points of the narrative. The notion of a script is therefore very useful in many ambient intelligence applications such as health monitoring and emergency services. In recent years, advances in sensing technologies and embedded systems have made it possible for health-care systems to collect human activities continuously by integrating sensors into wearable devices (e.g. smart-phones or smart-watches), and human activity recognition (HAR) has grown considerably, notably through machine learning approaches such as neural networks and Bayesian networks. This thesis defends the idea that such wearable sensor data can be used to generate script-based stories using machine learning. This is not a trivial task, because of the large semantic gap between raw sensor information and the high-level abstractions present in narratives; to the best of our knowledge, no existing approach generates a story from sensor data using machine learning, even though many machine learning approaches (e.g. convolutional neural networks, deep neural networks) have been proposed for human activity recognition in recent years. To achieve this goal, we first propose a novel framework that addresses the problem of imbalanced data (the bias induced by majority classes over minority classes), based on active learning combined with an oversampling technique, in order to improve the macro-accuracy of conventional learning models such as the multilayer perceptron. Second, we introduce a new scheme to automatically generate scripts from wearable-sensor human activity data using deep learning, and evaluate its performance. Finally, we propose a neural event-embedding approach for learning scripts from natural language text that exploits semantic and syntactic information about the textual context of events, and learns the stereotypical order of events from narratives describing typical situations of everyday life. The performance of the proposed methods is systematically assessed experimentally.
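The imbalanced-data treatment (oversampling of minority classes before training a multilayer perceptron) can be sketched as below; the oversample-to-the-majority-count strategy and the MLP settings are illustrative assumptions, and the active-learning part of the framework is omitted.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

def random_oversample(X, y, rng=np.random.default_rng(0)):
    """Duplicate minority-class samples at random until every activity class
    has as many samples as the majority class."""
    X, y = np.asarray(X), np.asarray(y)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    X_parts, y_parts = [X], [y]
    for cls, count in zip(classes, counts):
        idx = np.flatnonzero(y == cls)
        extra = rng.choice(idx, size=target - count, replace=True)
        X_parts.append(X[extra])
        y_parts.append(y[extra])
    return np.concatenate(X_parts), np.concatenate(y_parts)

# Usage sketch on sensor feature vectors (X_train) and activity labels (y_train):
# X_bal, y_bal = random_oversample(X_train, y_train)
# clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500).fit(X_bal, y_bal)
```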
APA, Harvard, Vancouver, ISO, and other styles
24

Busch, Andrew W. "Wavelet transform for texture analysis with application to document analysis." Thesis, Queensland University of Technology, 2004. https://eprints.qut.edu.au/15908/1/Andrew_Busch_Thesis.pdf.

Full text
Abstract:
Texture analysis is an important problem in machine vision, with applications in many fields including medical imaging, remote sensing (SAR), automated flaw detection in various products, and document analysis, to name but a few. Over the last four decades many techniques for the analysis of textured images have been proposed in the literature for the purposes of classification, segmentation, synthesis and compression. Such approaches include analysing the properties of individual texture elements, using statistical features obtained from the grey-level values of the image itself, random field models, and multichannel filtering. The wavelet transform, a unified framework for the multiresolution decomposition of signals, falls into this final category, and allows a texture to be examined at a number of resolutions whilst maintaining spatial resolution. This thesis explores the use of the wavelet transform for the specific task of texture classification and proposes a number of improvements to existing techniques, both in the area of feature extraction and classifier design. By applying a nonlinear transform to the wavelet coefficients, a better characterisation can be obtained for many natural textures, leading to increased classification performance when using first and second order statistics of these coefficients as features. In the area of classifier design, a combination of an optimal discriminant function and a non-parametric Gaussian mixture model classifier is shown experimentally to outperform other classifier configurations. By modelling the relationships between neighbouring bands of the wavelet transform, more information regarding a texture can be obtained. Using such a representation, an efficient algorithm for the searching and retrieval of textured images from a database is proposed, as well as a novel set of features for texture classification. These features are experimentally shown to outperform features proposed in the literature, as well as to provide increased robustness to small changes in scale. Determining the script and language of a printed document is an important task in the field of document processing. In the final part of this thesis, the use of texture analysis techniques to accomplish these tasks is investigated. Using maximum a posteriori (MAP) adaptation, prior information regarding the nature of script images can be used to increase the accuracy of these methods. Novel techniques for estimating the skew of such documents, normalising text blocks prior to the extraction of texture features, and accurately classifying multiple fonts are also presented.
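The wavelet-based texture features (first and second order statistics of the subband coefficients, here after a simple nonlinear magnitude transform) can be sketched with PyWavelets as below; the wavelet, decomposition depth and the log-magnitude nonlinearity are illustrative choices, not necessarily those of the thesis.

```python
import numpy as np
import pywt

def wavelet_texture_features(image, wavelet="db2", level=3):
    """Decompose a grey-level texture image with a 2-D discrete wavelet
    transform and describe each detail subband by the mean and standard
    deviation of a nonlinearly transformed (log-magnitude) coefficient map."""
    coeffs = pywt.wavedec2(np.asarray(image, dtype=float), wavelet, level=level)
    features = []
    for detail_level in coeffs[1:]:              # skip the approximation band
        for band in detail_level:                # horizontal, vertical, diagonal
            transformed = np.log1p(np.abs(band)) # simple nonlinear transform
            features.extend([transformed.mean(), transformed.std()])
    return np.array(features)                    # 2 features per subband
```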
APA, Harvard, Vancouver, ISO, and other styles
25

Busch, Andrew W. "Wavelet Transform For Texture Analysis With Application To Document Analysis." Queensland University of Technology, 2004. http://eprints.qut.edu.au/15908/.

Full text
Abstract:
Texture analysis is an important problem in machine vision, with applications in many fields including medical imaging, remote sensing (SAR), automated flaw detection in various products, and document analysis, to name but a few. Over the last four decades many techniques for the analysis of textured images have been proposed in the literature for the purposes of classification, segmentation, synthesis and compression. Such approaches include analysing the properties of individual texture elements, using statistical features obtained from the grey-level values of the image itself, random field models, and multichannel filtering. The wavelet transform, a unified framework for the multiresolution decomposition of signals, falls into this final category, and allows a texture to be examined at a number of resolutions whilst maintaining spatial resolution. This thesis explores the use of the wavelet transform for the specific task of texture classification and proposes a number of improvements to existing techniques, both in the area of feature extraction and classifier design. By applying a nonlinear transform to the wavelet coefficients, a better characterisation can be obtained for many natural textures, leading to increased classification performance when using first and second order statistics of these coefficients as features. In the area of classifier design, a combination of an optimal discriminant function and a non-parametric Gaussian mixture model classifier is shown experimentally to outperform other classifier configurations. By modelling the relationships between neighbouring bands of the wavelet transform, more information regarding a texture can be obtained. Using such a representation, an efficient algorithm for the searching and retrieval of textured images from a database is proposed, as well as a novel set of features for texture classification. These features are experimentally shown to outperform features proposed in the literature, as well as to provide increased robustness to small changes in scale. Determining the script and language of a printed document is an important task in the field of document processing. In the final part of this thesis, the use of texture analysis techniques to accomplish these tasks is investigated. Using maximum a posteriori (MAP) adaptation, prior information regarding the nature of script images can be used to increase the accuracy of these methods. Novel techniques for estimating the skew of such documents, normalising text blocks prior to the extraction of texture features, and accurately classifying multiple fonts are also presented.
APA, Harvard, Vancouver, ISO, and other styles
26

Cheung, Anthony Hing-lam. "Design and implementation of an Arabic optical character recognition system." Thesis, Queensland University of Technology, 1998. https://eprints.qut.edu.au/36073/1/36073_Cheung_1998.pdf.

Full text
Abstract:
Character recognition is not a difficult task for humans, who repeat the process thousands of times every day as they read papers or books. However, after more than 40 years of intensive investigation, there is still no machine that can recognize alphabetic characters as well as humans. Optical Character Recognition (OCR) is the process of converting a raster image representation of a document into a format that a computer can process. It involves many sub-disciplines of computer science, including digital image processing, pattern recognition, natural language processing, artificial intelligence, and database systems. Applications of OCR systems are broad and include postal code recognition in postal departments, automatic document entry in companies and government departments, cheque sorting in banks, machine translation, etc. The objective of this thesis is to design an optical character recognition system which can recognize Arabic script. This system has to be: 1) accurate: with a recognition accuracy of 95%; 2) robust: able to recognize two different Arabic fonts; and 3) efficient: it should be a real-time system. The proposed system is composed of five image processing stages: 1) Image Acquisition; 2) Preprocessing; 3) Segmentation; 4) Feature Extraction; and 5) Classification. The recognized results are presented to users via a window-based user interface, so they can control the system, recognize and edit documents with a click of the mouse button. A thinning algorithm, a word segmentation algorithm and a recognition-based character segmentation algorithm for Arabic script have been proposed to increase the recognition accuracy of the system. The Arabic word segmentation algorithm successfully segments horizontally overlapped Arabic words, whereas the recognition-based character segmentation algorithm replaces the classical character segmentation method and raises the recognition accuracy of the proposed system. These blocks have been integrated. Results testing the requirements of accuracy, robustness and efficiency are presented. Finally, some extensions to the system are also proposed.
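The five-stage organisation of the system (acquisition, preprocessing, segmentation, feature extraction, classification) can be made concrete with the short skeleton below. Every stage callable here is a hypothetical placeholder for the thesis's actual algorithms (thinning, word segmentation, recognition-based character segmentation, and so on), so this is only a structural sketch.

```python
def recognize_document(raster_image, stages):
    """Structural sketch of a five-stage OCR pipeline. `stages` is a dict of
    callables supplied by the caller, each a placeholder for the corresponding
    stage described in the abstract."""
    image = stages["acquire"](raster_image)                 # 1. image acquisition
    cleaned = stages["preprocess"](image)                   # 2. thinning / noise removal
    text = []
    for word in stages["segment_words"](cleaned):           # 3. word segmentation
        for glyph in stages["segment_characters"](word):    #    recognition-based char. segmentation
            features = stages["extract_features"](glyph)    # 4. feature extraction
            text.append(stages["classify"](features))       # 5. classification
    return "".join(text)

# Usage sketch (all stage names hypothetical):
# recognize_document(scan, {"acquire": load_image, "preprocess": binarize_and_thin, ...})
```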
APA, Harvard, Vancouver, ISO, and other styles
27

Nel, Emli-Mari. "Estimating the Pen Trajectories of Static Handwritten Scripts using Hidden Markov Models." Thesis, Link to the online version, 2005. http://hdl.handle.net/10019/1140.

Full text
APA, Harvard, Vancouver, ISO, and other styles
28

Kesiman, Made Windu Antara. "Document image analysis of Balinese palm leaf manuscripts." Thesis, La Rochelle, 2018. http://www.theses.fr/2018LAROS013/document.

Full text
Abstract:
Collections of palm leaf manuscripts are an important part of Southeast Asian people's culture and life. Following the increase in digitization projects of heritage documents around the world, the collections of palm leaf manuscripts in Southeast Asia have finally attracted the attention of researchers in document image analysis (DIA). The research work conducted for this dissertation focused on the heritage documents of the collection of palm leaf manuscripts from Indonesia, especially the palm leaf manuscripts from Bali. This collection offers new challenges for DIA research because it uses palm leaf as the writing medium, and a language and script that have never been analysed before. Motivated by the contextual situation and real conditions of the palm leaf manuscript collections in Bali, this research tried to bring added value to digitized palm leaf manuscripts by developing tools to analyse, transliterate and index their content. These systems aim at making palm leaf manuscripts more accessible, readable and understandable to a wider audience and to scholars and students all over the world. This research developed a DIA system for document images of palm leaf manuscripts that includes several image processing tasks, beginning with digitization of the document and ground truth construction, followed by binarization and text line and glyph segmentation, and ending with glyph and word recognition, transliteration, and document indexing and retrieval. In this research, we created the first corpus and dataset of Balinese palm leaf manuscripts for the DIA research community, and developed a glyph recognition system and an automatic transliteration system for Balinese palm leaf manuscripts. The dissertation proposes a complete scheme of spatially categorized glyph recognition for the transliteration of Balinese palm leaf manuscripts, consisting of six tasks: text line and glyph segmentation, the glyph ordering process, detection of the spatial position for glyph category, global and categorized glyph recognition, option selection for glyph recognition, and transliteration with a phonological rules-based machine. An implementation of knowledge representation and phonological rules for the automatic transliteration of Balinese script on palm leaf manuscripts is proposed, together with the adaptation of a segmentation-free LSTM-based transliteration system using a generated synthetic dataset and training schemes at two different levels (word level and text line level).
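The final, rules-based transliteration step can be illustrated with the toy sketch below: recognized glyph labels are mapped to Latin, and a simple inherent-vowel rule is applied. The glyph labels, the mapping table and the two rules are hypothetical simplifications of Balinese orthography, shown only to make the idea of a phonological rules-based machine concrete.

```python
# Hypothetical glyph-label -> Latin mapping; real Balinese transliteration
# involves a much richer table and rule set.
BASE = {"ha": "ha", "na": "na", "ca": "ca", "ra": "ra", "ka": "ka"}
VOWEL_SIGN = {"ulu": "i", "suku": "u", "taleng": "e"}
VOWEL_KILLER = "adeg-adeg"   # sign assumed to suppress the inherent 'a'

def transliterate(glyph_labels):
    """Map a recognized glyph sequence to Latin, applying two toy rules:
    a following vowel sign replaces the inherent 'a'; a vowel killer removes it."""
    out = []
    for label in glyph_labels:
        if label in VOWEL_SIGN and out:
            out[-1] = out[-1][:-1] + VOWEL_SIGN[label]   # replace inherent vowel
        elif label == VOWEL_KILLER and out:
            out[-1] = out[-1][:-1]                       # drop inherent vowel
        else:
            out.append(BASE.get(label, "?"))
    return "".join(out)

print(transliterate(["ka", "ulu", "na", "adeg-adeg"]))   # -> "kin"
```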
APA, Harvard, Vancouver, ISO, and other styles
29

Li, Guo Bin (李國彬). "Neural networks for connected cursive script word recognition." Thesis, 1995. http://ndltd.ncl.edu.tw/handle/14804855937693000302.

Full text
APA, Harvard, Vancouver, ISO, and other styles
30

Mohapatra, Ramesh Kumar. "Handwritten Character Recognition of a Vernacular Language: The Odia Script." Thesis, 2016. http://ethesis.nitrkl.ac.in/8322/1/2016_Phd._511cs402_Handwritten.pdf.

Full text
Abstract:
Optical Character Recognition (OCR) applies electronic or mechanical translation of images from printed, handwritten or typewritten sources into an editable version. In recent years, OCR technology has been utilized in many industries for better management of various documents. OCR makes it possible to edit the text, search for a word or phrase, store the text more compactly in computer memory for future use, and process the result with other applications. In India, a few organizations have designed OCR systems for some mainstream Indic scripts, for example Devanagari, Hindi, Bangla and, to some extent, Telugu, Tamil, Gurmukhi, Odia, etc. However, it has been observed that progress on Odia script recognition is quite limited compared with the other languages. Any recognition process relies on standard databases, and until now no such standard database has been available in the literature for Odia script. Complementing the existing standard databases for other Indic languages, in this thesis we have designed databases of handwritten Odia digits and characters for the simulation of the proposed schemes. Four schemes have been suggested, one for the recognition of Odia digits and three for atomic Odia characters. Various issues of handwritten character recognition have been examined, including feature extraction, grouping of samples based on some characteristics, and classifier design. Different features of a character, statistical as well as structural, have been studied. A character written by the same person will not always have the same shape and stroke, so the variability in the personal writing of different individuals makes character recognition quite challenging. Standard classifiers have been utilized for the recognition of the Odia character set. An array of Gabor filters has been employed for the recognition of Odia digits: each image is divided into four blocks of equal size, Gabor filters with various scales and orientations are applied to these sub-images keeping the other filter parameters constant, and the average energy is computed for each transformed image to obtain a feature vector for each digit. A Back Propagation Neural Network (BPNN) is then employed to classify the samples, taking the feature vector as input. The proposed scheme has also been tested on standard digit databases such as MNIST and USPS, and at the end of this part an application has been designed to evaluate simple arithmetic equations. A multi-resolution scheme has been suggested to extract features from atomic Odia characters and recognize them using the back propagation neural network. It has been observed that a few Odia characters have a vertical line present toward the end; this helps in dividing the whole dataset into two subgroups, namely Group I and Group II, such that all characters in Group I have a vertical line and the rest are in Group II. This two-class classification problem has been tackled by a single layer perceptron. The two-dimensional Discrete Orthogonal S-Transform (DOST) coefficients are extracted from the images of each group, and subsequently Principal Component Analysis (PCA) is applied to find significant features. For each group, a separate BPNN classifier is utilized to recognize the character set.
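The digit-recognition recipe (split each image into four blocks, filter each block with a bank of Gabor filters at several scales and orientations, take the average energy of every response as a feature, and feed the vector to a back-propagation network) can be sketched as follows; the specific frequencies, orientations and network size are illustrative assumptions.

```python
import numpy as np
from skimage.filters import gabor
from sklearn.neural_network import MLPClassifier

def gabor_energy_features(image, frequencies=(0.1, 0.2, 0.3),
                          orientations=(0, np.pi / 4, np.pi / 2, 3 * np.pi / 4)):
    """Divide a digit image into four equal blocks and describe each block by
    the average energy of its responses to a bank of Gabor filters."""
    image = np.asarray(image, dtype=float)
    h, w = image.shape
    blocks = [image[:h // 2, :w // 2], image[:h // 2, w // 2:],
              image[h // 2:, :w // 2], image[h // 2:, w // 2:]]
    features = []
    for block in blocks:
        for f in frequencies:
            for theta in orientations:
                real, imag = gabor(block, frequency=f, theta=theta)
                features.append(np.mean(real ** 2 + imag ** 2))  # average energy
    return np.array(features)

# Usage sketch (an MLP stands in for the BPNN of the thesis):
# X = np.stack([gabor_energy_features(img) for img in digit_images])
# clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500).fit(X, labels)
```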
APA, Harvard, Vancouver, ISO, and other styles
31

Chang, Yu-Wen (張郁雯). "The Research of Face Recognition by Using Movie Script Social Network." Thesis, 2015. http://ndltd.ncl.edu.tw/handle/53847084045331596619.

Full text
Abstract:
Master's thesis
Minghsin University of Science and Technology
Department of Information Management (Master's Program)
103
In recent years, biometric technology has received a lot of attention, and face recognition, one of the most important research topics in biometric technology, has been applied in many fields such as access control, authentication and image character recognition. However, conventional face recognition relies on simple image analysis alone and achieves only limited results, so using auxiliary information to reinforce face recognition has become an emerging trend. This study proposes a method to assist the identification of movie characters by using the movie script's social network. First, the EmguCV image library is used to detect face contours, and the face images are captured into a database. Next, the facial feature points of these images are extracted with OpenCV and clustered with the EM algorithm. Finally, the movie script social network and the image clusters are mapped to obtain the character names. Experimental results show that the character image clusters conform with the movie script social networks up to 76%. The proposed scheme can be used to assist face recognition in applications that involve social relationships, for example suspect recognition and movie character identification.
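The front end of the proposed pipeline (detect faces, describe them, cluster the descriptions with EM) can be sketched with OpenCV and scikit-learn as below; a Haar cascade and a PCA-compressed pixel descriptor are used here as stand-ins for the EmguCV detector and the facial feature points of the thesis, and the number of clusters would in practice come from the script's character list.

```python
import cv2
import numpy as np
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture

def detect_faces(frame_bgr, cascade):
    """Return cropped, normalised face patches found in one video frame."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    boxes = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    return [cv2.resize(gray[y:y + h, x:x + w], (48, 48)) for x, y, w, h in boxes]

def cluster_faces(face_patches, n_characters):
    """Compress the patches with PCA and group them with an EM-fitted Gaussian
    mixture; each cluster is later mapped to a character in the script's social network."""
    X = np.stack([p.flatten().astype(float) / 255.0 for p in face_patches])
    X = PCA(n_components=min(40, len(X) - 1)).fit_transform(X)
    return GaussianMixture(n_components=n_characters, random_state=0).fit_predict(X)

# Usage sketch:
# cascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
# faces = [f for frame in frames for f in detect_faces(frame, cascade)]
# labels = cluster_faces(faces, n_characters=len(script_characters))
```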
APA, Harvard, Vancouver, ISO, and other styles
32

Ghosh, Debashis. "A Possibilistic Approach To Handwritten Script Identification Via Morphological Methods For Pattern Representation." Thesis, 1999. https://etd.iisc.ac.in/handle/2005/1673.

Full text
APA, Harvard, Vancouver, ISO, and other styles
33

Ghosh, Debashis. "A Possibilistic Approach To Handwritten Script Identification Via Morphological Methods For Pattern Representation." Thesis, 1999. http://etd.iisc.ernet.in/handle/2005/1673.

Full text
APA, Harvard, Vancouver, ISO, and other styles
34

Chen, Ying-Zhoug (陳映舟). "Segmentation and Recognition of Chinese Characters in Cursive Script in Calligraphy Documents." Thesis, 2001. http://ndltd.ncl.edu.tw/handle/31585029833017674220.

Full text
Abstract:
Master's thesis
National Chiao Tung University
Department of Computer Science and Information Engineering
89
Calligraphy is one of the quintessential arts of Chinese culture, and the Chinese cursive script is a quite complicated style among calligraphy script styles. In this thesis, we design an automatic segmentation and recognition tool for Chinese characters in cursive script, so that the characters can be preserved in a database. The input of our system is a binary, noise-free image of Chinese cursive-script calligraphy. Our system contains two major modules: character segmentation and character recognition. In the character segmentation module, we first construct a shortest distance map that contains the shortest path for each point of the input image. The shortest distance map is then combined with the vertical projection to find the vertical text line segmentation paths. Next, we apply the shortest distance map in each text line to obtain initial horizontal character segmentation paths, and finally reduce the horizontal character segmentation paths by using path constraints and cursive script features. In the character recognition module, we design an OCR engine with a high recognition rate for Chinese characters in cursive script. We use four statistical features: contour directional features, crossing count features, Oka's cellular features and peripheral background area features. These four features are measured with five feature distance measures to select the OCR kernel with the highest recognition rate on our test characters in cursive script. In our experiments, we select 55 calligraphy images from five different authors. The success rates are 98.23% for vertical text line segmentation and 84.06% for horizontal character segmentation.
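The first segmentation step can be approximated by the simple column-projection sketch below; the real system also combines this with a shortest-distance map, which the illustration omits, and the blank-column threshold is an assumption.

```python
import numpy as np

def vertical_line_bounds(binary_page, blank_thresh=0):
    """Find the column ranges of vertical text lines in a binary calligraphy
    page (ink = 1) from its vertical projection profile. This captures only
    the projection half of the method; the shortest-distance-map refinement
    used to cut touching lines is omitted."""
    profile = binary_page.sum(axis=0)            # ink count per column
    is_text = profile > blank_thresh
    bounds, start = [], None
    for x, flag in enumerate(is_text):
        if flag and start is None:
            start = x                            # a text-column run begins
        elif not flag and start is not None:
            bounds.append((start, x))            # run ends at a blank column
            start = None
    if start is not None:
        bounds.append((start, len(is_text)))
    return bounds                                # [(first_col, last_col_exclusive), ...]
```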
APA, Harvard, Vancouver, ISO, and other styles
35

Pati, Peeta Basa. "Analysis Of Multi-lingual Documents With Complex Layout And Content." Thesis, 2006. https://etd.iisc.ac.in/handle/2005/346.

Full text
Abstract:
A document image, besides text, may contain pictures, graphs, signatures, logos, barcodes, hand-drawn sketches and/or seals. Further, the text blocks in an image may be in a Manhattan or any complex layout. Document layout analysis is an important preprocessing step before subjecting any such image to OCR: the image with complex layout and content is segmented into its constituent components. For many present-day applications, separating the text from the non-text blocks is sufficient; this enables the conversion of the text elements present in the image to their corresponding editable form. In this work, an effort has been made to separate the text areas from the various kinds of possible non-text elements. The document images may have been obtained from a scanner or a camera. If the source is a scanner, there is control over the scanning resolution and the lighting of the paper surface, and during the scanning process the paper surface remains parallel to the sensor surface. When an image is obtained through a camera, however, these advantages are no longer available. Here, an algorithm is proposed to separate the text present in an image from the clutter, irrespective of the imaging technology used. This is achieved by using both the structural and textural information of the text present in the gray image: a bank of Gabor filters characterizes the statistical distribution of the text elements in the document, and a connected-component-based technique removes certain types of non-text elements from the image. When a camera is used to acquire document images, color information is generally obtained along with the structural and textural information of the text. It can be assumed that text present in an image has a certain amount of color homogeneity, so a graph-theoretical color clustering scheme is employed to segment the iso-color components of the image. Each iso-color image is then analyzed separately for its structural and textural properties, and the results of such analyses are merged with the information obtained from the gray component of the image. This helps to separate the colored text areas from the non-text elements. The proposed scheme is computationally intensive because the separation of the text from non-text entities is performed at the pixel level. Since any entity is represented by a connected set of pixels, it makes more sense to carry out the separation only at specific points, selected as representatives of their neighborhood. Harris' operator evaluates an edge measure at each pixel and selects pixels which are locally rich in this measure; these points are then employed for separating text from non-text elements. Many government documents and forms in India are bi-lingual or tri-lingual in nature. Further, in school text books, it is common to find English words interspersed within sentences in the main Indian language of the book. In such documents, successive words in a line of text may be of different scripts (languages). Hence, for OCR of these documents, the script must be recognized at the level of words, rather than lines or paragraphs. A database of about 20,000 words each from 11 Indian scripts is created. This is so far the largest database of Indian words collected and deployed for script recognition purposes. Here again, a bank of 36 Gabor filters is used to extract the feature vector which represents the script of the word.
The effectiveness of Gabor features is compared with that of the DCT, and it is found that Gabor features marginally outperform the DCT. Simple, linear and non-linear classifiers are employed to classify the word in the feature space. It is assumed that a scheme developed to recognize the script of words would work equally well for sentences and paragraphs; this assumption has been verified with supporting results. A systematic study has been conducted to evaluate and compare the accuracy of various feature-classifier combinations for word script recognition. We have considered the cases of bi-script and tri-script documents, which are largely available. Average recognition accuracies for the bi-script and tri-script cases are 98.4% and 98.2%, respectively. A hierarchical blind script recognizer, involving all eleven scripts, has been developed and evaluated, which yields an average accuracy of 94.1%. The major contributions of the thesis are:
• A graph-theoretic color clustering scheme is used to segment colored text.
• A scheme is proposed to separate text from the non-text content of documents with complex layout and content, captured by scanner or camera.
• Computational complexity is reduced by performing the separation task on a selected set of locally edge-rich points.
• Script identification at word level is carried out using different feature-classifier combinations; Gabor features with an SVM classifier outperform any other feature-classifier combination.
• A hierarchical blind script recognition algorithm, involving the recognition of 11 Indian scripts, is developed. This structure employs the most efficient feature-classifier combination at each individual nodal point of the tree to maximize the system performance. A sequential forward feature selection algorithm is employed to select the most discriminating features, on a case-by-case basis, for script recognition.
The 11 scripts are Bengali, Devanagari, Gujarati, Kannada, Malayalam, Odiya, Punjabi, Roman, Tamil, Telugu and Urdu.
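The best-performing combination reported here (Gabor features with an SVM) can be sketched as below for word images; the filter bank parameters (the thesis uses a bank of 36 filters) and the SVM settings are illustrative assumptions.

```python
import numpy as np
from skimage.filters import gabor
from sklearn.svm import SVC

def gabor_word_features(word_image, frequencies=(0.1, 0.2, 0.3),
                        orientations=np.arange(0, np.pi, np.pi / 6)):
    """Describe a gray word image by the mean and standard deviation of the
    magnitude response to each filter in a small Gabor bank (3 x 6 filters
    here, as an illustrative stand-in for the 36-filter bank of the thesis)."""
    img = np.asarray(word_image, dtype=float)
    feats = []
    for f in frequencies:
        for theta in orientations:
            real, imag = gabor(img, frequency=f, theta=theta)
            mag = np.hypot(real, imag)
            feats.extend([mag.mean(), mag.std()])
    return np.array(feats)

# Usage sketch for bi-script word classification:
# X = np.stack([gabor_word_features(w) for w in word_images])
# clf = SVC(kernel="rbf", gamma="scale").fit(X, script_labels)
```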
APA, Harvard, Vancouver, ISO, and other styles
36

Pati, Peeta Basa. "Analysis Of Multi-lingual Documents With Complex Layout And Content." Thesis, 2006. http://hdl.handle.net/2005/346.

Full text
Abstract:
A document image, besides text, may contain pictures, graphs, signatures, logos, barcodes, hand-drawn sketches and/or seals. Further, the text blocks in an image may be in a Manhattan or any complex layout. Document layout analysis is an important preprocessing step before subjecting any such image to OCR. Here, the image with complex layout and content is segmented into its constituent components. For many present-day applications, separating the text from the non-text blocks is sufficient. This enables the conversion of the text elements present in the image to their corresponding editable form. In this work, an effort has been made to separate the text areas from the various kinds of possible non-text elements. The document images may have been obtained from a scanner or a camera. If the source is a scanner, there is control over the scanning resolution and the lighting of the paper surface. Moreover, during the scanning process, the paper surface remains parallel to the sensor surface. However, when an image is obtained through a camera, these advantages are no longer available. Here, an algorithm is proposed to separate the text present in an image from the clutter, irrespective of the imaging technology used. This is achieved by using both the structural and textural information of the text present in the gray image. A bank of Gabor filters characterizes the statistical distribution of the text elements in the document. A connected-component-based technique removes certain types of non-text elements from the image. When a camera is used to acquire document images, color information is generally obtained along with the structural and textural information of the text. It can be assumed that text present in an image has a certain amount of color homogeneity. So, a graph-theoretical color clustering scheme is employed to segment the iso-color components of the image. Each iso-color image is then analyzed separately for its structural and textural properties. The results of such analyses are merged with the information obtained from the gray component of the image. This helps to separate the colored text areas from the non-text elements. The proposed scheme is computationally intensive, because the separation of the text from non-text entities is performed at the pixel level. Since any entity is represented by a connected set of pixels, it makes more sense to carry out the separation only at specific points, selected as representatives of their neighborhood. The Harris operator evaluates an edge measure at each pixel and selects pixels which are locally rich in this measure. These points are then employed for separating text from non-text elements. Many government documents and forms in India are bi-lingual or tri-lingual in nature. Further, in school textbooks, it is common to find English words interspersed within sentences in the main Indian language of the book. In such documents, successive words in a line of text may be of different scripts (languages). Hence, for OCR of these documents, the script must be recognized at the level of words, rather than lines or paragraphs. A database of about 20,000 words each from 11 Indian scripts is created. This is so far the largest database of Indian words collected and deployed for script recognition purposes. Here again, a bank of 36 Gabor filters is used to extract the feature vector which represents the script of the word.
The effectiveness of Gabor features is compared with that of DCT, and it is found that Gabor features marginally outperform the DCT. Simple, linear and non-linear classifiers are employed to classify the word in the feature space. It is assumed that a scheme developed to recognize the script of words would work equally well for sentences and paragraphs. This assumption has been verified with supporting results. A systematic study has been conducted to evaluate and compare the accuracy of various feature-classifier combinations for word script recognition. We have considered the cases of bi-script and tri-script documents, which are widely available. Average recognition accuracies for the bi-script and tri-script cases are 98.4% and 98.2%, respectively. A hierarchical blind script recognizer involving all eleven scripts has been developed and evaluated, which yields an average accuracy of 94.1%. The major contributions of the thesis are:
• A graph-theoretic color clustering scheme is used to segment colored text.
• A scheme is proposed to separate text from the non-text content of documents with complex layout and content, captured by scanner or camera.
• Computational complexity is reduced by performing the separation task on a selected set of locally edge-rich points.
• Script identification at the word level is carried out using different feature-classifier combinations. Gabor features with an SVM classifier outperform all other feature-classifier combinations.
• A hierarchical blind script recognition algorithm, involving the recognition of 11 Indian scripts, is developed. This structure employs the most efficient feature-classifier combination at each individual node of the tree to maximize system performance. A sequential forward feature selection algorithm is employed to select the most discriminating features, on a case-by-case basis, for script recognition.
The 11 scripts are Bengali, Devanagari, Gujarati, Kannada, Malayalam, Odiya, Punjabi, Roman, Tamil, Telugu and Urdu.
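The contribution on reducing computational load by working only at locally edge-rich points can be sketched as follows using OpenCV's Harris operator; the block size, aperture and relative response threshold are illustrative values, not those of the thesis.

# Minimal sketch: select locally edge-rich points with the Harris operator.
# Assumes OpenCV (cv2) and NumPy; thresholds are illustrative.
import cv2
import numpy as np

def edge_rich_points(gray_img, block_size=2, ksize=3, k=0.04, rel_thresh=0.01):
    # Harris response at every pixel; keep only pixels whose response is a
    # sizeable fraction of the global maximum, i.e. locally edge/corner rich.
    response = cv2.cornerHarris(np.float32(gray_img), block_size, ksize, k)
    mask = response > rel_thresh * response.max()
    return np.column_stack(np.nonzero(mask))   # (row, col) coordinates

# gray = cv2.imread('document.png', cv2.IMREAD_GRAYSCALE)
# points = edge_rich_points(gray)
# Text/non-text classification is then carried out only at these points, and
# the labels are propagated to the connected components that contain them.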
APA, Harvard, Vancouver, ISO, and other styles
37

Kumar, Deepak. "Methods for Text Segmentation from Scene Images." Thesis, 2014. http://etd.iisc.ac.in/handle/2005/2693.

Full text
Abstract:
Recognition of text from camera-captured scene/born-digital images helps in the development of aids for the blind, unmanned navigation systems and spam filters. However, text in such images is not confined to any page layout, and its location within the image is random in nature. In addition, motion blur, non-uniform illumination, skew, occlusion and scale-based degradations increase the complexity of locating and recognizing the text in a scene/born-digital image. Text localization and segmentation techniques are proposed for the born-digital image data set. The proposed OTCYMIST technique placed first and third on the text segmentation task for the born-digital image data set in the ICDAR 2011 and ICDAR 2013 robust reading competitions, respectively. Here, Otsu's binarization and Canny edge detection are separately carried out on the three colour planes of the image. Connected components (CCs) obtained from the segmented image are pruned based on thresholds applied to their area and aspect ratio. CCs with sufficient edge pixels are retained. The centroids of the individual CCs are used as nodes of a graph. A minimum spanning tree is built using these nodes of the graph. Long edges are broken from the minimum spanning tree of the graph. Pairwise height ratio is used to remove likely non-text components. CCs are grouped based on their proximity in the horizontal direction to generate bounding boxes (BBs) of text strings. Overlapping BBs are removed using an overlap area threshold. Non-overlapping and minimally overlapping BBs are used for text segmentation. These BBs are split vertically to localize text at the word level. A word cropped from a document image can easily be recognized using a traditional optical character recognition (OCR) engine. However, recognizing a word obtained by manually cropping a scene/born-digital image is not trivial. Existing OCR engines do not handle these kinds of scene word images effectively. Our intention is to first segment the word image and then pass it to existing OCR engines for recognition. This is advantageous in two respects: it avoids building a character classifier from scratch, and it reduces the word recognition task to a word segmentation task. Here, we propose two bottom-up approaches for the task of word segmentation. These approaches choose different features at the initial stage of segmentation. A power-law transform (PLT) is applied to the pixels of the gray-scale born-digital images to non-linearly modify the histogram. The recognition rate achieved on born-digital word images is 82.9%, more than 20 percentage points higher than the top performing entry (61.5%) in the ICDAR 2011 robust reading competition. In addition, we explored applying the PLT to colour planes such as the red, green, blue, intensity and lightness planes by varying the gamma value. We call this technique nonlinear enhancement and selection of plane (NESP) for optimal segmentation, which is an improvement over PLT. NESP chooses a particular plane with a proper gamma value based on the Fisher discrimination factor. The recognition rate is 72.8% for scene images of the ICDAR 2011 robust reading competition, more than 30 percentage points higher than the best entry (41.2%). The recognition rates are 81.7% and 65.9% for born-digital and scene images of the ICDAR 2013 robust reading competition, respectively, using NESP. Another technique, midline analysis and propagation of segmentation (MAPS), has also been proposed.
Here, the middle-row pixels of the gray-scale image are first segmented, and the statistics of the segmented pixels are used to assign text and non-text labels to the rest of the image pixels using a min-cut method. A Gaussian model is fitted to the segmented middle-row pixels before the other pixels are assigned. In MAPS, we assume that the middle-row pixels are the least affected by any of the degradations. This assumption is validated by the good word recognition rate of 71.7% on scene images of the ICDAR 2011 robust reading competition. The recognition rates are 83.8% and 66.0% for born-digital and scene images of the ICDAR 2013 robust reading competition, respectively, using MAPS. The best reported result for ICDAR 2003 word images is 61.1%, obtained using custom lexicons containing the list of test words. On the other hand, NESP and MAPS achieve 66.2% and 64.5% for ICDAR 2003 word images without using any lexicon. By using a similar custom lexicon, the recognition rates for ICDAR 2003 word images go up to 74.9% and 74.2% for the NESP and MAPS methods, respectively. In place of passing an image segmented by a method, a manually segmented word image is submitted to an OCR engine to benchmark the maximum possible recognition rate for each database. The recognition rates of the proposed methods and the benchmark results are reported on the seven publicly available word image data sets and compared with results reported in the literature. Since no good Kannada OCR is available, a classifier is designed to recognize Kannada characters and words from the Chars74k data set and our own image collection, respectively. Discrete cosine transform (DCT) and block DCT are used as features to train separate classifiers. Kannada words are segmented using the same techniques (MAPS and NESP) and further segmented into groups of components, since a Kannada character may be represented by a single component or a group of components in an image. The recognition rate on Kannada words is reported for different features with and without the use of a lexicon. The obtained recognition performance for Kannada character recognition (11.4%) is three times the best performance (3.5%) reported in the literature.
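As a rough illustration of the OTCYMIST front end described above (Otsu binarisation and Canny edges per colour plane, followed by pruning of connected components on area, aspect ratio and edge support), here is a sketch assuming scikit-image; all thresholds are illustrative placeholders rather than the competition settings.

# Minimal sketch of per-plane Otsu + Canny with connected-component pruning.
# Assumes scikit-image; thresholds are illustrative.
import numpy as np
from skimage.filters import threshold_otsu
from skimage.feature import canny
from skimage.measure import label, regionprops

def candidate_text_components(rgb_img, min_area=20, max_aspect=10.0, min_edge_pixels=10):
    keep = []
    for c in range(3):                                   # each colour plane
        plane = rgb_img[..., c].astype(float)
        binary = plane < threshold_otsu(plane)           # assume dark text on a light ground
        edges = canny(plane / 255.0)
        for region in regionprops(label(binary)):
            h = region.bbox[2] - region.bbox[0]
            w = region.bbox[3] - region.bbox[1]
            aspect = max(h, w) / max(1, min(h, w))
            rows, cols = region.coords[:, 0], region.coords[:, 1]
            if (region.area >= min_area and aspect <= max_aspect
                    and edges[rows, cols].sum() >= min_edge_pixels):
                keep.append(region.bbox)                 # surviving component boxes
    return keep

# The centroids of the surviving components would then be linked by a minimum
# spanning tree, long edges broken, and the remaining groups merged into
# word-level bounding boxes, as described in the abstract.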
APA, Harvard, Vancouver, ISO, and other styles
38

Kumar, Deepak. "Methods for Text Segmentation from Scene Images." Thesis, 2014. http://etd.iisc.ernet.in/handle/2005/2693.

Full text
Abstract:
Recognition of text from camera-captured scene/born-digital images helps in the development of aids for the blind, unmanned navigation systems and spam filters. However, text in such images is not confined to any page layout, and its location within the image is random in nature. In addition, motion blur, non-uniform illumination, skew, occlusion and scale-based degradations increase the complexity of locating and recognizing the text in a scene/born-digital image. Text localization and segmentation techniques are proposed for the born-digital image data set. The proposed OTCYMIST technique placed first and third on the text segmentation task for the born-digital image data set in the ICDAR 2011 and ICDAR 2013 robust reading competitions, respectively. Here, Otsu's binarization and Canny edge detection are separately carried out on the three colour planes of the image. Connected components (CCs) obtained from the segmented image are pruned based on thresholds applied to their area and aspect ratio. CCs with sufficient edge pixels are retained. The centroids of the individual CCs are used as nodes of a graph. A minimum spanning tree is built using these nodes of the graph. Long edges are broken from the minimum spanning tree of the graph. Pairwise height ratio is used to remove likely non-text components. CCs are grouped based on their proximity in the horizontal direction to generate bounding boxes (BBs) of text strings. Overlapping BBs are removed using an overlap area threshold. Non-overlapping and minimally overlapping BBs are used for text segmentation. These BBs are split vertically to localize text at the word level. A word cropped from a document image can easily be recognized using a traditional optical character recognition (OCR) engine. However, recognizing a word obtained by manually cropping a scene/born-digital image is not trivial. Existing OCR engines do not handle these kinds of scene word images effectively. Our intention is to first segment the word image and then pass it to existing OCR engines for recognition. This is advantageous in two respects: it avoids building a character classifier from scratch, and it reduces the word recognition task to a word segmentation task. Here, we propose two bottom-up approaches for the task of word segmentation. These approaches choose different features at the initial stage of segmentation. A power-law transform (PLT) is applied to the pixels of the gray-scale born-digital images to non-linearly modify the histogram. The recognition rate achieved on born-digital word images is 82.9%, more than 20 percentage points higher than the top performing entry (61.5%) in the ICDAR 2011 robust reading competition. In addition, we explored applying the PLT to colour planes such as the red, green, blue, intensity and lightness planes by varying the gamma value. We call this technique nonlinear enhancement and selection of plane (NESP) for optimal segmentation, which is an improvement over PLT. NESP chooses a particular plane with a proper gamma value based on the Fisher discrimination factor. The recognition rate is 72.8% for scene images of the ICDAR 2011 robust reading competition, more than 30 percentage points higher than the best entry (41.2%). The recognition rates are 81.7% and 65.9% for born-digital and scene images of the ICDAR 2013 robust reading competition, respectively, using NESP. Another technique, midline analysis and propagation of segmentation (MAPS), has also been proposed.
Here, the middle-row pixels of the gray-scale image are first segmented, and the statistics of the segmented pixels are used to assign text and non-text labels to the rest of the image pixels using a min-cut method. A Gaussian model is fitted to the segmented middle-row pixels before the other pixels are assigned. In MAPS, we assume that the middle-row pixels are the least affected by any of the degradations. This assumption is validated by the good word recognition rate of 71.7% on scene images of the ICDAR 2011 robust reading competition. The recognition rates are 83.8% and 66.0% for born-digital and scene images of the ICDAR 2013 robust reading competition, respectively, using MAPS. The best reported result for ICDAR 2003 word images is 61.1%, obtained using custom lexicons containing the list of test words. On the other hand, NESP and MAPS achieve 66.2% and 64.5% for ICDAR 2003 word images without using any lexicon. By using a similar custom lexicon, the recognition rates for ICDAR 2003 word images go up to 74.9% and 74.2% for the NESP and MAPS methods, respectively. In place of passing an image segmented by a method, a manually segmented word image is submitted to an OCR engine to benchmark the maximum possible recognition rate for each database. The recognition rates of the proposed methods and the benchmark results are reported on the seven publicly available word image data sets and compared with results reported in the literature. Since no good Kannada OCR is available, a classifier is designed to recognize Kannada characters and words from the Chars74k data set and our own image collection, respectively. Discrete cosine transform (DCT) and block DCT are used as features to train separate classifiers. Kannada words are segmented using the same techniques (MAPS and NESP) and further segmented into groups of components, since a Kannada character may be represented by a single component or a group of components in an image. The recognition rate on Kannada words is reported for different features with and without the use of a lexicon. The obtained recognition performance for Kannada character recognition (11.4%) is three times the best performance (3.5%) reported in the literature.
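The NESP idea mentioned above, applying a power-law (gamma) transform to several colour planes and keeping the plane/gamma pair with the largest Fisher discrimination between the two Otsu classes, can be sketched as follows. The plane set and gamma grid are illustrative assumptions, and scikit-image is assumed to be available.

# Minimal sketch of power-law transform (PLT/NESP) plane and gamma selection.
# Assumes scikit-image; the gamma grid and plane names are illustrative.
import numpy as np
from skimage.filters import threshold_otsu

def fisher_score(plane, gamma):
    # Gamma-correct the plane, split it with Otsu, and measure how well
    # separated the two resulting classes are (Fisher discrimination factor).
    g = (plane / 255.0) ** gamma
    t = threshold_otsu(g)
    fg, bg = g[g <= t], g[g > t]
    if fg.size < 2 or bg.size < 2:
        return -np.inf
    return (fg.mean() - bg.mean()) ** 2 / (fg.var() + bg.var() + 1e-12)

def best_plane_and_gamma(planes, gammas=(0.5, 1.0, 1.5, 2.0, 3.0)):
    # planes: dict mapping a name ('R', 'G', 'B', 'intensity', ...) to a 2-D uint8 array.
    scores = {(name, g): fisher_score(p.astype(float), g)
              for name, p in planes.items() for g in gammas}
    return max(scores, key=scores.get)          # the (plane, gamma) pair to binarise with

# name, gamma = best_plane_and_gamma({'R': img[..., 0], 'G': img[..., 1], 'B': img[..., 2]})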
APA, Harvard, Vancouver, ISO, and other styles
39

Anil, Prasad M. N. "Segmentation Strategies for Scene Word Images." Thesis, 2014. http://etd.iisc.ac.in/handle/2005/2889.

Full text
APA, Harvard, Vancouver, ISO, and other styles
40

Anil, Prasad M. N. "Segmentation Strategies for Scene Word Images." Thesis, 2014. http://hdl.handle.net/2005/2889.

Full text
APA, Harvard, Vancouver, ISO, and other styles
41

Kasisopa, Benjawan. "Reading without spaces between words : eye movements in reading Thai." Thesis, 2011. http://handle.uws.edu.au:8081/1959.7/496076.

Full text
Abstract:
Studies of eye movements in reading alphabetic writing systems, such as English, suggest that the optimal viewing position (OVP), the target position in each word that allows the fastest word processing, is the word centre. In alphabetic languages with spaces between words, research has shown that readers' preferred viewing location (PVL) is to the left of the OVP. It appears that spacing between words is the most salient low-level visual cue guiding the eyes during reading, for when the spaces are removed from the text of spaced alphabetic languages, reading rate decreases by approximately 35% and the PVL shifts dramatically from the word centre towards the word beginning. Although these findings are widely accepted, it is unclear how well such results generalize to languages without spaces between words – scriptio continua alphabetic languages. Thai is a good model of a scriptio continua language in which to investigate eye movements during reading, not only because it is written without spaces between words but also because its orthography is quite complicated in terms of visual information. In brief, characters such as vowels, tones and other diacritics, or even some parts of the consonants, can be written above or below the main horizontal line. Reilly et al. (2003) surprisingly found that Thai adult readers also target the word centre during saccadic eye movements, even though there is no spacing to help indicate word boundaries. They suggested that the relative position-specific frequency of occurrence of final and initial characters serves as a visual cue to guide the eye movements of Thai readers. Analysis of the frequency of initial and final characters calculated for the texts used in their experiment confirms their suggestion that participants' landing sites tended to be closer to the word centre if the position-specific frequencies of the initial and especially the final characters were relatively high. The aim of this thesis was to test the effects of low-level visual distinctive features of Thai orthography – namely i) the relative frequency of occurrence of characters in the initial and final positions, and ii) spaces between words – on the reading time, eye movement patterns and control, and fixation patterns of native Thai readers, both children and adults. Experiment 1 involved studies of reading time with a group of adults and four groups of children (1st, 2nd, 5th and 6th Graders, with half in each group being good and half being poor readers). This experiment focussed particularly on the start and end characters of words. It was found that the relative frequency of occurrence of characters in the word-start and word-end positions had significant effects on the reading time and reading accuracy of Thai participants across all ages. Higher frequency characters, especially word-start characters, helped reduce reading time, and spacing between words facilitated reading in general, as indicated by shorter reading times, especially for young children who were poor readers. Differences due to groups and spacing decreased as the age of the participants increased and their reading skills improved. Experiments 2 and 3 involved precise measurements of eye movements using the EyeLink II apparatus and followed up on the effects found in Experiment 1, those of the position-specific frequency of word-start and word-end characters. In Experiment 2, adults of lower and higher education levels were tested on unspaced and spaced text, reading either silently or aloud.
It was found that Thai readers' PVL was at or near the word centre in all conditions. The presence or absence of spaces between words did not cause any dramatic change to this PVL, unlike the changes found in the eye movements of English-language readers presented with unusual unspaced text. That is, for Thai readers reading normal unspaced text and unusual spaced text, the oculomotor patterns were the same, in contrast to the dramatic change in English readers' eye movements when faced with unusual unspaced text (Rayner et al., 1998). Nevertheless, in concert with the Experiment 1 reading time studies, spaces between words did allow faster reading in Thai readers; there were shorter first fixations and gaze durations for spaced than for unspaced text. In addition, for reading aloud the PVL was closer to the word start than for silent reading, and first fixation and gaze duration were longer for reading aloud. The relative frequency of characters, especially at the word-start position, had significant effects on the landing site locations of Thai adults (skilled readers); i.e., higher start-character frequency allowed participants to land their eyes at the PVL. Word-end character frequency had less effect on landing site but stronger effects on the fixation times of the participants; both first fixations and gaze durations were shorter on words with higher end-character frequency than on those with lower frequency. In Experiment 3, with two groups of children, the results were similar, although the landing sites of the younger child participants were a little further to the left of the word centre than those of older children and adults, but still too far into the word to be designated as the word-initial area. Unlike the results for the adults' eye movements, spaces between words had significant effects on landing site location, especially in younger children. Spaces facilitated young children's oculomotor control and assisted in landing their eyes closer to the OVP. There were no significant main effects of character frequency on the landing site locations of Thai children on the target words. However, the frequency of characters in both the word-start and word-end positions was used by younger child readers when no visible visual cue, i.e., spaces, was available. Thus younger children relied more on low-level visual information such as spaces between words when reading. This may be because literacy teaching in Thai starts with spaced texts; therefore, younger children were more familiar with spaced text and tried to use this information to locate word boundaries first before moving to the next resource, such as the characters at the word-start and word-end positions. Generally, the results of this thesis show that Thai readers, child and adult, use the same oculomotor control when reading spaced and unspaced text. Adding spaces into the text does not change the PVL, which remains near the OVP at or near the word centre, similar to that of native readers of spaced alphabetic languages; however, spacing does bring the landing sites of younger children closer to the OVP. The developmental trend of eye movements in reading Thai seems to be that Thai readers rely less on visible visual cues as their reading skills increase, at which point spaces between words facilitate the reading time (first fixation and gaze duration) of skilled readers. Such results show that spaces between words are not essential for optimal eye movements in Thai. However, as spaces between words result in a decrease in reading time, spaces may aid in word recognition.
These results have important implications for models of eye movements in reading, which, at present, do not explicitly account for reading in scriptio continua languages. Additionally, two more reading time studies were conducted to investigate other distinctive features of Thai orthography (details are given in Appendix C). In Experiment A, the focus was the feedback consistency versus inconsistency of initial and final characters. Results similar to those of Experiment 1 were found in terms of the spacing conditions of the texts. Feedback-inconsistent grapheme-phoneme relationships between letters and sounds slowed reading time for both initial and final consonants, but there were more errors for final consonants, possibly due to the significantly smaller number of final phonemes in the Thai phonological system. In Experiment B, the transparency of tone realisations was investigated. It was found that participants read words with transparent tone realisations faster and made fewer errors than words with opaque tone realisations. Together these results show that reading time is sensitive to the influence of features of particular characters (frequency, grapheme-phoneme consistency, transparency of tone realisation) and also to spacing between words. Even though these features had significant effects on the reading time and reading accuracy of Thai readers, it is not yet known whether they also affect the eye movements of Thai readers. The results of these two additional reading time experiments provide the bases and hypotheses for further eye movement experiments; the effects found there need to be followed up in further studies. On the other hand, the results of the reading time (Experiment 1) and eye movement (Experiments 2 and 3) studies of word-start and word-end character frequencies provided definitive evidence that the PVL in Thai, a scriptio continua language, is near the OVP at the word centre, as it is for spaced alphabetic languages. This finding should be further investigated in other scriptio continua languages such as Khmer (Cambodian) and Lao.
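For readers interested in how the landing-site measures above are typically computed, the following small sketch expresses each initial fixation as a position within the fixated word, normalised so that 0 is the word start and 1 the word end. The data structures are hypothetical stand-ins for eye-tracker output, not the thesis' actual analysis code.

# Minimal sketch: normalised landing positions and the PVL estimate.
# Assumes NumPy; the input structures are hypothetical.
import numpy as np

def landing_positions(first_fixations, word_boxes):
    # first_fixations: iterable of (word_id, x) pairs giving the horizontal
    # coordinate of the first fixation on each word.
    # word_boxes: dict mapping word_id -> (x_left, x_right) in the same coordinates.
    positions = []
    for word_id, x in first_fixations:
        left, right = word_boxes[word_id]
        positions.append((x - left) / float(right - left))
    return np.asarray(positions)

# pvl = np.median(landing_positions(fixations, boxes))
# A value near 0.5 indicates landing at or near the word centre (the OVP),
# while values well below 0.5 indicate landing towards the word beginning.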
APA, Harvard, Vancouver, ISO, and other styles
