
Dissertations / Theses on the topic 'Visual recognition system'



Consult the top 50 dissertations / theses for your research on the topic 'Visual recognition system.'


You can also download the full text of each academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse dissertations / theses from a wide variety of disciplines and organise your bibliography correctly.

1

Campbell, Larry W. "An intelligent tutor system for visual aircraft recognition." Thesis, Monterey, California: Naval Postgraduate School, 1990. http://hdl.handle.net/10945/27723.

Abstract:
Approved for public release; distribution is unlimited.
Visual aircraft recognition (VACR) is a critical skill for U.S. Army Short Range Air Defense (SHORAD) soldiers. It is the most reliable means of identifying aircraft; however, VACR skills are not easy to teach or learn, and once learned they are highly degradable. The numerous training aids that exist to help units train soldiers require qualified instructors, who are not always available. Also, the varying degrees of proficiency among soldiers make group training less than ideal. In an attempt to alleviate the problems in most VACR training programs, an intelligent tutor system has been developed to teach VACR in accordance with the Wings, Engine, Fuselage, Tail (WEFT) cognitive model. The Aircraft Recognition Tutor is a graphics-based, object-oriented instructional program that teaches, reviews, and tests VACR skills at a level appropriate to the student. The tutor adaptively coaches the student from the novice level, through the intermediate level, to the expert level. The tutor was provided to two U.S. Army Air Defense Battalions for testing and evaluation. The six-month implementation, testing, and evaluation process demonstrated that, using existing technology in computer science and artificial intelligence, useful training tools could be developed quickly and inexpensively for deployment on existing computers in the field.
2

Dong, Junda. "Designing a Visual Front End in Audio-Visual Automatic Speech Recognition System." DigitalCommons@CalPoly, 2015. https://digitalcommons.calpoly.edu/theses/1382.

Abstract:
Audio-visual automatic speech recognition (AVASR) is a speech recognition technique that integrates audio and video signals as input. A traditional audio-only speech recognition system uses only acoustic information from an audio source; however, its recognition performance degrades significantly in acoustically noisy environments. It has been shown that visual information can also be used to identify speech. To improve speech recognition performance, audio-visual automatic speech recognition has been studied. In this paper, we focus on the design of the visual front end of an AVASR system, which mainly consists of face detection and lip localization. The front end is built upon the AVICAR database, which was recorded in moving vehicles; therefore, diverse lighting conditions and poor imagery quality are the problems we must overcome. We first propose the use of the Viola-Jones face detection algorithm, which can process images rapidly with high detection accuracy. When the algorithm is applied to the AVICAR database, we reach a face detection rate of 89%. By separately detecting and then integrating the detection results from the different color channels, we further improve the detection accuracy to 95%. To reliably localize the lips, three algorithms are studied and compared: a Gabor filter algorithm, a lip enhancement algorithm, and a modified Viola-Jones algorithm for lip features. Finally, to increase the detection rate, the modified Viola-Jones and lip enhancement algorithms are cascaded based on the results of the three lip localization methods. Overall, the front end achieves an accuracy of 90% for lip localization.
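For readers who want a concrete starting point, the per-channel detection idea above is easy to prototype with OpenCV's stock Viola-Jones (Haar cascade) detector. The sketch below is an illustration under our own assumptions (the thesis's exact cascade, parameters, and fusion rule are not specified here), not the thesis code:

```python
# Hedged sketch (not the thesis code) of per-channel Viola-Jones detection:
# OpenCV's stock Haar cascade is run on each color plane, and the resulting
# boxes are merged into consensus detections.
import cv2

cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def detect_faces_per_channel(bgr_image):
    boxes = []
    for channel in cv2.split(bgr_image):  # the B, G and R planes in turn
        found = cascade.detectMultiScale(channel, scaleFactor=1.1, minNeighbors=5)
        boxes.extend([list(map(int, b)) for b in found])
    if not boxes:
        return []
    # Duplicating the list lets a box seen in only one channel survive grouping.
    merged, _ = cv2.groupRectangles(boxes * 2, groupThreshold=1, eps=0.2)
    return merged  # consensus (x, y, w, h) face boxes
```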
3

Wojnowski, Christine. "Reasoning with visual knowledge in an object recognition system /." Online version of thesis, 1990. http://hdl.handle.net/1850/10596.

4

Sun, Yongbin Ph D. Massachusetts Institute of Technology. "An RFID-based visual recognition system for the retail industry." Thesis, Massachusetts Institute of Technology, 2016. http://hdl.handle.net/1721.1/104277.

Abstract:
Thesis: S.M., Massachusetts Institute of Technology, Department of Mechanical Engineering, 2016.
Cataloged from PDF version of thesis.
Includes bibliographical references (pages 63-67).
In this thesis, I aim to build an accurate fine-grained retail product recognition system for improving the customer in-store shopping experience. To achieve high accuracy, I developed a two-phase visual recognition scheme that identifies the viewed retail product by verifying different types of visual features. The proposed scheme is robust enough to distinguish visually similar products in the tests. However, the computation cost of this scheme grows with the database scale, since it needs to verify all the products in the database. To improve computation efficiency, my system integrates RFID as a second data source. By attaching an RFID tag to each product, the RFID reader is able to capture the identity information of surrounding products. The detection results help reduce the verification scope from the whole database to the detected products only, saving computation cost. In the experiments, I first tested the recognition accuracy of my visual recognition scheme on a database containing visually similar products viewed from different angles, and my scheme achieved over 97.92% recognition accuracy for horizontal viewpoint variations of less than 30 degrees. I then experimentally measured the computation cost, defined as the processing time to recognize a target product, of both the original system and the RFID-enhanced system. The RFID-enhanced system speeds up performance dramatically when the number of detected surrounding products is small.
by Yongbin Sun.
S.M.
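The speed-up described above is ordinary search-space pruning: the expensive visual verification is run only over what the RFID reader says is nearby. A hypothetical sketch (all names here, such as read_nearby_tags and visual_similarity, are stand-ins rather than the thesis's API) might look like this:

```python
# Hypothetical sketch of RFID-scoped recognition; `read_nearby_tags` and
# `visual_similarity` are stand-ins, not APIs from the thesis.
def visual_similarity(image, product):
    """Stand-in for the two-phase feature-verification score."""
    raise NotImplementedError

def recognize(query_image, product_db, rfid_reader):
    tag_ids = rfid_reader.read_nearby_tags()     # identities of nearby products
    candidates = [product_db[t] for t in tag_ids if t in product_db]
    if not candidates:                           # no tags read: fall back to all
        candidates = list(product_db.values())
    # Verification cost now scales with the detected products, not the database.
    return max(candidates, key=lambda p: visual_similarity(query_image, p))
```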
5

Koprnicky, Miroslav. "Towards a Versatile System for the Visual Recognition of Surface Defects." Thesis, University of Waterloo, 2005. http://hdl.handle.net/10012/888.

Abstract:
Automated visual inspection is an emerging multi-disciplinary field with many challenges; it combines different aspects of computer vision, pattern recognition, automation, and control systems. There does not exist a large body of work dedicated to the design of generalized visual inspection systems; that is, those that might easily be made applicable to different product types. This is an important oversight, in that many improvements in design and implementation times, as well as costs, might be realized with a system that could easily be made to function in different production environments.

This thesis proposes a framework for generalizing and automating the design of the defect classification stage of an automated visual inspection system. It involves using an expandable set of features which are optimized along with the classifier operating on them in order to adapt to the application at hand. The particular implementation explored involves optimizing the feature set in disjoint sets logically grouped by feature type to keep search spaces reasonable. Operator input is kept at a minimum throughout this customization process, since it is limited only to those cases in which the existing feature library cannot adequately delineate the classes at hand, at which time new features (or pools) may have to be introduced by an engineer with experience in the domain.

Two novel methods are put forward which fit well within this framework: cluster-space and hybrid-space classifiers. They are compared in a series of tests against standard benchmark classifiers, as well as mean and majority vote multi-classifiers, on feature sets comprised of just the logical feature subsets as well as the entire feature sets formed by their union. The proposed classifiers as well as the benchmarks are optimized both with a progressive combinatorial approach and with a genetic algorithm. Experimentation was performed on true colour industrial lumber defect images, as well as binary hand-written digits.

Based on the experiments conducted in this work, it was found that the sequentially optimized multi hybrid-space methods are capable of matching the performances of the benchmark classifiers on the lumber data, with the exception of the mean-rule multi-classifiers, which dominated most experiments by approximately 3% in classification accuracy. The genetic-algorithm-optimized hybrid-space multi-classifier achieved the best performance, however: an accuracy of 79.2%.

The numeral dataset results were less promising; the proposed methods could not equal benchmark performance. This is probably because the numeral feature-sets were much more conducive to good class separation, with standard benchmark accuracies approaching 95% not uncommon. This indicates that the cluster-space transform inherent to the proposed methods appears to be most useful in highly dependent or confusing feature-spaces, a hypothesis supported by the outstanding performance of the single hybrid-space classifier in the difficult texture feature subspace: 42.6% accuracy, a 6% increase over the best benchmark performance.

The generalized framework proposed appears promising, because classifier performance over feature sets formed by the union of independently optimized feature subsets regularly met and exceeded that of classifiers operating on feature sets formed by the optimization of the feature set in its entirety. This finding corroborates earlier work with similar results [3, 9], and is an aspect of pattern recognition that should be examined further.
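Since the abstract leans on genetic-algorithm optimization of logically grouped feature subsets, a generic sketch of GA feature-subset selection may help orient the reader; it assumes a caller-supplied fitness function (for example, cross-validated accuracy) and is not the thesis's implementation:

```python
# Generic GA feature-subset selection: evolve boolean masks over the columns
# of a feature matrix X, scoring each mask with a caller-supplied fitness.
import numpy as np

rng = np.random.default_rng(0)

def ga_select(X, y, fitness, pop=20, gens=50, p_mut=0.02):
    n = X.shape[1]
    population = rng.random((pop, n)) < 0.5                # random bit masks
    for _ in range(gens):
        scores = np.array([fitness(X[:, m], y) if m.any() else 0.0
                           for m in population])
        parents = population[np.argsort(scores)][-(pop // 2):]  # keep best half
        cuts = rng.integers(1, n, size=pop // 2)
        children = np.array([
            np.concatenate((parents[i][:c], parents[(i + 1) % len(parents)][c:]))
            for i, c in enumerate(cuts)])                   # one-point crossover
        children ^= rng.random(children.shape) < p_mut      # bit-flip mutation
        population = np.vstack((parents, children))
    scores = np.array([fitness(X[:, m], y) if m.any() else 0.0
                       for m in population])
    return population[np.argmax(scores)]                    # best feature mask
```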
6

Sjöholm, Alexander. "Closing the Loop : Mobile Visual Location Recognition." Thesis, Linköpings universitet, Datorseende, 2014. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-112547.

Abstract:
Visual simultaneous localization and mapping (SLAM) has been researched as a field for ten years, but with recent advances in mobile performance, visual SLAM is entering the consumer market in a completely new way. A visual SLAM system will, however, be sensitive to incautious use: severe motion, occlusion, or surroundings poor in visual features may cause the system to fail temporarily. The procedure of recovering from such a failure is called relocalization. Together with two similar problems, localization (finding your position in an existing SLAM session) and loop closing (the online repair and refinement of the map in an active SLAM session), this can be grouped as visual location recognition (VLR). This thesis presents novel results obtained by combining the scalability of FabMap with the precision of 13th Lab's tracking, yielding high-precision VLR, +/- 10 cm, while maintaining above 99% precision and 60% recall for sessions containing thousands of images, all running purely on a normal mobile phone. The applications of VLR are many. Indoors, where GPS does not function, VLR can still provide positional information and navigate you through big complexes like airports and museums. Outdoors, VLR can improve the precision of GPS tenfold, yielding a new level of navigational experience. Virtual and augmented reality applications are other areas that benefit from improved positioning and localization.
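A note on the figures quoted above: in VLR evaluation, a retrieval typically counts as correct only if the recognized location falls within a distance tolerance of ground truth. A small sketch of that evaluation (data layout and names are our assumptions) is:

```python
# Sketch of the evaluation behind such figures: a returned match counts as
# correct when it lies within a tolerance (+/- 10 cm here) of ground truth.
import numpy as np

def vlr_precision_recall(retrieved_pos, true_pos, accepted, tol_m=0.10):
    """retrieved_pos, true_pos: (N, 2) positions in metres for N queries;
    accepted: (N,) bool mask of queries where a match was returned at all."""
    err = np.linalg.norm(retrieved_pos - true_pos, axis=1)
    correct = accepted & (err <= tol_m)
    precision = correct.sum() / max(accepted.sum(), 1)  # of returned matches
    recall = correct.sum() / len(true_pos)              # of all queries
    return precision, recall
```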
7

Su, Ying-fung. "Role of temporal texture in visual system: exploration with computer simulations." Click to view the E-thesis via HKUTO, 2010. http://sunzi.lib.hku.hk/hkuto/record/B43703768.

8

Kaplan, Bernhard. "Modeling prediction and pattern recognition in the early visual and olfactory systems." Doctoral thesis, KTH, Beräkningsbiologi, CB, 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-166127.

Abstract:
Our senses are our mind's window to the outside world and determine how we perceive our environment. Sensory systems are complex multi-level systems that have to solve a multitude of tasks that allow us to understand our surroundings. However, questions on various levels and scales remain to be answered, ranging from low-level neural responses to behavioral functions on the highest level. Modeling can connect different scales and contribute towards tackling these questions by giving insights into perceptual processes and interactions between processing stages. In this thesis, numerical simulations of spiking neural networks are used to deal with two essential functions that sensory systems have to solve: pattern recognition and prediction. The focus of this thesis lies on the question as to how neural network connectivity can be used in order to achieve these crucial functions. The guiding ideas of the models presented here are grounded in the probabilistic interpretation of neural signals, Hebbian learning principles and connectionist ideas. The main results are divided into four parts. The first part deals with the problem of pattern recognition in a multi-layer network inspired by the early mammalian olfactory system with biophysically detailed neural components. Learning based on Hebbian-Bayesian principles is used to organize the connectivity between and within areas and is demonstrated in behaviorally relevant tasks. Besides recognition of artificial odor patterns, phenomena like concentration invariance, noise robustness, pattern completion and pattern rivalry are investigated. It is demonstrated that learned recurrent cortical connections play a crucial role in achieving pattern recognition and completion. The second part is concerned with the prediction of moving stimuli in the visual system. The problem of motion-extrapolation is studied using different recurrent connectivity patterns. The main result shows that connectivity patterns taking the tuning properties of cells into account can be advantageous for solving the motion-extrapolation problem. The third part focuses on the predictive or anticipatory response to an approaching stimulus. Inspired by experimental observations, particle filtering and spiking neural network frameworks are used to address the question as to how stimulus information is transported within a motion sensitive network. In particular, the question if speed information is required to build up a trajectory dependent anticipatory response is studied by comparing different network connectivities. Our results suggest that in order to achieve a dependency of the anticipatory response on the trajectory length, a connectivity that uses both position and speed information seems necessary. The fourth part combines the self-organization ideas from the first part with motion perception as studied in the second and third parts. There, the learning principles used in the olfactory system model are applied to the problem of motion anticipation in visual perception. Similarly to the third part, different connectivities are studied with respect to their contribution to anticipate an approaching stimulus. The contribution of this thesis lies in the development and simulation of large-scale computational models of spiking neural networks solving prediction and pattern recognition tasks in biophysically plausible frameworks.


9

Adjei-Kumi, Theophilus. "The development of an intelligent system for visual simulation of construction projects." Thesis, University of Strathclyde, 1997. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.311845.

10

Su, Ying-fung, and 蘇盈峰. "Role of temporal texture in visual system: exploration with computer simulations." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2010. http://hub.hku.hk/bib/B43703768.

11

Isik, Leyla. "The dynamics of invariant object and action recognition in the human visual system." Thesis, Massachusetts Institute of Technology, 2015. http://hdl.handle.net/1721.1/98000.

Abstract:
Thesis: Ph. D., Massachusetts Institute of Technology, Computational and Systems Biology Program, 2015.
Cataloged from PDF version of thesis.
Includes bibliographical references (pages 123-138).
Humans can quickly and effortlessly recognize objects, people, and their actions from complex visual inputs. Despite the ease with which the human brain solves this problem, the underlying computational steps have remained enigmatic. What makes object and action recognition challenging are identity-preserving transformations that alter the visual appearance of objects and actions, such as changes in scale, position, and viewpoint. The majority of visual neuroscience studies examining visual recognition either use physiology recordings, which provide high spatiotemporal resolution data with limited brain coverage, or functional MRI, which provides high spatial resolution data from across the brain with limited temporal resolution. High temporal resolution data from across the brain is needed to break down and understand the computational steps underlying invariant visual recognition. In this thesis I use magnetoencephalography, machine learning, and computational modeling to study invariant visual recognition. I show that a temporal association learning rule for learning invariance in hierarchical visual systems is very robust to manipulations and visual disruptions that happen during development (Chapter 2). I next show that object recognition occurs very quickly, with invariance to size and position developing in stages beginning around 100 ms after stimulus onset (Chapter 3), and that action recognition occurs on a similarly fast time scale, 200 ms after video onset, with this early representation being invariant to changes in actor and viewpoint (Chapter 4). Finally, I show that the same hierarchical feedforward model can explain both the object and action recognition timing results, putting this timing data in the broader context of computer vision systems and models of the brain. This work sheds light on the computational mechanisms underlying invariant object and action recognition in the brain and demonstrates the importance of using high temporal resolution data to understand neural computations.
by Leyla Isik.
Ph. D.
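The machine-learning component of studies like this is commonly a time-resolved decoding analysis: train and test a classifier at every time point and report when accuracy departs from chance. A simplified sketch with scikit-learn (assumed data shapes; not the thesis pipeline):

```python
# Simplified sketch of time-resolved MEG decoding: a linear classifier is
# trained and tested at every time sample; the latency at which accuracy
# exceeds chance dates the underlying computation.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import LinearSVC

def decode_over_time(X, y, cv=5):
    """X: (n_trials, n_sensors, n_times) sensor data; y: (n_trials,) labels."""
    n_times = X.shape[2]
    acc = np.empty(n_times)
    for t in range(n_times):
        acc[t] = cross_val_score(LinearSVC(), X[:, :, t], y, cv=cv).mean()
    return acc  # cross-validated decoding accuracy per time point
```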
12

Stone, Thomas Jonathan. "Mechanisms of place recognition and path integration based on the insect visual system." Thesis, University of Edinburgh, 2017. http://hdl.handle.net/1842/28909.

Abstract:
Animals are often able to solve complex navigational tasks in very challenging terrain, despite using low resolution sensors and minimal computational power, providing inspiration for robots. In particular, many species of insect are known to solve complex navigation problems, often combining an array of different behaviours (Wehner et al., 1996; Collett, 1996). Their nervous system is also comparatively simple relative to that of mammals and other vertebrates. In the first part of this thesis, the visual input of a navigating desert ant, Cataglyphis velox, was mimicked by capturing images in ultraviolet (UV) at wavelengths similar to those seen by the ant's compound eye. The natural segmentation of ground and sky led to the hypothesis that skyline contours could be used by ants as features for navigation. As proof of concept, sky-segmented binary images were used as input for an established localisation algorithm, SeqSLAM (Milford and Wyeth, 2012), validating the plausibility of this claim (Stone et al., 2014). A follow-up investigation sought to determine whether using the sky as a feature would help overcome image matching problems that the ant often faces, such as variance in tilt and yaw rotation. A robotic localisation study showed that using spherical harmonics (SH), a representation in the frequency domain, combined with the extracted sky can greatly help robots localise on uneven terrain. Results showed improved performance over state-of-the-art point-feature localisation methods on fast bumpy tracks (Stone et al., 2016a). In the second part, an approach to understanding how insects perform a navigational task called path integration was attempted by modelling part of the brain of the sweat bee Megalopta genalis. A recent discovery that two populations of cells act as a celestial compass and visual odometer, respectively, led to the hypothesis that circuitry at their point of convergence in the central complex (CX) could give rise to path integration. A firing rate-based model was developed with connectivity derived from the overlap of observed neural arborisations of individual cells and successfully used to build up a home vector and steer an agent back to the nest (Stone et al., 2016b). This approach has the appeal that neural circuitry is highly conserved across insects, so findings here could have wide implications for insect navigation in general. The developed model is the first functioning path integrator that is based on individual cellular connections.
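The sky-segmentation step described above exploits the fact that the sky is far brighter than terrain at UV wavelengths, so a simple global threshold can produce the binary skyline image fed to SeqSLAM. A toy sketch (Otsu thresholding is our assumption; the thesis does not prescribe it here):

```python
# Toy sketch of the sky-segmentation step: at UV wavelengths the sky is much
# brighter than terrain, so a global threshold (Otsu's here, an assumption)
# yields the binary skyline image used for SeqSLAM-style matching.
import cv2

def segment_sky(uv_gray):
    """uv_gray: 8-bit single-channel UV image."""
    _, sky_mask = cv2.threshold(uv_gray, 0, 255,
                                cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return sky_mask  # 255 = sky, 0 = ground and vegetation
```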
13

Amundberg, Joel, and Martin Moberg. "System Agnostic GUI Testing : Analysis of Augmented Image Recognition Testing." Thesis, Blekinge Tekniska Högskola, Institutionen för programvaruteknik, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:bth-21441.

14

Eccles, John. "Stochastic relaxation labelling of visual features in a multi-sensor sensory system for robotic assembly." Thesis, Heriot-Watt University, 1994. http://hdl.handle.net/10399/1370.

15

Wallenberg, Marcus. "Components of Embodied Visual Object Recognition : Object Perception and Learning on a Robotic Platform." Licentiate thesis, Linköpings universitet, Datorseende, 2013. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-93812.

Abstract:
Object recognition is a skill we as humans often take for granted. Due to our formidable object learning, recognition and generalisation skills, it is sometimes hard to see the multitude of obstacles that need to be overcome in order to replicate this skill in an artificial system. Object recognition is also one of the classical areas of computer vision, and many ways of approaching the problem have been proposed. Recently, visually capable robots and autonomous vehicles have increased the focus on embodied recognition systems and active visual search. These applications demand that systems can learn and adapt to their surroundings, and arrive at decisions in a reasonable amount of time, while maintaining high object recognition performance. Active visual search also means that mechanisms for attention and gaze control are integral to the object recognition procedure. This thesis describes work done on the components necessary for creating an embodied recognition system, specifically in the areas of decision uncertainty estimation, object segmentation from multiple cues, adaptation of stereo vision to a specific platform and setting, and the implementation of the system itself. Contributions include the evaluation of methods and measures for predicting the potential uncertainty reduction that can be obtained from additional views of an object, allowing for adaptive target observations. Also, in order to separate a specific object from other parts of a scene, it is often necessary to combine multiple cues such as colour and depth in order to obtain satisfactory results. Therefore, a method for combining these using channel coding has been evaluated. Finally, in order to make use of three-dimensional spatial structure in recognition, a novel stereo vision algorithm extension along with a framework for automatic stereo tuning have also been investigated. All of these components have been tested and evaluated on a purpose-built embodied recognition platform known as Eddie the Embodied.
16

Evans, Benjamin D. "Learning transformation-invariant visual representations in spiking neural networks." Thesis, University of Oxford, 2012. https://ora.ox.ac.uk/objects/uuid:15bdf771-de28-400e-a1a7-82228c7f01e4.

Abstract:
This thesis aims to understand the learning mechanisms which underpin the process of visual object recognition in the primate ventral visual system. The computational crux of this problem lies in the ability to retain specificity to recognize particular objects or faces, while exhibiting generality across natural variations and distortions in the view (DiCarlo et al., 2012). In particular, the work presented is focussed on gaining insight into the processes through which transformation-invariant visual representations may develop in the primate ventral visual system. The primary motivation for this work is the belief that some of the fundamental mechanisms employed in the primate visual system may only be captured through modelling the individual action potentials of neurons and therefore, existing rate-coded models of this process constitute an inadequate level of description to fully understand the learning processes of visual object recognition. To this end, spiking neural network models are formulated and applied to the problem of learning transformation-invariant visual representations, using a spike-time dependent learning rule to adjust the synaptic efficacies between the neurons. The ways in which the existing rate-coded CT (Stringer et al., 2006) and Trace (Földiák, 1991) learning mechanisms may operate in a simple spiking neural network model are explored, and these findings are then applied to a more accurate model using realistic 3-D stimuli. Three mechanisms are then examined, through which a spiking neural network may solve the problem of learning separate transformation-invariant representations in scenes composed of multiple stimuli by temporally segmenting competing input representations. The spike-time dependent plasticity in the feed-forward connections is then shown to be able to exploit these input layer dynamics to form individual stimulus representations in the output layer. Finally, the work is evaluated and future directions of investigation are proposed.
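For orientation, the rate-coded trace rule (Földiák, 1991) that the thesis carries over into spiking networks can be stated in a few lines: a temporally smoothed trace of the postsynaptic response binds successive transforms of one object onto the same weights. A minimal illustrative sketch (parameter values are arbitrary):

```python
# Minimal rate-coded illustration of the trace rule (Foldiak, 1991): a
# temporally smoothed trace of the postsynaptic response binds successive
# transforms of one object onto the same weights. Values are illustrative.
import numpy as np

def trace_update(w, x_seq, alpha=0.01, eta=0.8):
    """w: (n_in,) weights of one output cell; x_seq: (T, n_in) successive
    transforms (e.g., shifted views) of a single object."""
    y_trace = 0.0
    for x in x_seq:
        y = float(w @ x)                         # instantaneous response
        y_trace = eta * y_trace + (1 - eta) * y  # temporally smoothed trace
        w += alpha * y_trace * x                 # Hebbian update with the trace
    return w / np.linalg.norm(w)                 # normalize to bound the weights
```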
17

Rezazadegan, Tavakoli H. (Hamed). "Visual saliency and eye movement:modeling and applications." Doctoral thesis, Oulun yliopisto, 2014. http://urn.fi/urn:isbn:9789526205816.

Abstract:
Humans are capable of narrowing their focus to the highlights of visual information in a fraction of time in order to handle an enormous mass of data. Akin to humans, computers must deal with a tremendous amount of visual information. To replicate such a focusing mechanism, computer vision relies on techniques that filter out redundant information. Consequently, saliency has recently been a popular subject of discussion in the computer vision community, though it is an old subject matter in the disciplines of cognitive science rather than computer science. The reputation of saliency techniques, particularly in the computer vision domain, is greatly due to their inexpensive and fast computation, which facilitates their use in many computer vision applications, e.g., image/video compression, object recognition, tracking, etc. This study investigates visual saliency modeling, which is the transformation of an image into a salience map such that the identified conspicuousness agrees with the statistics of human eye movements. It explores the extent of image and video processing to develop saliency techniques suitable for computer vision; e.g., it adopts a sparse sampling scheme and kernel density estimation to introduce a saliency measure for images. It also studies the role of eye movement in salience modeling; to this end, it introduces a particle filter based framework of saccade generation incorporated into a salience model. Moreover, eye movements and salience are exploited in several applications. The contributions of this study lie in the proposal of a number of salience models for image and video stimuli, a framework to incorporate a model of eye movement generation in salience modeling, and the investigation of the application of salience models and eye movements in tracking, background subtraction, scene recognition, and valence recognition.
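In the spirit of the sparse-sampling and kernel-density saliency measure mentioned above, a rough sketch (the exact estimator is our assumption, not the thesis's) scores a pixel as salient when its features are improbable under a KDE fitted to a sparse sample of the image:

```python
# Rough sketch of a sparse-sampling + kernel-density saliency measure: pixels
# whose features are improbable under a KDE fitted to a sparse sample of
# image features score as salient.
import numpy as np
from scipy.stats import gaussian_kde

def kde_saliency(features, n_samples=500, seed=0):
    """features: (n_pixels, d) per-pixel feature vectors, e.g., Lab color."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(features), size=n_samples, replace=False)
    kde = gaussian_kde(features[idx].T)   # density model of the sparse sample
    density = kde(features.T)             # evaluated at every pixel
    return -np.log(density + 1e-12)       # low probability -> high salience
```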
18

North, Ben. "Learning dynamical models for visual tracking." Thesis, University of Oxford, 1998. http://ora.ox.ac.uk/objects/uuid:6ed12552-4c30-4d80-88ef-7245be2d8fb8.

Abstract:
Using some form of dynamical model in a visual tracking system is a well-known method for increasing robustness and indeed performance in general. Often, quite simple models are used and can be effective, but prior knowledge of the likely motion of the tracking target can often be exploited by using a specially-tailored model. Specifying such a model by hand, while possible, is a time-consuming and error-prone process. Much more desirable is for an automated system to learn a model from training data. A dynamical model learnt in this manner can also be a source of useful information in its own right, and a set of dynamical models can provide discriminatory power for use in classification problems. Methods exist to perform such learning, but are limited in that they assume the availability of 'ground truth' data. In a visual tracking system, this is rarely the case. A learning system must work from visual data alone, and this thesis develops methods for learning dynamical models while explicitly taking account of the nature of the training data --- they are noisy measurements. The algorithms are developed within two tracking frameworks. The Kalman filter is a simple and fast approach, applicable where the visual clutter is limited. The recently-developed Condensation algorithm is capable of tracking in more demanding situations, and can also employ a wider range of dynamical models than the Kalman filter, for instance multi-mode models. The success of the learning algorithms is demonstrated experimentally. When using a Kalman filter, the dynamical models learnt using the algorithms presented here produce better tracking when compared with those learnt using current methods. Learning directly from training data gathered using Condensation is an entirely new technique, and experiments show that many aspects of a multi-mode system can be successfully identified using very little prior information. Significant computational effort is required by the implementation of the methods, and there is scope for improvement in this regard. Other possibilities for future work include investigation of the strong links this work has with learning problems in other areas. Most notable is the study of the 'graphical models' commonly used in expert systems, where the ideas presented here promise to give insight and perhaps lead to new techniques.
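The simplest version of the learning problem this thesis addresses can be written down directly: fit a linear dynamical model x_t ≈ A x_{t-1} + noise to a state sequence by least squares. The sketch below assumes clean training states; the thesis's contribution is precisely to go beyond this and learn from noisy visual measurements instead:

```python
# Sketch of the simplest form of the learning problem: fit a linear dynamical
# model x_t ~ A x_{t-1} + noise to a state sequence by least squares. It
# assumes clean training states, unlike the thesis, which learns from noisy
# visual measurements.
import numpy as np

def fit_linear_dynamics(states):
    """states: (T, d) sequence of state vectors. Returns A and the
    process-noise covariance Q estimated from the residuals."""
    X_prev, X_next = states[:-1], states[1:]
    A, *_ = np.linalg.lstsq(X_prev, X_next, rcond=None)  # X_prev @ A ~ X_next
    resid = X_next - X_prev @ A
    return A.T, np.cov(resid.T)  # A.T so that x_t = A @ x_{t-1} (column form)
```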
19

Tromans, James Matthew. "Computational neuroscience of natural scene processing in the ventral visual pathway." Thesis, University of Oxford, 2012. http://ora.ox.ac.uk/objects/uuid:b82e1332-df7b-41db-9612-879c7a7dda39.

Abstract:
Neural responses in the primate ventral visual system become more complex in the later stages of the pathway. For example, not only do neurons in IT cortex respond to complete objects, they also learn to respond invariantly with respect to the viewing angle of an object and also with respect to its location. These types of neural responses have helped guide past research with VisNet, a computational model of the primate ventral visual pathway that self-organises during learning. In particular, previous research has focussed on presenting one object at a time to the model during training, and has placed emphasis on the transform invariant response properties of the output neurons that consequently develop. This doctoral thesis extends previous VisNet research and investigates the performance of the model with a range of more challenging and ecologically valid training paradigms, for example, when multiple objects are presented to the network during training, or when objects partially occlude one another during training. The different mechanisms that help output neurons to develop object selective, transform invariant responses during learning are proposed and explored. Such mechanisms include the statistical decoupling of objects through multiple object pairings, and the separation of object representations by independent motion. Consideration is also given to the heterogeneous response properties of neurons that develop during learning. For example, although IT neurons demonstrate a number of differing invariances, they also convey spatial information and view specific information about the objects presented on the retina. An updated, scaled-up version of the VisNet model, with a significantly larger retina, is introduced in order to explore these heterogeneous neural response properties.
20

Zukauskis, Ronald L. "Tachistoscopic recognition of vertical and horizontal letter symmetry in response to the contralateral organization of the human nervous system." Virtual Press, 2001. http://liblink.bsu.edu/uhtbin/catkey/1221268.

Abstract:
Eight-letter upper case arrays containing vertically symmetrical (VS), e.g., A-T-U-W, horizontally symmetrical (HS), e.g., B-D-C-E, doubly symmetrical (DS), e.g., H-I-O-X, and non-symmetrical (NS), e.g., F-G-L-R, letters were tachistoscopically exposed bilaterally for 50 ms to fifteen male and fifteen female undergraduates. The number of letters correctly recognized for each classification condition was used as the criterion measure. A fixed, two-factor design with the second factor being repeated was analyzed using a repeated measures analysis of variance. Consequent to testing Null Hypothesis 1 (that there is no difference between the classification conditions), a check was made for the presence of a significant interaction between gender and classification condition (Null Hypothesis 2). Because Null Hypothesis 1 was rejected and there was no interaction present, the classification group means were tested using a post hoc multiple comparison procedure identified as Tukey's Honestly Significant Difference (HSD) test. Test statistics for the Tukey HSD contrasts found that significantly more VS letters were reported than DS, HS, and NS letters. Significantly more DS letters were reported than HS and NS letters. No difference in report accuracy was found between HS and NS letters. This is in sharp contrast to studies that count only responses reported in the same left-to-right order as the tachistoscopic presentation, i.e., order of report. Previous studies using an order of report method found vertically asymmetrical letters to be reported more accurately than vertically symmetrical ones. The present study disregarded order of report. It was emphasized that the subject maintain focus on the fixation dot and not attempt to scan the letter-array pattern in a left-to-right direction, as the letters did not have to be reported in their respective positions. A different explanation for the Harcum (1964) directionality and Bryden (1968) masking interpretations follows from an order of report method activating additional processing mechanisms, such as working memory, that are ordinarily not needed to process letter features. Results obtained by the present study are discussed in terms of a reversal of spatial information for touch, kinesthesis, and sound to match the brain's reversed retino-cortical projection.
Department of Educational Psychology
21

Teichmann, Michael. "A plastic multilayer network of the early visual system inspired by the neocortical circuit." Universitätsverlag der Technischen Universität Chemnitz, 2018. https://monarch.qucosa.de/id/qucosa%3A31832.

Abstract:
The ability of the visual system for object recognition is remarkable. A better understanding of its processing would lead to better computer vision systems and could improve our understanding of the underlying principles which produce intelligence. We propose a computational model of the visual areas V1 and V2, implementing a rich connectivity inspired by the neocortical circuit. We combined the three most important cortical plasticity mechanisms: 1) Hebbian synaptic plasticity to learn the synapse strengths of excitatory and inhibitory neurons, including trace learning to learn invariant representations; 2) intrinsic plasticity to regulate the neurons' responses and stabilize the learning in deeper layers; 3) structural plasticity to modify the connections and to overcome the bias imposed on learning by the initial network definition. Among other results, we show that our model neurons learn receptive fields comparable to cortical ones. We verify the invariant object recognition performance of the model. We further show that the developed weight strengths and connection probabilities are related to the response correlations of the neurons. We link the connection probabilities of the inhibitory connections to the underlying plasticity mechanisms and explain why inhibitory connections appear unspecific. The proposed model is more detailed than previous approaches. It can reproduce neuroscientific findings and fulfills the purpose of the visual system, invariant object recognition.
22

Beuth, Frederik. "Visual attention in primates and for machines - neuronal mechanisms." Universitätsverlag Chemnitz, 2017. https://monarch.qucosa.de/id/qucosa%3A35655.

Abstract:
Visual attention is an important cognitive concept for the daily life of humans, but it is still not fully understood. Because of this, it is also rarely utilized in computer vision systems. However, understanding visual attention is challenging, as it has many seemingly different aspects at both the neuronal and the behavioral level, making it very hard to give a uniform explanation of visual attention that can account for all of them. To tackle this problem, this thesis aims to identify a common set of neuronal mechanisms that underlie both the neuronal and the behavioral aspects. The mechanisms are simulated by neuro-computational models, resulting in a single modeling approach that explains a wide range of phenomena at once. The aspects chosen here are multiple neurophysiological effects, real-world object localization, and a visual masking paradigm (object substitution masking, OSM). In each of the considered fields, the work also advances the current state of the art to better understand that aspect of attention itself. The three chosen aspects highlight that the approach can account for crucial neurophysiological, functional, and behavioral properties, so the mechanisms might constitute the general neuronal substrate of visual attention in the cortex. As an outlook, our work provides computer vision with a deeper understanding and a concrete prototype of attention for incorporating this crucial aspect of human perception in future systems. Contents: 1. General introduction; 2. The state-of-the-art in modeling visual attention; 3. Microcircuit model of attention; 4. Object localization with a model of visual attention; 5. Object substitution masking; 6. General conclusion.
23

Lindqvist, Zebh. "Design Principles for Visual Object Recognition Systems." Thesis, Luleå tekniska universitet, Institutionen för system- och rymdteknik, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:ltu:diva-80769.

Abstract:
Today's smartphones are capable of accomplishing far more advanced tasks than reading emails. With the modern framework TensorFlow, visual object recognition becomes possible using smartphone resources. This thesis shows that the main challenge does not lie in developing an artifact which performs visual object recognition. Instead, the main challenge lies in developing an ecosystem which allows for continuous improvement of the system’s ability to accomplish the given task without laborious and inefficient data collection. This thesis presents four design principles which contribute to an efficient ecosystem with quick initiation of new object classes and efficient data collection which is used to continuously improve the system’s ability to recognize smart meters in varying environments in an automated fashion.
24

Freytag, Alexander [Verfasser]. "Lifelong Learning for Visual Recognition Systems / Alexander Freytag." München : Verlag Dr. Hut, 2017. http://d-nb.info/1126297100/34.

25

Rao, Ram Raghavendra. "Audio-visual interaction in multimedia." Diss., Georgia Institute of Technology, 1998. http://hdl.handle.net/1853/13349.

26

Rabi, Gihad. "Visual speech recognition by recurrent neural networks." Thesis, National Library of Canada = Bibliothèque nationale du Canada, 1997. http://www.collectionscanada.ca/obj/s4/f2/dsk2/tape16/PQDD_0010/MQ36169.pdf.

27

Erhard, Matthew John. "Visual intent recognition in a multiple camera environment /." Online version of thesis, 2006. http://hdl.handle.net/1850/3365.

28

Barb, Adrian S. "Knowledge representation and exchange of visual patterns using semantic abstractions." Diss., Columbia, Mo. : University of Missouri-Columbia, 2008. http://hdl.handle.net/10355/6674.

Abstract:
Thesis (Ph. D.)--University of Missouri-Columbia, 2008.
The entire dissertation/thesis text is included in the research.pdf file; the official abstract appears in the short.pdf file (which also appears in the research.pdf); a non-technical general description, or public abstract, appears in the public.pdf file. Title from title screen of research.pdf file (viewed on July 21, 2009). Includes bibliographical references.
29

Khan, Rizwan Ahmed. "Détection des émotions à partir de vidéos dans un environnement non contrôlé." Thesis, Lyon 1, 2013. http://www.theses.fr/2013LYO10227/document.

Abstract:
Communication in any form, i.e., verbal or non-verbal, is vital to completing various daily routine tasks and plays a significant role in life. Facial expression is the most effective form of non-verbal communication, and it provides a clue about emotional state, mindset, and intention. Generally, an automatic facial expression recognition framework consists of three steps: face tracking, feature extraction, and expression classification. In order to build a robust facial expression recognition framework that is capable of producing reliable results, it is necessary to extract features (from the appropriate facial regions) that have strong discriminative abilities. Recently, different methods for automatic facial expression recognition have been proposed, but invariably they are all computationally expensive and spend computational time on the whole face image, or divide the facial image based on some mathematical or geometrical heuristic for feature extraction. None of them takes inspiration from the human visual system in completing the same task. In this research thesis we took inspiration from the human visual system in order to find which facial regions to extract features from. We argue that the task of expression analysis and recognition can be done in a more conducive manner if only some regions (i.e., salient regions) are selected for further processing, as happens in the human visual system. In this research thesis we have proposed different frameworks for the automatic recognition of expressions, all taking inspiration from human vision. Each subsequently proposed framework addresses the shortcomings of the previous one. Our proposed frameworks, in general, achieve results that exceed state-of-the-art methods for expression recognition. Secondly, they are computationally efficient and simple, as they process only the perceptually salient region(s) of the face for feature extraction. By processing only the perceptually salient region(s) of the face, a reduction in feature vector dimensionality and in computational time for feature extraction is achieved, making them suitable for real-time applications.
30

Athukorala, Aravinda S. "A strategy for the visual recognition of objects in an industrial environment." Thesis, University of Edinburgh, 1985. http://hdl.handle.net/1842/4860.

Abstract:
This thesis is concerned with the problem of recognizing industrial objects rapidly and flexibly. The system design is based on a general strategy that consists of a generalized local feature detector, an extended learning algorithm and the use of unique structure of the objects. Thus, the system is not designed to be limited to the industrial environment. The generalized local feature detector uses the gradient image of the scene to provide a feature description that is insensitive to a range of imaging conditions such as object position and overall light intensity. The feature detector is based on a representative point algorithm which is able to reduce the data content of the image without restricting the allowed object geometry. Thus, a major advantage of the local feature detector is its ability to describe and represent complex object structure. The reliance on local features also allows the system to recognize partially visible objects. The task of the learning algorithm is to observe the feature description generated by the feature detector in order to select features that are reliable over the range of imaging conditions of interest. Once a set of reliable features is found for each object, the system finds unique relational structure which is later used to recognize the objects. Unique structure is a set of descriptions of unique subparts of the objects of interest. The present implementation is limited to the use of unique local structure. The recognition routine uses these unique descriptions to recognize objects in new images. An important feature of this strategy is the transference of a large amount of the processing required for graph matching from the recognition stage to the learning stage, which allows the recognition routine to execute rapidly. The test results show that the system is able to function with a significant level of insensitivity to operating conditions. The system shows insensitivity to its 3 main assumptions (constant scale, constant lighting, and 2D images), displaying a degree of graceful degradation when the operating conditions degrade. For example, for one set of test objects, the recognition threshold was reached when the absolute light level was reduced by 70%-80%, or the object scale was reduced by 30%-40%, or the object was tilted away from the learned 2D plane by 30-40 degrees. This demonstrates a very important feature of the learning strategy: it shows that the generalizations made by the system are not only valid within the domain of the sampled set of images, but extend outside this domain. The test results also show that the recognition routine is able to execute rapidly, requiring 10 ms-500 ms (on a PDP11/24 minicomputer) in the special case when ideal operating conditions are guaranteed. (Note: this does not include pre-processing time.) This thesis describes the strategy, the architecture and the implementation of the vision system in detail, and gives detailed test results. A proposal for extending the system to scale-independent 3D object recognition is also given.
APA, Harvard, Vancouver, ISO, and other styles
31

Gobin, Paméla. "Propagation de l’activation entre le lexique orthographique et le système affectif." Thesis, Bordeaux 2, 2011. http://www.theses.fr/2011BOR21821/document.

Full text
Abstract:
The aim of this thesis was to study the activation of the affective system mediated by the orthographic lexicon during visual word recognition. More precisely, we investigated the influence of negative emotional orthographic neighbourhood and the sensitivity of orthographic priming to the negative valence of higher-frequency neighbours in the lexical decision task (LDT) combined with a priming paradigm. The recording of behavioural and electrophysiological (event-related brain potential) measures also provides evidence on the early activation of the affective components of the neighbours. Neutral words (e.g., FUSEAU [spindle], TOISON [fleece]) with one higher-frequency neighbour, either neutral (e.g., museau [muzzle]) or negative (e.g., poison), were presented in the LDT. They were preceded either by their neighbour or by a non-alphabetic control prime, presented for 66 or 166 ms. First, the emotional state of the participants was controlled (Experiments 1-4). Second, it was manipulated a priori by a sad mood induction (Experiments 5 and 7) or determined a posteriori by considering the burnout level of the participants (Experiments 7-8). The processing of negative or neutral frequent words was also examined (Experiment 6). The results showed an inhibitory effect of negative emotional orthographic neighbourhood on target recognition times and an inhibitory orthographic priming effect, which increased with prime duration. Three components (P150, N200, and N400) were the electrophysiological correlates of the orthographic priming effect, also depending on the negative valence of the higher-frequency neighbours and on prime duration. Finally, the emotional state of individuals modified the orthographic priming effect. The results are interpreted within an Interactive Activation model of visual word recognition extended to affective processing.
APA, Harvard, Vancouver, ISO, and other styles
32

Bridges, Seth. "Low-power visual pattern classification in analog VLSI /." Thesis, Connect to this title online; UW restricted, 2006. http://hdl.handle.net/1773/6984.

Full text
APA, Harvard, Vancouver, ISO, and other styles
33

Tivive, Fok Hing Chi. "A new class of convolutional neural networks based on shunting inhibition with applications to visual pattern recognition." Access electronically, 2006. http://www.library.uow.edu.au/adt-NWU/public/adt-NWU20061025.164437/index.html.

Full text
APA, Harvard, Vancouver, ISO, and other styles
34

Jafari, Moghadamfard Ramtin, and Saeid Payvar. "The Potential of Visual Features : to Improve Voice Recognition Systems in Vehicles Noisy Environment." Thesis, Högskolan i Halmstad, Sektionen för Informationsvetenskap, Data– och Elektroteknik (IDE), 2014. http://urn.kb.se/resolve?urn=urn:nbn:se:hh:diva-27273.

Full text
Abstract:
Multimodal biometric systems have been a subject of study in recent decades; their unique anti-spoofing and liveness-detection characteristics, plus their ability to deal with audio noise, have made them technology candidates for improving current systems such as voice recognition, verification and identification systems. In this work we studied the feasibility of incorporating an audio-visual voice recognition system for dealing with audio noise in the truck cab environment. Speech recognition systems suffer from excessive noise from the engine, road traffic and the car stereo system. To deal with this noise, different techniques, including active and passive noise cancelling, have been studied. Our results showed that although audio-only systems perform better in a noise-free environment, their performance drops significantly with an increase in the level of noise in truck cabins, which by contrast does not affect the performance of visual features. The final fused system, comprising both visual and audio cues, proved to be superior to both audio-only and video-only systems.
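A minimal sketch of the decision-level fusion this abstract describes: per-class scores from an audio model and a visual (lip) model are combined with a weight that can be shifted toward vision as cabin noise grows. All scores and weights below are made-up illustrations.

```python
# Sketch of late (decision-level) audio-visual fusion: per-class scores from
# an audio model and a lip/visual model are combined with a weight that is
# lowered for audio as cabin noise increases. All numbers are illustrative.
import numpy as np

def fuse_scores(audio_scores, visual_scores, audio_weight):
    """Weighted sum of per-class log-scores from the two modalities."""
    return audio_weight * audio_scores + (1.0 - audio_weight) * visual_scores

audio = np.log([0.6, 0.3, 0.1])        # hypothetical word posteriors (audio)
visual = np.log([0.5, 0.4, 0.1])       # hypothetical word posteriors (visual)
noisy_cabin = True
w = 0.3 if noisy_cabin else 0.7        # trust the visual cue more in noise
print("fused decision:", int(np.argmax(fuse_scores(audio, visual, w))))
```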
APA, Harvard, Vancouver, ISO, and other styles
35

Li, Jun. "Image texture decomposition and application in food quality analysis /." free to MU campus, to others for purchase, 2001. http://wwwlib.umi.com/cr/mo/fullcit?p3036842.

Full text
APA, Harvard, Vancouver, ISO, and other styles
36

Penatti, Otávio Augusto Bizetto 1984. "Image and video representations based on visual dictionaries = Representações de imagens e vídeos baseadas em dicionários visuais." [s.n.], 2012. http://repositorio.unicamp.br/jspui/handle/REPOSIP/275667.

Full text
Abstract:
Advisor: Ricardo da Silva Torres
Thesis (doctorate) - Universidade Estadual de Campinas, Instituto de Computação
Effectively encoding visual properties from multimedia content is challenging. One popular approach to dealing with this challenge is the visual dictionary model. In this model, images are handled as an unordered set of local features and are represented by the so-called bag-of-(visual-)words vector. In this thesis, we work on three research problems related to the visual dictionary model. The first research problem concerns the generalization power of dictionaries, that is, the ability to represent images from one dataset well even when using a dictionary created over another dataset, or a dictionary created on small dataset samples. We perform experiments on closed datasets as well as in a Web environment. The results obtained suggest that samples that are diverse in terms of appearance are enough to generate a good dictionary. The second research problem relates to the importance of the spatial information of visual words in the image space, which can be crucial to distinguish types of objects and scenes. Traditional pooling methods usually discard the spatial configuration of visual words in the image. We have proposed a pooling method, named Word Spatial Arrangement (WSA), which encodes the relative position of visual words in the image and has the advantage of generating more compact feature vectors than most existing spatial pooling strategies. Experiments on image retrieval show that WSA outperforms the most popular spatial pooling method, Spatial Pyramids. The third research problem under investigation in this thesis is the lack of semantic information in the visual dictionary model. We show that the problem of having no semantics in the space of low-level descriptions is reduced when we move to the bag-of-words representation. However, even in the bag-of-words space, we show that there is little separability between the distance distributions of different semantic concepts. Therefore, we consider moving one step further and propose a representation based on visual words which carry more semantics, according to human visual perception. We have proposed a bag-of-prototypes model, according to which the prototypes are elements carrying more semantics. This approach goes in the direction of reducing the so-called semantic gap problem. We propose a dictionary based on scenes, which is used for video representation in experiments on video geocoding. Video geocoding is the task of assigning a geographic location to a given video. The evaluation was performed in the context of the Placing Task of the MediaEval challenge, and the proposed bag-of-scenes model has shown promising performance.
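A toy sketch of spatial-arrangement pooling in the spirit of WSA follows: for each occurrence of a visual word, other keypoints are counted per quadrant around it, yielding four spatial bins per word. This is one plausible reading of the idea, with made-up data; it is not the thesis' exact formulation.

```python
# Toy sketch of spatial-arrangement pooling in the spirit of WSA: for each
# occurrence of a visual word, count how many other keypoints fall in each
# quadrant around it, accumulating four spatial bins per word. This is a
# simplified reading of the idea, not the thesis' exact algorithm.
import numpy as np

def spatial_arrangement(points, word_ids, vocab_size):
    """points: (N, 2) keypoint xy; word_ids: (N,) visual word assignments."""
    hist = np.zeros((vocab_size, 4))
    for i, (px, py) in enumerate(points):
        right = points[:, 0] >= px
        above = points[:, 1] >= py
        quad = right.astype(int) + 2 * above.astype(int)   # quadrant id 0..3
        for q in range(4):
            mask = quad == q
            mask[i] = False                 # do not count the point itself
            hist[word_ids[i], q] += mask.sum()
    total = hist.sum()
    return hist.ravel() / total if total else hist.ravel()

pts = np.random.rand(50, 2)                 # made-up keypoint locations
ids = np.random.randint(0, 10, size=50)     # made-up word assignments
print(spatial_arrangement(pts, ids, 10).shape)   # (40,) = 4 bins x 10 words
```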
Doctorate
Computer Science
Doctor in Computer Science
APA, Harvard, Vancouver, ISO, and other styles
37

Ma, Xiren. "Deep Learning-Based Vehicle Recognition Schemes for Intelligent Transportation Systems." Thesis, Université d'Ottawa / University of Ottawa, 2021. http://hdl.handle.net/10393/42247.

Full text
Abstract:
With increasingly highlighted security concerns in Intelligent Transportation Systems (ITS), Vision-based Automated Vehicle Recognition (VAVR) has attracted considerable attention recently. A comprehensive VAVR system contains three components: Vehicle Detection (VD), Vehicle Make and Model Recognition (VMMR), and Vehicle Re-identification (VReID). These components perform coarse-to-fine recognition tasks in three steps. The VAVR system can be widely used in suspicious vehicle recognition, urban traffic monitoring, and automated driving systems. Vehicle recognition is complicated due to the subtle visual differences between different vehicle models. Therefore, how to build a VAVR system that can quickly and accurately recognize vehicle information has gained tremendous attention. In this work, by taking advantage of emerging deep learning methods, which have powerful feature extraction and pattern learning abilities, we propose several models for vehicle recognition. First, we propose a novel Recurrent Attention Unit (RAU) to expand the standard Convolutional Neural Network (CNN) architecture for VMMR. RAU learns to recognize the discriminative parts of a vehicle on multiple scales and builds up a connection with the prominent information in a recurrent way. The proposed ResNet101-RAU achieves excellent recognition accuracy of 93.81% on the Stanford Cars dataset and 97.84% on the CompCars dataset. Second, to construct efficient vehicle recognition models, we simplify the structure of RAU and propose a Lightweight Recurrent Attention Unit (LRAU). The proposed LRAU extracts discriminative part features by generating attention masks to locate the keypoints of a vehicle (e.g., logo, headlights). The attention mask is generated based on the feature maps received by the LRAU and the attention state generated by the preceding LRAU. Then, by adding LRAUs to standard CNN architectures, we construct three efficient VMMR models. Our models achieve state-of-the-art results with 93.94% accuracy on the Stanford Cars dataset, 98.31% accuracy on the CompCars dataset, and 99.41% on the NTOU-MMR dataset. In addition, we construct a one-stage Vehicle Detection and Fine-grained Recognition (VDFG) model by combining our LRAU with a general object detection model. Results show the proposed VDFG model achieves excellent performance with real-time processing speed. Third, to address the VReID task, we design the Compact Attention Unit (CAU). CAU has a compact structure, and it relies on a single attention map to extract the discriminative local features of a vehicle. We add two CAUs to a truncated ResNet to construct a small but efficient VReID model, ResNetT-CAU. Compared with the original ResNet, the model size of ResNetT-CAU is reduced by 60%. Extensive experiments on the VeRi and VehicleID datasets indicate the proposed ResNetT-CAU achieves the best re-identification results on both datasets. In summary, the experimental results on the challenging benchmark VMMR and VReID datasets indicate our models achieve the best VMMR and VReID performance, with a small model size and fast image processing speed.
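A single-scale analogue of the attention units described can be sketched in a few lines: a small convolutional head turns feature maps into a spatial mask that reweights the features toward discriminative parts. This is an illustrative simplification, not the RAU/LRAU/CAU designs themselves; all sizes are arbitrary.

```python
# Minimal single-scale sketch of a spatial attention unit: a small conv head
# produces a per-location mask in [0, 1] that reweights the feature maps.
# Illustrative analogue only, not the thesis' exact RAU/LRAU/CAU modules.
import torch
import torch.nn as nn

class SimpleAttentionUnit(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.mask_head = nn.Sequential(
            nn.Conv2d(channels, channels // 8, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // 8, 1, kernel_size=1),
            nn.Sigmoid(),                 # per-location attention weight
        )

    def forward(self, x):
        mask = self.mask_head(x)          # (B, 1, H, W)
        return x * mask                   # emphasize discriminative regions

features = torch.randn(2, 256, 14, 14)    # hypothetical backbone output
print(SimpleAttentionUnit(256)(features).shape)
```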
APA, Harvard, Vancouver, ISO, and other styles
38

Kozlovski, Nikolai. "TEXT-IMAGE RESTORATION AND TEXT ALIGNMENT FOR MULTI-ENGINE OPTICAL CHARACTER RECOGNITION SYSTEMS." Master's thesis, University of Central Florida, 2006. http://digital.library.ucf.edu/cdm/ref/collection/ETD/id/3607.

Full text
Abstract:
Previous research showed that combining the results of three different optical character recognition (OCR) engines (ExperVision® OCR, Scansoft OCR, and Abbyy® OCR) using voting algorithms yields a higher accuracy rate than each of the engines individually. While a voting algorithm has been realized, several aspects needed further research to automate the process and improve the accuracy rate. This thesis focuses on morphological image preprocessing and morphological restoration of the text that goes to the OCR engines. This method is similar to the one used in restoring partial fingerprints. A series of morphological dilating and eroding filters of various mask shapes and sizes were applied to text of different font sizes and types, with various noises added. These images were then processed by the OCR engines, and based on the results, successful combinations of text, noise, and filters were chosen. The thesis also deals with the problem of text alignment. Each OCR engine has its own way of dealing with noise and corrupted characters; as a result, the output texts of the OCR engines have different lengths and numbers of words. This, in turn, makes it impossible to use spaces as delimiters to separate the words for processing by the voting part of the system. Text alignment determines, using various techniques, what is an extra word, what is supposed to be two or more words instead of one, which words are missing in one document compared to the other, etc. The alignment algorithm is made up of a series of shifts in the two texts to determine which parts are similar and which are not. Since errors made by OCR engines are due to visual misrecognition, in addition to simple character comparison (equal or not), a technique was developed that allows comparison of characters based on how they look.
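The alignment problem can be illustrated with a generic sequence aligner over word lists; difflib here is a stand-in for the shift-based alignment the thesis develops, and the two OCR outputs are invented examples.

```python
# Sketch of aligning two OCR engines' outputs before voting: sequence
# alignment over word lists exposes insertions, deletions and substitutions
# so words can be compared position by position. difflib is a stand-in for
# the thesis' shift-based alignment algorithm.
import difflib

ocr_a = "the quick brown fox jumps over the lazy dog".split()
ocr_b = "the qu1ck brown fox jumps ov er the lazy dog".split()

matcher = difflib.SequenceMatcher(a=ocr_a, b=ocr_b, autojunk=False)
for tag, i1, i2, j1, j2 in matcher.get_opcodes():
    print(f"{tag:8s} A{ocr_a[i1:i2]} <-> B{ocr_b[j1:j2]}")
# The 'replace' spans ('qu1ck' vs 'quick'; 'over' split into 'ov er') are
# exactly the cases a voting stage must reconcile by visual similarity.
```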
M.S.E.E.
Department of Electrical and Computer Engineering
Engineering and Computer Science
Electrical Engineering
APA, Harvard, Vancouver, ISO, and other styles
39

Ndiour, Ibrahima Jacques. "Dynamic curve estimation for visual tracking." Diss., Georgia Institute of Technology, 2010. http://hdl.handle.net/1853/37283.

Full text
Abstract:
This thesis tackles the visual tracking problem as a target contour estimation problem in the face of corrupted measurements. The major aim is to design robust recursive curve filters for accurate contour-based tracking. The state-space representation adopted comprises a group component and a shape component, describing the rigid motion and the non-rigid shape deformation respectively; filtering strategies on each component are then decoupled. The thesis considers two implicit curve descriptors, a classification probability field and the traditional signed distance function, and aims to develop an optimal probabilistic contour observer and locally optimal curve filters. For the former, introducing a novel probabilistic shape description simplifies the filtering problem on the infinite-dimensional space of closed curves to a series of point-wise filtering tasks. The definition and justification of a novel update model suited to the shape space, the derivation of the filtering equations and the relation to Kalman filtering are studied. In addition to the temporal consistency provided by the filtering, extensions involving distributed filtering methods are considered in order to maintain spatial consistency. For the latter, locally optimal closed-curve filtering strategies involving curve velocities are explored. The introduction of a local, linear description for planar curve variation and curve uncertainty enables the derivation of a mechanism for estimating the optimal gain associated with the curve filtering process, given quantitative uncertainty levels. Experiments on synthetic and real sequences of images validate the filtering designs.
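The point-wise decomposition mentioned for the probabilistic shape description can be sketched as one scalar Kalman-style update per pixel of the implicit shape field. The noise variances and field size below are illustrative assumptions.

```python
# Minimal sketch of point-wise recursive filtering on an implicit curve
# description: each pixel of the shape field (e.g., a classification
# probability or signed distance value) gets its own scalar Kalman-style
# update. Noise variances and field size are illustrative assumptions.
import numpy as np

def pointwise_filter(prior, measurement, prior_var, meas_var=0.05):
    """One recursive update per pixel; all arrays share the same shape."""
    gain = prior_var / (prior_var + meas_var)       # per-pixel Kalman gain
    posterior = prior + gain * (measurement - prior)
    posterior_var = (1.0 - gain) * prior_var
    return posterior, posterior_var

shape_field = np.zeros((64, 64))    # initial implicit shape description
var = np.full((64, 64), 1.0)        # initial per-pixel uncertainty
for _ in range(10):                  # simulated frames
    noisy_obs = shape_field + np.random.normal(0, 0.2, shape_field.shape)
    shape_field, var = pointwise_filter(shape_field, noisy_obs, var)
print(var.mean())                    # uncertainty shrinks over time
```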
APA, Harvard, Vancouver, ISO, and other styles
40

Osorio, Fernando Santos. "Um estudo sobre reconhecimento visual de caracteres através de redes neurais." reponame:Biblioteca Digital de Teses e Dissertações da UFRGS, 1991. http://hdl.handle.net/10183/24184.

Full text
Abstract:
This work presents a study of visual character recognition using neural networks. It covers topics related to Digital Image Processing, character recognition systems and neural networks. An implementation proposal for an OCR system for printed character recognition is also presented; this system uses a neural network developed specifically for this purpose. The OCR system, named N2OCR, has a prototype implementation, which is also described. Several topics related to Digital Image Processing are presented, including image acquisition, image processing and pattern recognition. Aspects of image acquisition are treated, such as acquisition equipment and the kinds of image data obtained from it. The following items about text image processing are covered: halftoning, histogram generation and modification, thresholding and filtering operations. A brief analysis of pattern recognition techniques related to this theme is given. Different kinds of character recognition systems are described, as are the techniques and algorithms they employ. In addition, a discussion about performance estimation of these OCR systems is presented, including a description and analysis of typical OCR problems. Neural networks are presented, describing their characteristics, historical aspects and the evolution of research in this field. Several well-known neural network models are described: Perceptron, Adaline, Madaline, multilevel networks, ART, Hopfield's model, the Boltzmann machine, BAM and Kohonen's model. From the analysis of these different neural network models, we arrive at the proposal of a new neural network model to be used by the N2OCR system, for which the items related to learning, recognition and possible model extensions are described. A possible hardware implementation of this model is also presented. A global vision of the N2OCR system is given at the end of this work, describing each of its modules, along with a description of the prototype implementation and its functions.
APA, Harvard, Vancouver, ISO, and other styles
41

Makrushin, Andrey [Verfasser], and Jana [Akademischer Betreuer] Dittmann. "Visual recognition systems in a car passenger compartment with the focus on facial driver identification / Andrey Makrushin. Betreuer: Jana Dittmann." Magdeburg : Universitätsbibliothek, 2014. http://d-nb.info/1054638888/34.

Full text
APA, Harvard, Vancouver, ISO, and other styles
42

Wåhlén, Herje. "Voice Assisted Visual Search." Thesis, Umeå universitet, Institutionen för informatik, 2010. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-38204.

Full text
Abstract:
The amount and variety of visual information presented on electronic displays are ever-increasing. Finding and acquiring relevant information in the most effective manner possible is of course desirable. While there are advantages to presenting a large number of information objects on a screen at the same time, it can also hinder fast detection of objects of interest. One way of addressing that problem is Voice Assisted Visual Search (VAVS). A user supported by VAVS calls out an object of interest and is immediately guided to the object by a highlighting cue. This thesis is an initial study of the VAVS user interface technique. The findings suggest that VAVS is a promising approach, supported by theory and practice. A working prototype shows that locating objects of interest can be sped up significantly, requiring on average only half the time taken without the use of VAVS.
APA, Harvard, Vancouver, ISO, and other styles
43

Lee, Jehoon. "Statistical and geometric methods for visual tracking with occlusion handling and target reacquisition." Diss., Georgia Institute of Technology, 2012. http://hdl.handle.net/1853/43582.

Full text
Abstract:
Computer vision is the science that studies how machines understand scenes and automatically make decisions based on meaningful information extracted from an image or multi-dimensional data of the scene, like human vision. One common and well-studied field of computer vision is visual tracking, a challenging and active research area in the computer vision community. Visual tracking is the task of continuously estimating the pose of an object of interest against the background in consecutive frames of an image sequence. It is a ubiquitous task and a fundamental technology of computer vision that provides low-level information used for high-level applications such as visual navigation, human-computer interaction, and surveillance systems. The focus of the research in this thesis is visual tracking and its applications. More specifically, the objective of this research is to design a reliable tracking algorithm for a deformable object that is robust to clutter and capable of occlusion handling and target reacquisition in realistic tracking scenarios, using statistical and geometric methods. To this end, the approaches developed in this thesis make extensive use of region-based active contours and particle filters in a variational framework. In addition, to deal with occlusions and target reacquisition problems, we exploit the benefits of coupling 2D and 3D information of an image and an object. In this thesis, first, we present an approach for tracking a moving object based on 3D range information in stereoscopic temporal imagery by combining particle filtering and geometric active contours. Range information is weighted by the proposed Gaussian weighting scheme to improve segmentation achieved by active contours. In addition, this work presents an on-line shape learning method based on principal component analysis to reacquire track of an object in the event that it disappears from the field of view and reappears later. Second, we propose an approach to jointly track a rigid object in a 2D image sequence and estimate its pose in 3D space. In this work, we take advantage of knowledge of a 3D model of the object, and we employ particle filtering to generate and propagate the translation and rotation parameters in a decoupled manner. Moreover, to continuously track the object in the presence of occlusions, we propose an occlusion detection and handling scheme based on the control of the degree of dependence between predictions and measurements of the system. Third, we introduce a fast level-set based algorithm applicable to real-time applications, in which a contour-based tracker is improved in terms of computational complexity and performs real-time curve evolution for detecting multiple windows. Lastly, we deal with rapid human motion in the context of object segmentation and visual tracking. Specifically, we introduce a model-free and marker-less approach for human body tracking based on a dynamic color model and geometric information of a human body from a monocular video sequence. The contributions of this thesis are summarized as follows: 1. A reliable algorithm to track deformable objects in a sequence consisting of 3D range data, combining particle filtering and statistics-based active contour models. 2. An effective handling scheme, based on the object's 2D shape information, for challenging situations in which the tracked object disappears completely from the image domain during tracking. 3. A robust 2D-3D pose tracking algorithm using a 3D shape prior and particle filters on SE(3). 4. An occlusion handling scheme based on the degree of trust between predictions and measurements of the tracking system, controlled in an online fashion. 5. Fast level-set based active contour models applicable to real-time object detection. 6. A model-free and marker-less approach for tracking rapid human motion based on a dynamic color model and geometric information of a human body.
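A conceptual bootstrap particle filter for a 2D position shows the basic machinery the thesis builds on (its actual filters operate on contours and on SE(3) pose parameters); all noise levels are illustrative.

```python
# Conceptual bootstrap particle filter for a 2D target position: predict by
# diffusing particles, weight by measurement likelihood, then resample.
# The thesis' filters act on contours and SE(3) poses; noise levels here
# are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
n = 500
particles = rng.normal(0.0, 1.0, size=(n, 2))        # initial hypotheses
weights = np.full(n, 1.0 / n)

def step(particles, measurement, motion_std=0.5, meas_std=0.8):
    particles = particles + rng.normal(0, motion_std, particles.shape)  # predict
    d2 = ((particles - measurement) ** 2).sum(axis=1)
    w = np.exp(-0.5 * d2 / meas_std**2)               # likelihood weighting
    w /= w.sum()
    idx = rng.choice(len(particles), size=len(particles), p=w)          # resample
    return particles[idx]

true_pos = np.array([0.0, 0.0])
for t in range(20):
    true_pos = true_pos + np.array([0.3, 0.1])        # target motion
    z = true_pos + rng.normal(0, 0.8, 2)              # noisy measurement
    particles = step(particles, z)
print("estimate:", particles.mean(axis=0), "truth:", true_pos)
```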
APA, Harvard, Vancouver, ISO, and other styles
44

Pereira, Joaquim Jose Fantin. "Uma ferramenta de programação visual para previsão e reconhecimento de padrões." [s.n.], 2007. http://repositorio.unicamp.br/jspui/handle/REPOSIP/259920.

Full text
Abstract:
Advisor: Takaaki Ohishi
Dissertation (master's) - Universidade Estadual de Campinas, Faculdade de Engenharia Elétrica e de Computação
Decision making, in any sector and at many different levels, is an increasingly complex process, mainly because of the level of uncertainty about the future. In this context, the availability of forecasts becomes an important factor for more effective decisions. Pattern recognition tools, in turn, are important in many areas, such as determining typical behaviors and in control systems. In this context, this work proposes a visual programming language, called the VisualPREV Language, intended to ease the conception and execution of forecasting and pattern recognition models. Within this language, visual blocks placed on a diagram (a computational visual interface) represent the concepts involved in modeling the problem. The model can then be configured, executed and stored for future access. Although this approach implies losing some exclusive advantages of traditional code-based programming, such as the greater flexibility of generic programming, the language considerably decreases the time needed to create specific models for time series forecasting and pattern recognition. In a few applications with relevant data, the language was evaluated with criteria based on usability metrics, and the results are discussed throughout the text.
Master's
Electrical Energy
Master in Electrical Engineering
APA, Harvard, Vancouver, ISO, and other styles
45

Blanco, Myra. "Relationship Between Driver Characteristics, Nighttime Driving Risk Perception, and Visual Performance under Adverse and Clear Weather Conditions and Different Vision Enhancement Systems." Diss., Virginia Tech, 2002. http://hdl.handle.net/10919/27806.

Full text
Abstract:
Vehicle crashes remain the leading cause of accidental death and injuries in the United States, claiming tens of thousands of lives and injuring millions of people each year. Many of these crashes occur during nighttime, where a variety of modifiers affect the risk of a crash, primarily through the reduction of object visibility. Furthermore, many of these modifiers also affect the nighttime mobility of older drivers, who avoid driving during the nighttime. Thus, a two-fold need exists for new technologies that enhance night visibility. Two separate studies were completed as part of this research. Study 1 served as a baseline by evaluating visual performance during nighttime driving under clear weather conditions. Visual performance was evaluated in terms of the detection and recognition distances obtained when different vision enhancement systems were used at the Smart Road testing facility. Study 2, also using detection and recognition distances, compared the visual performance of drivers during low visibility conditions (i.e., due to rain) to the risk perception of driving during nighttime under low visibility conditions. These comparisons were made as a function of various vision enhancement systems. The age of the driver and the characteristics of the object presented (e.g., contrast, motion) were variables of interest in both studies. The pivotal contribution of this investigation is the generation of a model describing the relationships between driver characteristics, risk perception, and visual performance in nighttime driving in the context of a variety of standard and prototype vision enhancement systems. Improvement of mobility, especially for older individuals, can be achieved through better understanding of the factors that increase risk perception, identification of systems that improve detection and recognition distances, and consideration of drivers' opinions on possible solutions that improve nighttime driving safety. In addition, this research effort empirically described the night vision enhancement capabilities of 12 different vision enhancement systems during clear and adverse weather environments.
Ph. D.
APA, Harvard, Vancouver, ISO, and other styles
46

Kenklies, Kai Malte. "Instructing workers through a head-worn Augmented Reality display and through a stationary screen on manual industrial assembly tasks : A comparison study." Thesis, Umeå universitet, Institutionen för informatik, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-172888.

Full text
Abstract:
This study analyzed whether instructions presented on a head-worn Augmented Reality display (AR-HWD) are better for manual industrial assembly tasks than instructions on a stationary screen. A prototype was built which consisted of virtual instruction screens for two example assembly tasks. In a comparison study, participants performed the tasks with instructions through an AR-HWD and, alternatively, through a stationary screen. Questionnaires, interviews and observation notes were used to evaluate task performance and the user experience. The study revealed that the users were excited and enjoyed trying the technology. The perceived usefulness at the current state was mixed, but the users saw huge potential in AR-HWDs for the future. Task accuracy with instructions on the AR-HWD was as good as with instructions on the screen. AR-HWDs were found to be a better approach than a stationary screen, but technological limitations need to be overcome, and workers need training with the new technology to make its application efficient.
APA, Harvard, Vancouver, ISO, and other styles
47

Lam, Benny, and Jakob Nilsson. "Creating Good User Experience in a Hand-Gesture-Based Augmented Reality Game." Thesis, Linköpings universitet, Institutionen för datavetenskap, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-156878.

Full text
Abstract:
The dissemination of new, innovative technology requires feasibility and simplicity. The problem with marker-based augmented reality is similar to that of glove-based hand gesture recognition: both require an additional component to function. This thesis investigates the possibility of combining markerless augmented reality with appearance-based hand gesture recognition by implementing a game with good user experience. The methods employed in this research consist of a game implementation and a pre-study meant for measuring interactive accuracy and precision, and for deciding which gestures should be used in the game. A test environment was realized in Unity using ARKit and the Manomotion SDK; the implementation of the game used the same development tools, while Blender was used for creating the 3D models. The results from 15 testers showed that the pinching gesture was the most favorable one. The game was evaluated with the System Usability Scale (SUS) and received a score of 70.77 among 12 game testers, which indicates that an augmented reality game whose interaction method is based solely on bare hands can be quite enjoyable.
APA, Harvard, Vancouver, ISO, and other styles
48

Hill, Evelyn June. "Applying statistical and syntactic pattern recognition techniques to the detection of fish in digital images." University of Western Australia. School of Mathematics and Statistics, 2004. http://theses.library.uwa.edu.au/adt-WU2004.0070.

Full text
Abstract:
This study is an attempt to simulate aspects of human visual perception by automating the detection of specific types of objects in digital images. The success of the methods attempted here was measured by how well the results of experiments corresponded to what a typical human's assessment of the data might be. The subject of the study was images of live fish taken underwater by digital video or digital still cameras. It is desirable to be able to automate the processing of such data for efficient stock assessment for fisheries management. In this study some well-known statistical pattern classification techniques were tested and new syntactical/structural pattern recognition techniques were developed. For testing of statistical pattern classification, the pixels belonging to fish were separated from the background pixels and the EM algorithm for Gaussian mixture models was used to locate clusters of pixels. The means and the covariance matrices for the components of the model were used to indicate the location, size and shape of the clusters. Because the number of components in the mixture is unknown, the EM algorithm has to be run a number of times with different numbers of components and the best model then chosen using a model selection criterion. The AIC (Akaike Information Criterion) and the MDL (Minimum Description Length) were tested. The MDL was found to estimate the numbers of clusters of pixels more accurately than the AIC, which tended to overestimate cluster numbers. In order to reduce problems caused by initialisation of the EM algorithm (i.e. starting positions of mixtures and number of mixtures), the Dynamic Cluster Finding algorithm (DCF) was developed (based on the Dog-Rabbit strategy). This algorithm can produce an estimate of the locations and numbers of clusters of pixels. The Dog-Rabbit strategy is based on early studies of learning behaviour in neurons. The main difference between Dog-Rabbit and DCF is that DCF is based on a toroidal topology, which removes the tendency of cluster locators to migrate to the centre of mass of the data set and miss clusters near the edges of the image. In the second approach to the problem, data was extracted from the image using an edge detector. The edges from a reference object were compared with the edges from a new image to determine if the object occurred in the new image. In order to compare edges, the edge pixels were first assembled into curves using an UpWrite procedure; then the curves were smoothed by fitting parametric cubic polynomials. Finally the curves were converted to arrays of numbers which represented the signed curvature of the curves at regular intervals. Sets of curves from different images can be compared by comparing the arrays of signed curvature values, as well as the relative orientations and locations of the curves. Discrepancy values were calculated to indicate how well curves and sets of curves matched the reference object. The total length of all matched curves was used to indicate what fraction of the reference object was found in the new image. The curve matching procedure gave results which corresponded well with what a human being might observe.
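The model-selection loop described (EM-fitted Gaussian mixtures compared across component counts) can be sketched with scikit-learn, where BIC plays the role of the MDL-style complexity penalty; the pixel data below are synthetic.

```python
# Sketch of the model-selection loop described: fit EM-based Gaussian
# mixtures with different component counts and pick the best by an
# information criterion. BIC stands in for the MDL-style criterion (both
# penalize model complexity); the "fish pixel" data are synthetic.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
pixels = np.vstack([
    rng.normal(loc=(10, 10), scale=1.5, size=(200, 2)),   # one pixel blob
    rng.normal(loc=(30, 25), scale=2.0, size=(150, 2)),   # another blob
])

best_k, best_score, best_model = None, np.inf, None
for k in range(1, 6):
    gmm = GaussianMixture(n_components=k, n_init=3, random_state=0).fit(pixels)
    score = gmm.bic(pixels)        # lower is better; gmm.aic(pixels) also works
    if score < best_score:
        best_k, best_score, best_model = k, score, gmm
print("selected components:", best_k)
print("cluster means:\n", best_model.means_)   # location of each pixel cluster
```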
APA, Harvard, Vancouver, ISO, and other styles
49

Hernández-Vela, Antonio. "From pixels to gestures: learning visual representations for human analysis in color and depth data sequences." Doctoral thesis, Universitat de Barcelona, 2015. http://hdl.handle.net/10803/292488.

Full text
Abstract:
The visual analysis of humans from images is an important topic of interest due to its relevance to many computer vision applications like pedestrian detection, monitoring and surveillance, human-computer interaction, e-health or content-based image retrieval, among others. This dissertation focuses on learning different visual representations of the human body that are helpful for the visual analysis of humans in images and video sequences. To that end, we analyze both RGB and depth image modalities and address the problem along three research lines, at different levels of abstraction, from pixels to gestures: human segmentation, human pose estimation and gesture recognition. First, we show how binary segmentation (object vs. background) of the human body in image sequences helps remove the background clutter present in the scene. The presented method, based on Graph cuts optimization, enforces spatio-temporal consistency of the produced segmentation masks among consecutive frames. Second, we present a framework for multi-label segmentation that yields much more detailed segmentation masks: instead of a binary representation separating the human body from the background, finer segmentation masks can be obtained separating the different body parts. At a higher level of abstraction, we aim for a simpler yet descriptive representation of the human body. Human pose estimation methods usually rely on skeletal models of the human body, formed by segments (or rectangles) that represent the body limbs, appropriately connected following the kinematic constraints of the human body. In practice, such skeletal models must fulfill some constraints in order to allow for efficient inference, while actually limiting the expressiveness of the model. In order to cope with this, we introduce a top-down approach for predicting the position of the body parts in the model, using a mid-level part representation based on Poselets. Finally, we propose a framework for gesture recognition based on the bag-of-visual-words framework. We leverage the benefits of the RGB and depth image modalities by combining modality-specific visual vocabularies in a late fusion fashion. A new rotation-variant depth descriptor is presented, yielding better results than other state-of-the-art descriptors. Moreover, spatio-temporal pyramids are used to encode rough spatial and temporal structure. In addition, we present a probabilistic reformulation of Dynamic Time Warping for gesture segmentation in video sequences: a Gaussian-based probabilistic model of a gesture is learnt, implicitly encoding possible deformations in both the spatial and time domains.
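The bag-of-visual-words machinery underlying the gesture framework can be sketched as vocabulary learning by k-means followed by histogram encoding. Descriptors here are random stand-ins; per-modality vocabularies would be built the same way and combined by late fusion.

```python
# Sketch of the bag-of-visual-words pipeline: cluster local descriptors into
# a vocabulary, then represent each image/sequence as a normalized histogram
# of word assignments. Descriptors are random stand-ins for real RGB or
# depth features; modality-specific vocabularies would be fused late.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
train_descriptors = rng.normal(size=(2000, 64))    # pooled local descriptors
vocab = KMeans(n_clusters=100, n_init=4, random_state=0).fit(train_descriptors)

def bag_of_words(descriptors, vocab):
    words = vocab.predict(descriptors)             # assign each to a word
    hist = np.bincount(words, minlength=vocab.n_clusters).astype(float)
    return hist / hist.sum()                       # normalized histogram

sample = rng.normal(size=(300, 64))                # one gesture's descriptors
print(bag_of_words(sample, vocab).shape)           # (100,)
```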
APA, Harvard, Vancouver, ISO, and other styles
50

Dekhtiar, Jonathan. "Deep Learning and unsupervised learning to automate visual inspection in the manufacturing industry." Thesis, Compiègne, 2019. http://www.theses.fr/2019COMP2513.

Full text
Abstract:
The exponential growth of computing needs and resources implies a growing need for the automation of industrial processes. This is particularly visible for automatic visual inspection on production lines. Although studied since 1970, automatic visual inspection still struggles to be applied on a large scale and at low cost. The methods used depend greatly on the availability of domain experts, which inevitably leads to increased costs and reduced flexibility. Since 2012, advances in the field of Deep Learning have enabled much progress in this direction, particularly thanks to convolutional neural networks, which have achieved near-human performance in many areas associated with visual perception (e.g., object recognition and detection). This thesis proposes an unsupervised approach to meet the needs of automatic visual inspection. The method, called AnoAEGAN, combines adversarial learning and the estimation of a probability density function. These two complementary approaches make it possible to jointly estimate the pixel-by-pixel probability of a visual defect on an image. The model is trained from a very limited number of images (i.e., fewer than 1000 images) without using expert knowledge to label the data beforehand. The method allows increased flexibility through a limited training time and great versatility, demonstrated on ten different tasks without any modification of the model. It should reduce development costs and the time required to deploy in production, and it can also be used in a complementary way to a supervised approach in order to benefit from the advantages of each.
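As a minimal illustration of unsupervised pixel-wise defect scoring, the sketch below uses a tiny convolutional autoencoder and reconstruction error; AnoAEGAN itself additionally combines adversarial learning with density estimation, so this shows only part of the idea. Architecture sizes are arbitrary and the model is untrained.

```python
# Illustrative sketch of unsupervised pixel-wise anomaly scoring by
# reconstruction error: an autoencoder trained only on defect-free images
# reconstructs defects poorly, so |input - reconstruction| highlights them.
# This shows only the reconstruction half of an AnoAEGAN-style approach.
import torch
import torch.nn as nn

class TinyAutoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.dec = nn.Sequential(
            nn.ConvTranspose2d(32, 16, 2, stride=2), nn.ReLU(),
            nn.ConvTranspose2d(16, 1, 2, stride=2), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.dec(self.enc(x))

model = TinyAutoencoder()                 # would be trained on defect-free data
images = torch.rand(4, 1, 64, 64)         # hypothetical inspection patches
with torch.no_grad():
    recon = model(images)
anomaly_map = (images - recon).abs()      # high where reconstruction fails,
print(anomaly_map.shape)                  # i.e. at likely defects
```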
APA, Harvard, Vancouver, ISO, and other styles