Dissertations / Theses on the topic 'Visual object recognition'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles

Consult the top 50 dissertations / theses for your research on the topic 'Visual object recognition.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever these are available in the metadata.

Browse dissertations / theses in a wide variety of disciplines and organise your bibliography correctly.

1

Figueroa, Flores Carola. "Visual Saliency for Object Recognition, and Object Recognition for Visual Saliency." Doctoral thesis, Universitat Autònoma de Barcelona, 2021. http://hdl.handle.net/10803/671964.

Full text
Abstract:
For humans, the recognition of objects is an almost instantaneous, precise and extremely adaptable process. Furthermore, we have the innate capability to learn new object classes from only a few examples. The human brain lowers the complexity of the incoming data by filtering out part of the information and only processing those things that capture our attention. This, mixed with our biological predisposition to respond to certain shapes or colors, allows us to recognize in a simple glance the most important or salient regions of an image. This mechanism can be observed by analyzing on which parts of images subjects place their attention; where they fix their eyes when an image is shown to them. The most accurate way to record this behavior is to track eye movements while displaying images. Computational saliency estimation aims to identify to what extent regions or objects stand out with respect to their surroundings to human observers. Saliency maps can be used in a wide range of applications including object detection, image and video compression, and visual tracking. The majority of research in the field has focused on automatically estimating saliency maps given an input image. Instead, in this thesis, we set out to incorporate saliency maps in an object recognition pipeline: we want to investigate whether saliency maps can improve object recognition results. In this thesis, we identify several problems related to visual saliency estimation. First, to what extent the estimation of saliency can be exploited to improve the training of an object recognition model when scarce training data is available. To solve this problem, we design an image classification network that incorporates saliency information as input. This network processes the saliency map through a dedicated network branch and uses the resulting features to modulate the standard bottom-up visual features of the original image input. We will refer to this technique as saliency-modulated image classification (SMIC). In extensive experiments on standard benchmark datasets for fine-grained object recognition, we show that our proposed architecture can significantly improve performance, especially on datasets with scarce training data. Next, we address the main drawback of the above pipeline: SMIC requires an explicit saliency algorithm that must be trained on a saliency dataset. To solve this, we implement a hallucination mechanism that allows us to incorporate the saliency estimation branch in an end-to-end trained neural network architecture that only needs the RGB image as input. A side effect of this architecture is the estimation of saliency maps. In experiments, we show that this architecture can obtain similar results on object recognition as SMIC but without the requirement of ground-truth saliency maps to train the system. Finally, we evaluate the accuracy of the saliency maps that occur as a side effect of object recognition. For this purpose, we use a set of benchmark datasets for saliency evaluation based on eye-tracking experiments. Surprisingly, the estimated saliency maps are very similar to the maps that are computed from human eye-tracking experiments. Our results show that these saliency maps can obtain competitive results on benchmark saliency datasets. On one synthetic saliency dataset this method even obtains the state of the art without ever having seen an actual saliency image for training.
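The modulation mechanism described in this abstract can be pictured with a short sketch. The following is a minimal PyTorch illustration of the SMIC idea, not the thesis architecture: the layer sizes, the sigmoid gating, and all module names are assumptions made for the example.

```python
# Minimal sketch of saliency-modulated image classification (SMIC-style),
# assuming PyTorch. Layer sizes and the gating scheme are illustrative
# assumptions, not the architecture from the thesis.
import torch
import torch.nn as nn

class SaliencyModulatedClassifier(nn.Module):
    def __init__(self, num_classes: int):
        super().__init__()
        # Standard bottom-up branch for the RGB image.
        self.visual = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
        )
        # Dedicated branch for the (single-channel) saliency map.
        self.saliency = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 64, 3, padding=1), nn.Sigmoid(),  # gates in [0, 1]
        )
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, num_classes)
        )

    def forward(self, image, saliency_map):
        feats = self.visual(image)
        gate = self.saliency(saliency_map)
        # Modulate the visual features with the saliency-derived gating.
        return self.head(feats * gate)

model = SaliencyModulatedClassifier(num_classes=200)
logits = model(torch.randn(2, 3, 64, 64), torch.rand(2, 1, 64, 64))
print(logits.shape)  # torch.Size([2, 200])
```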
Universitat Autònoma de Barcelona. Programa de Doctorat en Informàtica
APA, Harvard, Vancouver, ISO, and other styles
2

Fergus, Robert. "Visual object category recognition." Thesis, University of Oxford, 2005. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.425029.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Wallenberg, Marcus. "Embodied Visual Object Recognition." Doctoral thesis, Linköpings universitet, Datorseende, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-132762.

Full text
Abstract:
Object recognition is a skill we as humans often take for granted. Due to our formidable object learning, recognition and generalisation skills, it is sometimes hard to see the multitude of obstacles that need to be overcome in order to replicate this skill in an artificial system. Object recognition is also one of the classical areas of computer vision, and many ways of approaching the problem have been proposed. Recently, visually capable robots and autonomous vehicles have increased the focus on embodied recognition systems and active visual search. These applications demand that systems can learn and adapt to their surroundings, and arrive at decisions in a reasonable amount of time, while maintaining high object recognition performance. This is especially challenging due to the high dimensionality of image data. In cases where end-to-end learning from pixels to output is needed, mechanisms designed to make inputs tractable are often necessary for less computationally capable embodied systems. Active visual search also means that mechanisms for attention and gaze control are integral to the object recognition procedure. Therefore, the way in which attention mechanisms should be introduced into feature extraction and estimation algorithms must be carefully considered when constructing a recognition system. This thesis describes work done on the components necessary for creating an embodied recognition system, specifically in the areas of decision uncertainty estimation, object segmentation from multiple cues, adaptation of stereo vision to a specific platform and setting, problem-specific feature selection, efficient estimator training and attentional modulation in convolutional neural networks. Contributions include the evaluation of methods and measures for predicting the potential uncertainty reduction that can be obtained from additional views of an object, allowing for adaptive target observations. Also, in order to separate a specific object from other parts of a scene, it is often necessary to combine multiple cues such as colour and depth in order to obtain satisfactory results. Therefore, a method for combining these using channel coding has been evaluated. In order to make use of three-dimensional spatial structure in recognition, a novel stereo vision algorithm extension along with a framework for automatic stereo tuning have also been investigated. Feature selection and efficient discriminant sampling for decision tree-based estimators have also been implemented. Finally, attentional multi-layer modulation of convolutional neural networks for recognition in cluttered scenes has been evaluated. Several of these components have been tested and evaluated on a purpose-built embodied recognition platform known as Eddie the Embodied.
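One of the components mentioned above, cue combination via channel coding, can be illustrated briefly. The sketch below, in Python with NumPy, encodes two scalar cues into overlapping cos^2 channels and concatenates them; the kernel shape, spacing and the normalised cue ranges are assumptions for illustration, not the thesis implementation.

```python
# A small sketch of channel coding for combining cues (e.g. colour and depth),
# assuming cos^2 basis functions. Channel count and width are illustrative.
import numpy as np

def channel_encode(values, centers, width):
    """Encode scalar values into overlapping cos^2 channel activations."""
    d = np.abs(values[..., None] - centers) / width
    return np.where(d < 1.0, np.cos(0.5 * np.pi * d) ** 2, 0.0)

centers = np.linspace(0.0, 1.0, 8)   # 8 channels spanning [0, 1]
hue = np.random.rand(100)            # normalised colour cue, one value per pixel
depth = np.random.rand(100)          # normalised depth cue, one value per pixel

# Encode each cue separately, then concatenate into one soft descriptor per
# pixel that a segmentation stage could average over image regions.
combined = np.concatenate(
    [channel_encode(hue, centers, 0.25), channel_encode(depth, centers, 0.25)],
    axis=-1,
)
print(combined.shape)  # (100, 16)
```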
Embodied Visual Object Recognition
FaceTrack
APA, Harvard, Vancouver, ISO, and other styles
4

Breuel, Thomas M. "Geometric Aspects of Visual Object Recognition." Thesis, Massachusetts Institute of Technology, 1992. http://hdl.handle.net/1721.1/7342.

Full text
Abstract:
This thesis presents three important results in visual object recognition based on shape. (1) A new algorithm (RAST: Recognition by Adaptive Subdivisions of Transformation space) is presented that has lower average-case complexity than any known recognition algorithm. (2) It is shown, both theoretically and empirically, that representing 3D objects as collections of 2D views (the "View-Based Approximation") is feasible and affects the reliability of 3D recognition systems no more than other commonly made approximations. (3) The problem of recognition in cluttered scenes is considered from a Bayesian perspective; the commonly-used "bounded-error" error measure is demonstrated to correspond to an independence assumption. It is shown that by better modeling the statistical properties of real scenes, objects can be recognized more reliably.
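The RAST idea of adaptively subdividing transformation space can be conveyed with a small toy. The sketch below restricts the transformation space to 2D translations and uses a crude matching bound; both are simplifying assumptions for illustration, not Breuel's algorithm.

```python
# A toy branch-and-bound sketch in the spirit of RAST: adaptively subdivide a
# 2D translation space, pruning cells whose upper bound on matched model
# points falls below the full match count.
import numpy as np

model = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
image = model + np.array([2.0, 3.0]) + 0.01 * np.random.randn(3, 2)

def upper_bound(cell, eps=0.05):
    """Upper bound on matches achievable by any translation inside the cell."""
    (xlo, xhi), (ylo, yhi) = cell
    center = np.array([(xlo + xhi) / 2, (ylo + yhi) / 2])
    radius = np.hypot(xhi - xlo, yhi - ylo) / 2
    count = 0
    for m in model:
        d = np.linalg.norm(image - (m + center), axis=1)
        count += np.any(d <= eps + radius)   # some image point could match m
    return count

best, queue = None, [((0.0, 5.0), (0.0, 5.0))]
while queue:
    cell = queue.pop()
    if upper_bound(cell) < len(model):       # cell cannot match all points
        continue
    (xlo, xhi), (ylo, yhi) = cell
    if xhi - xlo < 1e-3:                     # small enough: accept its centre
        best = ((xlo + xhi) / 2, (ylo + yhi) / 2)
        break
    xm, ym = (xlo + xhi) / 2, (ylo + yhi) / 2   # split into four subcells
    queue += [((xlo, xm), (ylo, ym)), ((xm, xhi), (ylo, ym)),
              ((xlo, xm), (ym, yhi)), ((xm, xhi), (ym, yhi))]
print(best)  # approximately (2.0, 3.0)
```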
APA, Harvard, Vancouver, ISO, and other styles
5

Meger, David Paul. "Visual object recognition for mobile platforms." Thesis, University of British Columbia, 2013. http://hdl.handle.net/2429/44682.

Full text
Abstract:
A robot must recognize objects in its environment in order to complete numerous tasks. Significant progress has been made in modeling visual appearance for image recognition, but the performance of current state-of-the-art approaches still falls short of that required by applications. This thesis describes visual recognition methods that leverage the spatial information sources available on-board mobile robots, such as the position of the platform in the world and the range data from its sensors, in order to significantly improve performance. Our research includes: a physical robotic platform that is capable of state-of-the-art recognition performance; a re-usable data set that facilitates study of the robotic recognition problem by the scientific community; and a three dimensional object model that demonstrates improved robustness to clutter. Based on our 3D model, we describe algorithms that integrate information across viewpoints, relate objects to auxiliary 3D sensor information, plan paths to next-best-views, explicitly model object occlusions and reason about the sub-parts of objects in 3D. Our approaches have been proven experimentally on-board the Curious George robot platform, which placed first in an international object recognition challenge for mobile robots for several years. We have also collected a large set of visual experiences from a robot, annotated the true objects in this data and made it public to the research community for use in performance evaluation. A path planning system derived from our model has been shown to hasten confident recognition by allowing informative viewpoints to be observed quickly. In each case studied, our system demonstrates significant improvements in recognition rate, in particular on realistic cluttered scenes, which promises more successful task execution for robotic platforms in the future.
APA, Harvard, Vancouver, ISO, and other styles
6

Mahmood, Hamid. "Visual Attention-based Object Detection and Recognition." Thesis, Linköpings universitet, Institutionen för datavetenskap, 2013. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-94024.

Full text
Abstract:
This thesis is all about visual attention, starting from understanding the human visual system up to applying this mechanism to a real-world computer vision application. This has been achieved by taking advantage of the latest findings about human visual attention and the increased performance of computers. These two facts played a vital role in simulating many different aspects of this visual behavior. In addition, the concept of bio-inspired visual attention systems has become applicable due to the emergence of different interdisciplinary approaches to vision, which leads to a beneficial interaction between scientists from different fields. The high complexity of computer vision problems motivates making the visual attention paradigm part of real-time computer vision solutions, which are in increasing demand. In this thesis work, different aspects of the visual attention paradigm have been dealt with, ranging from biological modeling to the implementation of real-world computer vision tasks based on this visual behavior. The implementation of a traffic sign detection and recognition system that benefits from this mechanism is the central part of this thesis work.
APA, Harvard, Vancouver, ISO, and other styles
7

Villalba, Michael Joseph. "Fast visual recognition of large object sets." Thesis, Massachusetts Institute of Technology, 1990. http://hdl.handle.net/1721.1/42211.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Lindqvist, Zebh. "Design Principles for Visual Object Recognition Systems." Thesis, Luleå tekniska universitet, Institutionen för system- och rymdteknik, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:ltu:diva-80769.

Full text
Abstract:
Today's smartphones are capable of accomplishing far more advanced tasks than reading emails. With the modern framework TensorFlow, visual object recognition becomes possible using smartphone resources. This thesis shows that the main challenge does not lie in developing an artifact which performs visual object recognition. Instead, the main challenge lies in developing an ecosystem which allows for continuous improvement of the system’s ability to accomplish the given task without laborious and inefficient data collection. This thesis presents four design principles which contribute to an efficient ecosystem with quick initiation of new object classes and efficient data collection which is used to continuously improve the system’s ability to recognize smart meters in varying environments in an automated fashion.
APA, Harvard, Vancouver, ISO, and other styles
9

Teynor, Alexandra. "Visual object class recognition using local descriptions." [S.l. : s.n.], 2008. http://nbn-resolving.de/urn:nbn:de:bsz:25-opus-62371.

Full text
APA, Harvard, Vancouver, ISO, and other styles
10

Pemula, Latha. "Low-shot Visual Recognition." Thesis, Virginia Tech, 2016. http://hdl.handle.net/10919/73321.

Full text
Abstract:
Many real-world datasets are characterized by a long-tailed distribution, with several samples for some classes and only a few samples for other classes. While many deep-learning-based solutions exist for object recognition when hundreds of samples are available, there are not many solutions for the case when only a few samples are available per class. Recognition in the regime where the number of training samples available for each class is low, ranging from one to a couple of tens of examples, is called low-shot recognition. In this work, we attempt to solve this problem. Our framework is similar to [1]. We use a related dataset with a sufficient number (a couple of hundred) of samples per class to learn representations using a Convolutional Neural Network (CNN). This CNN is used to extract features of the low-shot samples and learn a classifier. During representation learning, we enforce the learnt representations to obey a certain property by using a custom loss function. We believe that when the low-shot samples obey this property, the classification step becomes easier. We show that the proposed solution performs better than the softmax classifier by a good margin.
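The overall pipeline, pretrained representations plus a simple classifier over the few available samples, can be sketched as follows. Random vectors stand in for CNN features and a nearest-class-mean rule stands in for the learned classifier; neither is from the thesis.

```python
# A minimal low-shot sketch: features from a network trained on a related,
# well-populated dataset are reused to classify classes with few examples.
import numpy as np

rng = np.random.default_rng(0)
dim, n_classes, shots = 64, 5, 3

# Stand-in CNN features for `shots` training images of each low-shot class.
class_centers = rng.normal(size=(n_classes, dim))
support = class_centers[:, None, :] + 0.3 * rng.normal(size=(n_classes, shots, dim))

# "Train": one prototype per class from the few available samples.
prototypes = support.mean(axis=1)

# Classify a query image's features by distance to the prototypes.
query = class_centers[2] + 0.3 * rng.normal(size=dim)
pred = np.argmin(np.linalg.norm(prototypes - query, axis=1))
print(pred)  # 2
```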
Master of Science
APA, Harvard, Vancouver, ISO, and other styles
11

Naha, Shujon. "Zero-shot Learning for Visual Recognition Problems." IEEE, 2015. http://hdl.handle.net/1993/31806.

Full text
Abstract:
In this thesis we discuss different aspects of zero-shot learning and propose solutions for three challenging visual recognition problems: 1) unknown object recognition from images, 2) novel action recognition from videos, and 3) unseen object segmentation. In all three problems, we have two different sets of classes: the "known classes", which are used in the training phase, and the "unknown classes", for which there is no training instance. Our proposed approach exploits the available semantic relationships between known and unknown object classes and uses them to transfer the appearance models from known to unknown object classes in order to recognize unknown objects. We also propose an approach to recognize novel actions from videos by learning a joint model that links videos and text. Finally, we present a ranking-based approach for zero-shot object segmentation. We represent each unknown object class as a semantic ranking of all the known classes and use this semantic relationship to extend the segmentation model of known classes to segment unknown-class objects.
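The transfer step, scoring an unseen class through its semantic relationship to known classes, admits a short sketch. Below, synthetic class embeddings and known-classifier scores stand in for the real models, and the similarity-weighted combination is an illustrative assumption, not the thesis formulation.

```python
# A small zero-shot transfer sketch: an unseen class is scored by combining
# known-class classifier outputs, weighted by semantic similarity.
import numpy as np

rng = np.random.default_rng(1)
n_known, emb_dim = 4, 16

known_emb = rng.normal(size=(n_known, emb_dim))          # e.g. word vectors
unseen_emb = known_emb[1] + 0.1 * rng.normal(size=emb_dim)

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Scores of the known-class appearance models on one test image.
known_scores = np.array([0.1, 0.7, 0.3, 0.2])

# Transfer: semantic-similarity-weighted combination of known-class scores.
weights = np.array([cosine(unseen_emb, e) for e in known_emb])
unseen_score = np.maximum(weights, 0) @ known_scores
print(unseen_score)
```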
October 2016
APA, Harvard, Vancouver, ISO, and other styles
12

Yang, Fan. "Visual Infrastructure based Accurate Object Recognition and Localization." The Ohio State University, 2017. http://rave.ohiolink.edu/etdc/view?acc_num=osu1492752246062673.

Full text
APA, Harvard, Vancouver, ISO, and other styles
13

Piñol, Naranjo Mónica. "Reinforcement learning of visual descriptors for object recognition." Doctoral thesis, Universitat Autònoma de Barcelona, 2014. http://hdl.handle.net/10803/283927.

Full text
Abstract:
The human visual system is able to recognize the object in an image even if the object is partially occluded, seen from various points of view, in different colors, or independently of the distance to the object. To do this, the eye obtains an image and extracts features that are sent to the brain, and then, in the brain, the object is recognized. In computer vision, the object recognition branch tries to learn from the behaviour of the human visual system to achieve its goal. Hence, an algorithm is used to identify representative features of the scene (detection), then another algorithm is used to describe these points (descriptor), and finally the extracted information is used for classifying the object in the scene. The selection of this set of algorithms is a very complicated task and thus a very active research field. In this thesis we focus on the selection/learning of the best descriptor for a given image. The state of the art offers several descriptors, but we do not know how to choose the best one, because the choice depends on the scenes we will use (the dataset) and on the algorithm chosen for classification. We propose a framework based on reinforcement learning and bag of features to choose the best descriptor for the given image. The system can analyse the behaviour of different learning algorithms and descriptor sets. Furthermore, the proposed framework for improving the classification/recognition ratio can be used, with minor changes, in other computer vision fields, such as video retrieval.
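The selection mechanism can be caricatured as a bandit problem: each descriptor is an action, and a correct downstream classification is the reward. The sketch below is an epsilon-greedy toy with synthetic success rates; the thesis conditions the choice on the image itself, which this simplification omits.

```python
# A toy epsilon-greedy sketch of descriptor selection: reinforce descriptors
# whose downstream classification succeeds. Reward probabilities are synthetic.
import numpy as np

rng = np.random.default_rng(2)
descriptors = ["SIFT", "SURF", "BRIEF"]
true_success = np.array([0.6, 0.8, 0.5])   # hidden per-descriptor accuracy

q = np.zeros(3)        # running value estimate per descriptor
n = np.zeros(3)        # selection counts
eps = 0.1
for _ in range(2000):
    a = rng.integers(3) if rng.random() < eps else int(np.argmax(q))
    reward = float(rng.random() < true_success[a])   # 1 if classified correctly
    n[a] += 1
    q[a] += (reward - q[a]) / n[a]                   # incremental mean update

print(descriptors[int(np.argmax(q))])  # "SURF" with high probability
```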
APA, Harvard, Vancouver, ISO, and other styles
14

Wilson, Susan E. "Perceptual organization and symmetry in visual object recognition." Thesis, University of British Columbia, 1991. http://hdl.handle.net/2429/29802.

Full text
Abstract:
A system has been implemented which is able to detect symmetrical groupings in edge images. The initial stages of the algorithm consist of edge detection, curve smoothing, and the extension of the perceptual grouping phase of the SCERPO [Low87] vision system to enable detection of instances of endpoint proximity and curvilinearity among curved segments. The symmetry detection stage begins by first locating points along object boundaries which are significant in terms of curvature. These key points are then tested against each other in order to detect locally symmetric pairs. An iterative grouping procedure is then applied which matches these pairs together using a more global definition of symmetry. The end result of this process is a set of pairs of key points along the boundary of an object which are bilaterally symmetric, along with the axis of symmetry for the object or sub-object. This paper describes the implementation of this system and presents several examples of the results obtained using real images. The output of the system is intended for use as indexing features in a model-based object recognition system, such as SCERPO, which requires as input a set of spatial correspondences between image features and model features.
Faculty of Science, Department of Computer Science, Graduate
APA, Harvard, Vancouver, ISO, and other styles
15

Wallenberg, Marcus, and Per-Erik Forssén. "A Research Platform for Embodied Visual Object Recognition." Linköpings universitet, Datorseende, 2010. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-70769.

Full text
Abstract:
We present in this paper a research platform for development and evaluation of embodied visual object recognition strategies. The platform uses a stereoscopic peripheral-foveal camera system and a fast pan-tilt unit to perform saliency-based visual search. This is combined with a classification framework based on the bag-of-features paradigm with the aim of targeting, classifying and recognising objects. Interaction with the system is done via typed commands and speech synthesis. We also report the current classification performance of the system.
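The bag-of-features step at the core of the classification framework can be shown compactly: quantise local descriptors against a codebook and pool them into a normalised histogram. The codebook and descriptors below are random placeholders, not the platform's data.

```python
# A compact bag-of-features sketch: descriptors are quantised to their nearest
# codeword and pooled into a normalised histogram for a standard classifier.
import numpy as np

rng = np.random.default_rng(3)
codebook = rng.normal(size=(50, 128))       # 50 codewords, 128-D descriptors
descriptors = rng.normal(size=(300, 128))   # descriptors from one image

nearest = np.argmin(
    np.linalg.norm(descriptors[:, None, :] - codebook[None, :, :], axis=2), axis=1
)
hist = np.bincount(nearest, minlength=len(codebook)).astype(float)
hist /= hist.sum()                          # image representation for an SVM
print(hist.shape)  # (50,)
```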
APA, Harvard, Vancouver, ISO, and other styles
16

Lovell, Kylie Sarah. "Implicit and explicit processes in visual object recognition." Thesis, University of Reading, 2002. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.430835.

Full text
APA, Harvard, Vancouver, ISO, and other styles
17

Sudderth, Erik B. (Erik Blaine) 1977. "Graphical models for visual object recognition and tracking." Thesis, Massachusetts Institute of Technology, 2006. http://hdl.handle.net/1721.1/34023.

Full text
Abstract:
Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2006.
This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.
Includes bibliographical references (p. 277-301).
We develop statistical methods which allow effective visual detection, categorization, and tracking of objects in complex scenes. Such computer vision systems must be robust to wide variations in object appearance, the often small size of training databases, and ambiguities induced by articulated or partially occluded objects. Graphical models provide a powerful framework for encoding the statistical structure of visual scenes, and developing corresponding learning and inference algorithms. In this thesis, we describe several models which integrate graphical representations with nonparametric statistical methods. This approach leads to inference algorithms which tractably recover high-dimensional, continuous object pose variations, and learning procedures which transfer knowledge among related recognition tasks. Motivated by visual tracking problems, we first develop a nonparametric extension of the belief propagation (BP) algorithm. Using Monte Carlo methods, we provide general procedures for recursively updating particle-based approximations of continuous sufficient statistics. Efficient multiscale sampling methods then allow this nonparametric BP algorithm to be flexibly adapted to many different applications. As a particular example, we consider a graphical model describing the hand's three-dimensional (3D) structure, kinematics, and dynamics. This graph encodes global hand pose via the 3D position and orientation of several rigid components, and thus exposes local structure in a high-dimensional articulated model. Applying nonparametric BP, we recover a hand tracking algorithm which is robust to outliers and local visual ambiguities. Via a set of latent occupancy masks, we also extend our approach to consistently infer occlusion events in a distributed fashion. In the second half of this thesis, we develop methods for learning hierarchical models of objects, the parts composing them, and the scenes surrounding them. Our approach couples topic models originally developed for text analysis with spatial transformations, and thus consistently accounts for geometric constraints. By building integrated scene models, we may discover contextual relationships, and better exploit partially labeled training images. We first consider images of isolated objects, and show that sharing parts among object categories improves accuracy when learning from few examples. Turning to multiple object scenes, we propose nonparametric models which use Dirichlet processes to automatically learn the number of parts underlying each object category, and objects composing each scene. Adapting these transformed Dirichlet processes to images taken with a binocular stereo camera, we learn integrated, 3D models of object geometry and appearance. This leads to a Monte Carlo algorithm which automatically infers 3D scene structure from the predictable geometry of known object categories.
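The flavour of a particle-based message update in nonparametric BP can be conveyed with a one-dimensional toy, shown below: an incoming particle set is reweighted by local evidence, resampled, and pushed through a pairwise potential. The potentials, noise levels and sizes are illustrative assumptions, not the algorithm of the thesis.

```python
# A toy 1-D sketch of a nonparametric-BP-style message update: messages are
# particle sets, and the outgoing message is formed by importance resampling
# followed by sampling from the pairwise potential.
import numpy as np

rng = np.random.default_rng(4)

# Incoming message to node u, represented by particles (a kernel density).
incoming = rng.normal(loc=1.0, scale=0.5, size=200)

# Pairwise potential psi(x_u, x_v) favours x_v close to x_u + 2.0.
def sample_pairwise(x_u):
    return x_u + 2.0 + 0.3 * rng.normal(size=x_u.shape)

# Local evidence at node u reweights the particles before propagation.
def evidence(x):
    return np.exp(-0.5 * ((x - 1.2) / 0.4) ** 2)

w = evidence(incoming)
w /= w.sum()
resampled = rng.choice(incoming, size=200, p=w)   # importance resampling
outgoing = sample_pairwise(resampled)             # particles of the new message
print(outgoing.mean())  # roughly 3.1 under these assumptions
```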
by Erik B. Sudderth.
Ph.D.
APA, Harvard, Vancouver, ISO, and other styles
18

Craddock, Matthew Peter. "Comparing the attainment of object constancy in haptic and visual object recognition." Thesis, University of Liverpool, 2010. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.539615.

Full text
APA, Harvard, Vancouver, ISO, and other styles
19

Osman, Erol. "Relational Strategies for the Study of Visual Object Recognition." Diss., lmu, 2008. http://nbn-resolving.de/urn:nbn:de:bvb:19-90393.

Full text
APA, Harvard, Vancouver, ISO, and other styles
20

Shotton, Jamie Daniel Joseph. "Contour and texture for visual recognition of object categories." Thesis, University of Cambridge, 2007. https://www.repository.cam.ac.uk/handle/1810/252047.

Full text
Abstract:
The recognition of categories of objects in images has become a central topic in computer vision. Automatic visual recognition systems are rapidly becoming central to applications such as image search, robotics, vehicle safety systems, and image editing. This work addresses three sub-problems of recognition: image classification, object detection, and semantic segmentation. The task of classification is to determine whether an object of a particular category is present or not. Object detection aims to localize any objects of the category. Semantic segmentation is a more complete image understanding, whereby an image is partitioned into coherent regions that are assigned meaningful class labels. This thesis proposes novel discriminative learning approaches to these problems. Our primary contributions are threefold. Firstly, we demonstrate that the contours (the outline and interior edges) of an object are, alone, sufficient for accurate visual recognition. Secondly, we propose two powerful new feature types: (i) a learned codebook of contour fragments matched with an improved oriented chamfer distance, and (ii) a set of texture-based features that simultaneously exploit local appearance, approximate shape, and appearance context. The efficacy of these new features types is evaluated on a wide variety of datasets. Thirdly, we show how, in combination, these two largely orthogonal feature types can substantially improve recognition performance above that achieved by either alone.
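The chamfer matching at the heart of the contour-fragment features can be sketched with a distance transform; the orientation term that the thesis adds to form the oriented chamfer distance is omitted here. The sketch assumes SciPy, and the edge maps are tiny synthetic examples.

```python
# A minimal chamfer-matching sketch: score a contour fragment by the mean
# distance-transform value of the image edge map under the fragment's points.
import numpy as np
from scipy.ndimage import distance_transform_edt

# Binary edge map of the test image (True on edges).
image_edges = np.zeros((32, 32), dtype=bool)
image_edges[10, 5:25] = True

# Distance from every pixel to the nearest image edge pixel.
dist = distance_transform_edt(~image_edges)

# A contour fragment as (row, col) edge points, placed at a candidate offset.
fragment = np.array([(0, c) for c in range(20)])
offset = np.array([11, 5])   # one row off the true edge
pts = fragment + offset

# Chamfer distance: mean distance-transform value under the fragment points.
score = dist[pts[:, 0], pts[:, 1]].mean()
print(score)  # 1.0 for this one-pixel offset
```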
APA, Harvard, Vancouver, ISO, and other styles
21

Wojnowski, Christine. "Reasoning with visual knowledge in an object recognition system /." Online version of thesis, 1990. http://hdl.handle.net/1850/10596.

Full text
APA, Harvard, Vancouver, ISO, and other styles
22

Carreira, Joao [Verfasser]. "Bottom-up Object Segmentation for Visual Recognition / Joao Carreira." Bonn : Universitäts- und Landesbibliothek Bonn, 2013. http://d-nb.info/1044868961/34.

Full text
APA, Harvard, Vancouver, ISO, and other styles
23

Corradi, Tadeo. "Integrating visual and tactile robotic perception." Thesis, University of Bath, 2018. https://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.761005.

Full text
Abstract:
The aim of this project is to enable robots to recognise objects and object categories by combining vision and touch. In this thesis, a novel inexpensive tactile sensor design is presented, together with a complete, probabilistic sensor-fusion model. The potential of the model is demonstrated in four areas: (i) shape recognition, where the sensor outperforms its most similar rival; (ii) single-touch object recognition, where state-of-the-art results are produced; (iii) visuo-tactile object recognition, demonstrating the benefits of multi-sensory object representations; and (iv) object classification, which has not been reported in the literature to date. Both the sensor design and the novel database were made available. Tactile data collection is performed by a robot. An extensive analysis of data encodings, data processing, and classification methods is presented. The conclusions reached are: (i) the inexpensive tactile sensor can be used for basic shape and object recognition; (ii) object recognition combining vision and touch in a probabilistic manner provides an improvement in accuracy over either modality alone; (iii) when both vision and touch perform poorly independently, the proposed sensor-fusion model provides faster learning, i.e. fewer training samples are required to achieve similar accuracy; (iv) such a sensor-fusion model is more accurate than either modality alone when attempting to classify unseen objects, as well as when attempting to recognise individual objects from amongst similar other objects of the same class; (v) preliminary potential is identified for a real-life application: underwater object classification; and (vi) the sensor-fusion model provides improvements in classification even over award-winning deep-learning-based computer vision models.
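The probabilistic fusion can be illustrated with a minimal Bayes combination of per-modality posteriors, shown below. The conditional-independence assumption and the posterior values are illustrative placeholders; the thesis model is more complete.

```python
# A minimal visuo-tactile fusion sketch: per-modality class posteriors are
# combined under conditional independence and renormalised.
import numpy as np

classes = ["mug", "ball", "box"]
p_vision = np.array([0.5, 0.3, 0.2])   # posterior from the visual classifier
p_touch = np.array([0.6, 0.1, 0.3])    # posterior from the tactile classifier
prior = np.array([1 / 3, 1 / 3, 1 / 3])

# Bayes fusion: p(c | v, t) is proportional to p(c | v) p(c | t) / p(c),
# assuming the two modalities are conditionally independent given the class.
fused = p_vision * p_touch / prior
fused /= fused.sum()
print(classes[int(np.argmax(fused))], fused.round(3))  # "mug" wins here
```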
APA, Harvard, Vancouver, ISO, and other styles
24

Wallenberg, Marcus. "Components of Embodied Visual Object Recognition : Object Perception and Learning on a Robotic Platform." Licentiate thesis, Linköpings universitet, Datorseende, 2013. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-93812.

Full text
Abstract:
Object recognition is a skill we as humans often take for granted. Due to our formidable object learning, recognition and generalisation skills, it is sometimes hard to see the multitude of obstacles that need to be overcome in order to replicate this skill in an artificial system. Object recognition is also one of the classical areas of computer vision, and many ways of approaching the problem have been proposed. Recently, visually capable robots and autonomous vehicles have increased the focus on embodied recognition systems and active visual search. These applications demand that systems can learn and adapt to their surroundings, and arrive at decisions in a reasonable amount of time, while maintaining high object recognition performance. Active visual search also means that mechanisms for attention and gaze control are integral to the object recognition procedure. This thesis describes work done on the components necessary for creating an embodied recognition system, specifically in the areas of decision uncertainty estimation, object segmentation from multiple cues, adaptation of stereo vision to a specific platform and setting, and the implementation of the system itself. Contributions include the evaluation of methods and measures for predicting the potential uncertainty reduction that can be obtained from additional views of an object, allowing for adaptive target observations. Also, in order to separate a specific object from other parts of a scene, it is often necessary to combine multiple cues such as colour and depth in order to obtain satisfactory results. Therefore, a method for combining these using channel coding has been evaluated. Finally, in order to make use of three-dimensional spatial structure in recognition, a novel stereo vision algorithm extension along with a framework for automatic stereo tuning have also been investigated. All of these components have been tested and evaluated on a purpose-built embodied recognition platform known as Eddie the Embodied.
Embodied Visual Object Recognition
APA, Harvard, Vancouver, ISO, and other styles
25

Gathers, Ann D. "DEVELOPMENTAL FMRI STUDY: FACE AND OBJECT RECOGNITION." Lexington, Ky. : [University of Kentucky Libraries], 2005. http://lib.uky.edu/ETD/ukyanne2005d00276/etd.pdf.

Full text
Abstract:
Thesis (Ph. D.)--University of Kentucky, 2005.
Title from document title page (viewed on November 4, 2005). Document formatted into pages; contains xi, 152 p. : ill. Includes abstract and vita. Includes bibliographical references (p. 134-148).
APA, Harvard, Vancouver, ISO, and other styles
26

Loos, Hartmut S. [Verfasser]. "User-Assisted Learning of Visual Object Recognition / Hartmut S Loos." Aachen : Shaker, 2003. http://d-nb.info/117451339X/34.

Full text
APA, Harvard, Vancouver, ISO, and other styles
27

Lakshmi, Ratan Aparna. "The role of fixation and visual attention in object recognition." Thesis, Massachusetts Institute of Technology, 1995. http://hdl.handle.net/1721.1/38734.

Full text
Abstract:
Thesis (M.S.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1995.
Includes bibliographical references (p. 94-97).
by Aparna Lakshmi Ratan.
M.S.
APA, Harvard, Vancouver, ISO, and other styles
28

Zoccoli, Sandra L. "Object features and object recognition: Semantic memory abilities during the normal aging process." Ann Arbor, Mich. : ProQuest, 2007. http://gateway.proquest.com/openurl?url_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation&res_dat=xri:pqdiss&rft_dat=xri:pqdiss:3288933.

Full text
Abstract:
Thesis (Ph.D. in Psychology)--S.M.U., 2007.
Title from PDF title page (viewed Nov. 19, 2009). Source: Dissertation Abstracts International, Volume: 68-11, Section: B, page: 7695. Adviser: Alan S. Brown. Includes bibliographical references.
APA, Harvard, Vancouver, ISO, and other styles
29

Wu, Jia Jane. "Comparing Visual Features for Morphing Based Recognition." Thesis, Massachusetts Institute of Technology, 2005. http://hdl.handle.net/1721.1/30547.

Full text
Abstract:
This thesis presents a method of object classification using the idea of deformable shape matching. Three types of visual features, geometric blur, C1 and SIFT, are used to generate feature descriptors. These feature descriptors are then used to find point correspondences between pairs of images. Various morphable models are created from small subsets of these correspondences using thin-plate splines. Given these morphs, a simple algorithm, least median of squares (LMEDS), is used to find the best morph. A scoring metric, using both LMEDS and the distance transform, is used to classify test images based on a nearest-neighbor algorithm. We perform the experiments on the Caltech 101 dataset [5]. To ease computation, for each test image a shortlist is created containing 10 of the most likely candidates. We were unable to duplicate the performance of [1] in the shortlist stage because we did not use hand-segmentation to extract objects for our training images. However, our gain from the shortlist to the correspondence stage is comparable to theirs. In our experiments, we improved from 21% to 28% (a gain of 33%), while [1] improved from 41% to 48% (a gain of 17%). We find that a non-shape-based approach, C2 [14], achieves an overall classification rate of 33.61%, higher than all of the shape-based methods tested in our experiments.
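The LMEDS selection step admits a compact sketch. For brevity the candidate "morph" below is a pure 2D translation estimated from a single correspondence rather than a thin-plate spline from a subset; the median-of-squared-residuals scoring, which tolerates outlying correspondences, is the point being illustrated.

```python
# A small least-median-of-squares (LMEDS) sketch: candidate transforms come
# from random correspondence subsets and are scored by the median residual.
import numpy as np

rng = np.random.default_rng(5)
src = rng.uniform(0, 10, size=(30, 2))
dst = src + np.array([3.0, -1.0])          # true correspondence shift
dst[:8] = rng.uniform(0, 10, size=(8, 2))  # 8 bad correspondences (outliers)

best_t, best_med = None, np.inf
for _ in range(100):
    i = rng.integers(len(src))             # minimal subset: one pair
    t = dst[i] - src[i]                    # candidate transform
    residuals = np.sum((src + t - dst) ** 2, axis=1)
    med = np.median(residuals)             # LMEDS score
    if med < best_med:
        best_t, best_med = t, med

print(best_t)  # close to [3.0, -1.0]
```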
APA, Harvard, Vancouver, ISO, and other styles
30

Leeds, Daniel Demeny. "Searching for the Visual Components of Object Perception." Research Showcase @ CMU, 2013. http://repository.cmu.edu/dissertations/313.

Full text
Abstract:
The nature of the visual properties used for object perception in mid- and high-level vision areas of the brain is poorly understood. Past studies have employed simplistic stimuli probing models limited in descriptive power and mathematical underpinnings. Unfortunately, the pursuit of more complex stimuli and properties requires searching through a wide, unknown space of models and of images. The difficulty of this pursuit is exacerbated in brain research by the limited number of stimulus responses that can be collected for a given human subject over the course of an experiment. To more quickly identify complex visual features underlying cortical object perception, I develop, test, and use a novel method in which stimuli for use in the ongoing study are selected in realtime based on fMRI-measured cortical responses to recently-selected and displayed stimuli. A variation of the simplex method controls this ongoing selection as part of a search in visual space for images producing maximal activity — measured in realtime — in a pre-determined 1 cm³ brain region. I probe cortical selectivities during this search using photographs of real-world objects and synthetic "Fribble" objects. Real-world objects are used to understand perception of naturally-occurring visual properties. These objects are characterized based on feature descriptors computed from the scale invariant feature transform (SIFT), a popular computer vision method that is well established in its utility for aiding computer object recognition and that I recently found to account for intermediate-level representations in the visual object processing pathway in the brain. Fribble objects are used to study object perception in an arena in which visual properties are well defined a priori. They are constructed from multiple well-defined shapes, and variation of each of these component shapes produces a clear space of visual stimuli. I study the behavior of my novel realtime fMRI search method, to assess its value in the investigation of cortical visual perception, and I study the complex visual properties my method identifies as highly activating selected brain regions in the visual object processing pathway. While there remain further technical and biological challenges to overcome, my method uncovers reliable and interesting cortical properties for most subjects — though only for selected searches performed for each subject. I identify brain regions selective for holistic and component object shapes and for varying surface properties, providing examples of more precise selectivities within classes of visual properties previously associated with cortical object representation. I also find examples of "surround suppression", in which cortical activity is inhibited upon viewing stimuli that deviate slightly from the visual properties preferred by a brain region, expanding on similar observations at lower levels of vision.
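The closed-loop search can be caricatured with an off-the-shelf simplex optimiser over a synthetic response surface, as below. In the actual experiments each function evaluation would be a realtime fMRI measurement; the smooth synthetic "response" here is purely a stand-in.

```python
# A toy simplex-search sketch: Nelder-Mead proposes the next point in a
# stimulus feature space so as to maximise a measured response.
import numpy as np
from scipy.optimize import minimize

peak = np.array([0.7, -0.3])   # feature combination the region "prefers"

def negative_response(x):
    # In the experiment this would be a new fMRI measurement per stimulus.
    return -np.exp(-np.sum((x - peak) ** 2))

result = minimize(negative_response, x0=np.zeros(2), method="Nelder-Mead")
print(result.x.round(2))  # near [0.7, -0.3]
```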
APA, Harvard, Vancouver, ISO, and other styles
31

Smith, Wendy. "The contribution of meaning in forming holistic and segmented based visual representations." Thesis, University of Southampton, 2000. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.340325.

Full text
APA, Harvard, Vancouver, ISO, and other styles
32

Wang, Josiah Kwok-Siang. "Learning visual recognition of fine-grained object categories from textual descriptions." Thesis, University of Leeds, 2013. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.597096.

Full text
Abstract:
This thesis investigates the task of learning visual object category recognition from textual descriptions. The work contributes primarily to the recognition of fine-grained object categories, such as animal and plant species, where it may be difficult to collect many images for training, but where textual descriptions are readily available, for example from online nature guides. The idea of using textual descriptions for fine-grained object category recognition is explored in three separate but related tasks. The first is the task of learning recognition of object categories solely from textual descriptions; no category-specific training images are used. Our proposed framework comprises three components: (i) natural language processing to build object category models from textual descriptions; (ii) visual processing to extract visual attributes from test images; (iii) a generative model connecting textual terms and visual attributes from images. As an 'upper bound' we also evaluate how well humans perform in a similar task. The proposed method was evaluated on a butterfly dataset as an example, performing substantially better than chance and, interestingly, comparably to non-native English speakers. The second task is an extension of the first. Here we focus on the problem of learning models for attribute terms (e.g. "orange bands") from a set of training classes disjoint from the test classes. Attribute models are learnt independently for each attribute term in a weakly supervised fashion from textual descriptions, and are used in conjunction with textual descriptions of the test classes to build probabilistic models for object category recognition. A modest accuracy was achieved with our method when evaluated on a butterfly dataset, although performance was substantially improved with some human supervision to combine similar attribute terms. The third task explores how textual descriptions can be used to automatically harvest training images for each object category. Starting with just the category name, a textual description and no example images, web pages are gathered from search engines, and images are filtered based on how similar their surrounding texts are to the given textual description. The idea is that images in close proximity to texts that are similar to the textual description are more likely to be example images of the desired category. The proposed method is demonstrated for a set of butterfly categories, where images were successfully re-ranked based on their corresponding text blocks alone, with many categories achieving higher precision than their baselines at early stages of recall. The proposed approaches to exploiting textual descriptions, although still in their infancy, show potential for visual object recognition tasks, effectively reducing the amount of human supervision required for annotating images.
APA, Harvard, Vancouver, ISO, and other styles
33

Salama, F. A. O. "The role of depth cues on visual object recognition and naming." Thesis, Swansea University, 1995. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.638745.

Full text
Abstract:
In ten experiments, 130 subjects (aged 23-35) were shown stereoscopic photographs of three rotated common objects, presented tachistoscopically, for the purpose of examining the effects of stereopsis, angle of view and colour on reaction time for recognition and naming of those objects. The results of these studies show that naming speed is related to viewing condition and angle of view. Angle of view is also related to the speed with which objects are recognized in a yes/no paradigm. Furthermore, the results indicate that under these experimental conditions, stereopsis significantly affects naming and recognition, but for different reasons. The data provide clues about the interpretation, by the visual system, of shape from stereopsis cues and about the relationship of shape from stereopsis to other depth cues in determining the perception of objects rotated in depth. The results also provide an understanding of how surface details, such as the colour of an object, are perceived by the visual system. The results are discussed within the framework of the 2.5-dimensional sketch (Marr 1982), hypothesized as representing the orientations and distances of visible surfaces relative to the viewer.
APA, Harvard, Vancouver, ISO, and other styles
34

Alter, Tao Daniel. "The role of saliency and error propagation in visual object recognition." Thesis, Massachusetts Institute of Technology, 1995. http://hdl.handle.net/1721.1/38055.

Full text
APA, Harvard, Vancouver, ISO, and other styles
35

Ramanan, Amirthalingam. "Designing a resource-allocating codebook for patch-based visual object recognition." Thesis, University of Southampton, 2010. https://eprints.soton.ac.uk/159175/.

Full text
Abstract:
The state-of-the-art approach in visual object recognition is the use of local information extracted at several points or image patches from an image. Local information at specific points can deal with object shape variability and partial occlusions. The underlying idea is that, in different images, the statistical distribution of the patches is different, which can be effectively exploited for recognition. In such a patch-based object recognition system, the key role of a visual codebook is to provide a way to map the low-level features into a fixed-length vector in histogram space to which standard classifiers can be directly applied. The discriminative power of a visual codebook determines the quality of the codebook model, whereas the size of the codebook controls the complexity of the model. Thus, the construction of a codebook plays a central role that affects the model's complexity. The construction of a codebook is an important step which is usually done by cluster analysis. However, clustering is a process that retains regions of high density in a distribution, and it follows that the resulting codebook need not have discriminant properties. Codebook construction is also recognised as a computational bottleneck of such systems. This thesis demonstrates a novel approach, which we call the resource-allocating codebook (RAC), to constructing a discriminant codebook in a one-pass design procedure inspired by the resource-allocation network family of algorithms. The RAC approach slightly outperforms more traditional approaches due to its tendency to spread the cluster centres over a broader range of the feature space, thereby including rarer low-level features in the codebook than density-preserving clustering-based codebooks do. Our algorithm achieves this performance at drastically reduced computing times, because apart from an initial scan through a small subset to determine length scales, each data item is processed only once. We illustrate some properties of our method and compare it to a closely related approach known as the mean-shift clustering technique. A pruning strategy has been employed to tackle a few outliers when assigning each feature in an image to the closest codeword to create a histogram representation for each image. Features whose distance from the closest codeword exceeds an empirical distance maximum are neglected. A recognition system that learns incrementally with training images, with the output classifier accounting for class-specific discriminant features, is also presented. Furthermore, we address an approach which, instead of clustering, adaptively constructs a codebook by computing Fisher scores between the classes of interest. This thesis also demonstrates a novel sequential hierarchical clustering technique that initially builds a hierarchical tree from a small subset of the data, while the remaining data are processed sequentially and the tree adapted constructively. Evaluations performed with this approach show that the performance is comparable while reducing the computational needs. Finally, for the classification stage, we demonstrate a new learning architecture for multi-class classification tasks using support vector machines. This technique is faster in testing compared to directed acyclic graph (DAG) SVMs, while maintaining comparable performance to standard multi-class classification techniques.
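The one-pass allocation rule at the core of RAC can be sketched in a few lines: each descriptor is seen once, and a new codeword is allocated whenever the nearest existing codeword is farther than a length-scale threshold. The threshold value and the absence of any codeword update are simplifying assumptions for the sketch.

```python
# A compact one-pass resource-allocating codebook (RAC-style) sketch.
import numpy as np

rng = np.random.default_rng(6)
descriptors = rng.normal(size=(1000, 32))   # stream of local features
threshold = 6.5                             # e.g. from an initial length-scale scan

codebook = [descriptors[0]]
for d in descriptors[1:]:
    dists = np.linalg.norm(np.asarray(codebook) - d, axis=1)
    if dists.min() > threshold:
        codebook.append(d)                  # allocate a new codeword

print(len(codebook))  # codebook size adapts to the spread of the data
```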
APA, Harvard, Vancouver, ISO, and other styles
36

Viau, Claude. "Multispectral Image Analysis for Object Recognition and Classification." Thesis, Université d'Ottawa / University of Ottawa, 2016. http://hdl.handle.net/10393/34532.

Full text
Abstract:
Computer and machine vision applications are used in numerous fields to analyze static and dynamic imagery in order to assist or automate some form of decision-making process. Advancements in sensor technologies now make it possible to capture and visualize imagery at various wavelengths (or bands) of the electromagnetic spectrum. Multispectral imaging has countless applications in various fields including (but not limited to) security, defense, space, medical, manufacturing and archeology. The development of advanced algorithms to process and extract salient information from the imagery is a critical component of the overall system performance. The fundamental objectives of this research project were to investigate the benefits of combining imagery from the visual and thermal bands of the electromagnetic spectrum to improve the recognition rates and accuracy of commonly found objects in an office setting. The goal was not to find a new way to “fuse” the visual and thermal images together but rather to establish a methodology for extracting multispectral descriptors in order to improve a machine vision system’s ability to recognize specific classes of objects. A multispectral dataset (visual and thermal) was captured, and features from the visual and thermal images were extracted and used to train support vector machine (SVM) classifiers. The SVMs’ class prediction ability was evaluated separately on the visual, thermal and multispectral testing datasets. Commonly used performance metrics were applied to assess the sensitivity, specificity and accuracy of each classifier. The research demonstrated that the highest recognition rate was achieved by an expert system (multiple classifiers) that combined the expertise of the visual-only classifier, the thermal-only classifier and the combined visual-thermal classifier.
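A minimal sketch of the expert-system idea the abstract describes, combining a visual-only, a thermal-only and a combined visual-thermal classifier; the kernel choice, the probability averaging and all names are illustrative assumptions rather than the thesis's actual design:

```python
import numpy as np
from sklearn.svm import SVC

def train_expert_system(X_vis, X_th, y):
    # Three experts: visual-only, thermal-only, and combined visual-thermal.
    clf_vis = SVC(kernel="rbf", probability=True).fit(X_vis, y)
    clf_th = SVC(kernel="rbf", probability=True).fit(X_th, y)
    clf_both = SVC(kernel="rbf", probability=True).fit(np.hstack([X_vis, X_th]), y)
    return clf_vis, clf_th, clf_both

def predict_expert(clfs, x_vis, x_th):
    # Combine the experts by averaging their class-probability estimates.
    clf_vis, clf_th, clf_both = clfs
    p = (clf_vis.predict_proba([x_vis])[0]
         + clf_th.predict_proba([x_th])[0]
         + clf_both.predict_proba([np.concatenate([x_vis, x_th])])[0]) / 3.0
    return clf_vis.classes_[int(p.argmax())]
```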
APA, Harvard, Vancouver, ISO, and other styles
37

Collin, Charles Alain. "Effects of spatial frequency overlap on face and object recognition." Thesis, McGill University, 2000. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=36896.

Full text
Abstract:
There has recently been much interest in how limitations in spatial frequency range affect face and object perception. This work has mainly focussed on determining which bands of frequencies are most useful for visual recognition. However, a fundamental question not yet addressed is how spatial frequency overlap (i.e., the range of spatial frequencies shared by two images) affects complex image recognition. Aside from the basic theoretical interest this question holds, it also bears on research about effects of display format (e.g., line-drawings, Mooney faces, etc.) and studies examining the nature of mnemonic representations of faces and objects. Examining the effects of spatial frequency overlap on face and object recognition is the main goal of this thesis.
A second question that is examined concerns the effect of calibration of stimuli on recognition of spatially filtered images. Past studies using non-calibrated presentation methods have inadvertently introduced aberrant frequency content to their stimuli. The effect this has on recognition performance has not been examined, leading to doubts about the comparability of older and newer studies. Examining the impact of calibration on recognition is an ancillary goal of this dissertation.
Seven experiments examining the above questions are reported here. Results suggest that spatial frequency overlap had a strong effect on face recognition and a lesser effect on object recognition. Indeed, contrary to much previous research, it was found that the band of frequencies occupied by a face image had little effect on recognition, but that small variations in overlap had significant effects. This suggests that the overlap factor is important in understanding various phenomena in visual recognition. Overlap effects likely contribute to the apparent superiority of certain spatial bands for different recognition tasks, and to the inferiority of line drawings in face recognition. Results concerning the mnemonic representation of faces and objects suggest that both are encoded in a format that retains spatial frequency information, and do not support certain proposed fundamental differences in how these two stimulus classes are stored. Data on calibration generally show non-calibration having little impact on visual recognition, suggesting that the results of older, non-calibrated studies can be taken with moderate confidence.
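The spatial filtering such experiments rely on can be illustrated with a simple Fourier-domain band-pass; the cutoff values and radial-frequency mask below are illustrative assumptions, not the dissertation's stimulus-generation code:

```python
import numpy as np

def bandpass(image, low, high):
    # Keep only spatial frequencies whose radial frequency (in cycles
    # per image) lies between the low and high cutoffs.
    f = np.fft.fftshift(np.fft.fft2(image))
    h, w = image.shape
    yy, xx = np.mgrid[-h // 2:h - h // 2, -w // 2:w - w // 2]
    radius = np.sqrt(xx ** 2 + yy ** 2)       # radial frequency per bin
    mask = (radius >= low) & (radius <= high)
    return np.fft.ifft2(np.fft.ifftshift(f * mask)).real
```

Two filtered versions of the same face then share frequency content only where their pass-bands intersect; for example, `bandpass(img, 8, 32)` and `bandpass(img, 16, 64)` overlap over 16-32 cycles per image.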
APA, Harvard, Vancouver, ISO, and other styles
38

Rouhafzay, Ghazal. "3D Object Representation and Recognition Based on Biologically Inspired Combined Use of Visual and Tactile Data." Thesis, Université d'Ottawa / University of Ottawa, 2021. http://hdl.handle.net/10393/42122.

Full text
Abstract:
Recent research makes use of biologically inspired computation and artificial intelligence as efficient means to solve real-world problems. Humans show a significant performance in extracting and interpreting visual information. In cases where visual data is not available, or, for example, where it fails to provide comprehensive information due to occlusions, tactile exploration assists in the interpretation and better understanding of the environment. This cooperation between human senses can serve as an inspiration to embed a higher level of intelligence in computational models. In the context of this research, in the first step, computational models of visual attention are explored to determine salient regions on the surface of objects. Two different approaches are proposed. The first approach takes advantage of a series of features that contribute to guiding human visual attention, namely color, contrast, curvature, edge, entropy, intensity, orientation, and symmetry, which are efficiently integrated to identify salient features on the surface of 3D objects. This model of visual attention also learns to adaptively weight each feature based on ground-truth data to ensure a better compatibility with human visual exploration capabilities. The second approach uses a deep Convolutional Neural Network (CNN) for feature extraction from images collected from 3D objects, and formulates saliency as a fusion map of the regions the CNN looks at while classifying the object based on its geometrical and semantic characteristics. The main difference between the outcomes of the two algorithms is that the first approach results in saliencies spread over the surface of the objects, while the second approach highlights one or two regions with concentrated saliency. Therefore, the first approach is an appropriate simulation of visual exploration of objects, while the second approach successfully simulates eye fixation locations on objects. In the second step, the first computational model of visual attention is used to determine scattered salient points on the surface of objects, on the basis of which simplified versions of 3D object models are constructed that preserve the important visual characteristics of the objects. Subsequently, the thesis focuses on the topic of tactile object recognition, leveraging the proposed model of visual attention. Beyond the sensor technologies, which are instrumental in ensuring data quality, biological models can also assist in guiding the placement of sensors and support various selective data sampling strategies that allow exploring an object’s surface faster. Therefore, the possibility of guiding the acquisition of tactile data based on the identified visually salient features is tested and validated in this research. Different object exploration and data processing approaches were used to identify the most promising solution. Our experiments confirm the effectiveness of computational models of visual attention as a guide for data selection, both for simplifying the 3D representation of objects and for enhancing tactile object recognition. In particular, the current research demonstrates that: (1) the simplified representation of objects that preserves visually salient characteristics shows a better compatibility with human visual capabilities compared to uniformly simplified models, and (2) tactile data acquired based on salient visual features are more informative about the objects’ characteristics and can be employed in tactile object manipulation and recognition scenarios.
In the last section, the thesis addresses the issue of transfer of learning from vision to touch. Inspired by biological studies that attest to similarities between the processing of visual and tactile stimuli in the human brain, the thesis studies the possibility of transfer of learning from vision to touch using deep learning architectures and proposes a hybrid CNN that handles both visual and tactile object recognition.
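The feature-fusion model of attention described in the first approach can be caricatured in a few lines; the normalisation scheme, the fixed weights and the probe-selection step are illustrative assumptions (the thesis learns the weights from ground-truth fixation data):

```python
import numpy as np

def fused_saliency(feature_maps, weights):
    # Normalise each conspicuity map (color, contrast, curvature, edge,
    # entropy, intensity, orientation, symmetry) to [0, 1], then fuse
    # them as a weighted sum into a single saliency map.
    maps = [(m - m.min()) / (np.ptp(m) + 1e-8) for m in feature_maps]
    s = sum(w * m for w, m in zip(weights, maps))
    return s / (s.max() + 1e-8)

def top_salient_points(saliency, k):
    # The k most salient locations, e.g. as candidate tactile probe sites.
    idx = np.argsort(saliency.ravel())[-k:]
    return np.column_stack(np.unravel_index(idx, saliency.shape))
```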
APA, Harvard, Vancouver, ISO, and other styles
39

Choi, Changhyun. "Visual object perception in unstructured environments." Diss., Georgia Institute of Technology, 2014. http://hdl.handle.net/1853/53003.

Full text
Abstract:
As robotic systems move from well-controlled settings to increasingly unstructured environments, they are required to operate in highly dynamic and cluttered scenarios. Finding an object, estimating its pose, and tracking its pose over time within such scenarios are challenging problems. Although various approaches have been developed to tackle these problems, the scope of objects addressed and the robustness of solutions remain limited. In this thesis, we target robust object perception using visual sensory information, which spans from the traditional monocular camera to the more recently emerged RGB-D sensor, in unstructured environments. Toward this goal, we address four critical challenges to robust 6-DOF object pose estimation and tracking that current state-of-the-art approaches have, as yet, failed to solve. The first challenge is how to increase the scope of objects by allowing visual perception to handle both textured and textureless objects. A large number of 3D object models are widely available in online object model databases, and these object models provide significant prior information, including geometric shapes and photometric appearances. We note that using both the geometric and photometric attributes available from these models enables us to handle both textured and textureless objects; this thesis presents our efforts to broaden the spectrum of objects to be handled by combining geometric and photometric features. The second challenge is how to dependably estimate and track the pose of an object despite background clutter. Difficulties in object perception rise with the degree of clutter: background clutter is likely to lead to false measurements, and false measurements tend to result in inaccurate pose estimates. To tackle significant background clutter, we present two multiple-pose-hypotheses frameworks: a particle filtering framework for tracking and a voting framework for pose estimation. Handling object discontinuities during tracking, such as severe occlusions, disappearances, and blurring, presents another important challenge. In an ideal scenario, a tracked object is visible throughout the entirety of tracking; however, when an object happens to be occluded by other objects or disappears due to the motions of the object or the camera, difficulties ensue. Because the continuous tracking of an object is critical to robotic manipulation, we devise a method to measure tracking quality and to re-initialize tracking as necessary. The final challenge we address is performing these tasks within real-time constraints. Our particle filtering and voting frameworks, while time-consuming, are composed of repetitive, simple and independent computations. Inspired by that observation, we propose to run massively parallelized frameworks on a GPU for those robotic perception tasks which must operate within strict time constraints.
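The particle filtering framework mentioned for pose tracking follows the standard predict-weight-resample cycle; the sketch below is a generic illustration under assumed names (a 6-DOF pose array and a user-supplied, non-negative likelihood function), not the thesis implementation:

```python
import numpy as np

def particle_filter_step(particles, motion_noise, likelihood_fn, rng=np.random):
    # Predict: diffuse each 6-DOF pose hypothesis with random motion noise.
    particles = particles + rng.normal(0.0, motion_noise, particles.shape)
    # Weight: score every hypothesis against the current measurement
    # (likelihood_fn must return a non-negative score for a pose).
    weights = np.array([likelihood_fn(p) for p in particles]) + 1e-12
    weights /= weights.sum()
    # Resample: concentrate particles on high-likelihood poses.
    idx = rng.choice(len(particles), size=len(particles), p=weights)
    return particles[idx]
```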
APA, Harvard, Vancouver, ISO, and other styles
40

Caywood, Matthew Shields. "Approaches to the function of object recognition areas of the visual cortex." Diss., Search in ProQuest Dissertations & Theses. UC Only, 2009. http://gateway.proquest.com/openurl?url_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation&res_dat=xri:pqdiss&rft_dat=xri:pqdiss:3378530.

Full text
APA, Harvard, Vancouver, ISO, and other styles
41

Misra, Navendu. "Comparison of motor-based versus visual sensory representations in object recognition tasks." Texas A&M University, 2005. http://hdl.handle.net/1969.1/2544.

Full text
Abstract:
Various works have demonstrated the use of action as a critical component in allowing autonomous agents to learn about objects in the environment. The importance of memory becomes evident when these agents try to learn about complex objects. This necessity primarily stems from the fact that simpler agents behave reactively to stimuli in their attempt to learn about the nature of an object. Complex objects, however, give rise to temporally varying sensory data as the agent interacts with them; purely reactive behavior therefore becomes a hindrance in learning about complex objects, prompting the need for memory. A straightforward approach is visual memory, in which sensory data is directly represented. Another mechanism is skill-based memory, or habit formation, in which the sequence of actions performed for a task is retained. The main hypothesis of this thesis is that since action seems to play an important role in simple perceptual understanding, it may also serve as a good memory representation. To test this hypothesis, a series of comparative tests was carried out to determine the merits of each of these representations. It turns out that skill memory performs significantly better at recognition tasks than visual memory. Furthermore, a related experiment demonstrated that action forms a good intermediate representation of sensory data. This provides support to theories proposing that various sensory modalities can ideally be represented in terms of action. This thesis thus successfully extends the role of action to the understanding of complex objects.
APA, Harvard, Vancouver, ISO, and other styles
42

Saifullah, Mohammad. "Biologically-Based Interactive Neural Network Models for Visual Attention and Object Recognition." Doctoral thesis, Linköpings universitet, Institutionen för datavetenskap, 2012. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-79336.

Full text
Abstract:
The main focus of this thesis is to develop biologically-based computational models for object recognition. A series of models for attention and object recognition were developed in order of increasing functionality and complexity. These models are based on information processing in the primate brain, and are especially inspired by the theory of visual information processing along the two parallel processing pathways of the primate visual cortex. To capture the true essence of incremental, constraint-satisfaction-style processing in the visual system, interactive neural networks were used for implementing our models. Results from eye-tracking studies on the relevant visual tasks, as well as our hypotheses regarding information processing in the primate visual system, were implemented in the models and tested with simulations. As a first step, a model based on the ventral pathway was developed to recognize single objects. Through systematic testing, the structural and algorithmic parameters of these models were fine-tuned so that they perform their task optimally. In the second step, the model was extended by considering the dorsal pathway, which enables simulation of visual attention as an emergent phenomenon. The extended model was then investigated for visual search tasks. In the last step, we focussed on occluded and overlapped object recognition. A couple of eye-tracking studies were conducted in this regard, and on the basis of the results we made some hypotheses regarding information processing in the primate visual system. The models were further advanced along the lines of the presented hypotheses, and simulated on the tasks of occluded and overlapped object recognition. On the basis of the results and analysis of our simulations, we have further found that the generalization performance of interactive hierarchical networks improves with the addition of a small amount of Hebbian learning to an otherwise purely error-driven learning. We also concluded that the size of the receptive fields in our networks is an important parameter for the generalization task and depends on the object of interest in the image. Our results show that networks using hard-coded feature extraction perform better than networks that use Hebbian learning for developing feature detectors. We have successfully demonstrated the emergence of visual attention within an interactive network, and also the role of context in the search task. Simulation results with occluded and overlapped objects support our extended interactive processing approach, a combination of the interactive and top-down approaches, to the segmentation-recognition issue. Furthermore, the simulation behavior of our models is in line with known human behavior for similar tasks. In general, the work in this thesis will improve the understanding and performance of biologically-based interactive networks for object recognition and provide a biologically-plausible solution to the recognition of occluded and overlapped objects. Moreover, our models provide some suggestions regarding the underlying neural mechanisms and strategies behind biological object recognition.
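The reported mix of error-driven and Hebbian learning can be illustrated with a toy weight update; the delta rule, the Oja-style Hebbian term and the mixing proportion below are illustrative stand-ins for the interactive-network learning rules the thesis actually uses:

```python
import numpy as np

def mixed_update(w, x, y, target, lr=0.1, hebb=0.05):
    # Error-driven (delta-rule) component: push output y toward the target.
    dw_err = np.outer(target - y, x)
    # Oja-style Hebbian component: reinforce correlated activity while
    # keeping the weight vector bounded.
    dw_hebb = np.outer(y, x) - w * (y ** 2)[:, None]
    # A small Hebbian fraction added to otherwise error-driven learning.
    return w + lr * ((1.0 - hebb) * dw_err + hebb * dw_hebb)
```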
APA, Harvard, Vancouver, ISO, and other styles
43

Gosling, Angela. "An electrophysiological investigation of the role of attention in visual object recognition." Thesis, Goldsmiths College (University of London), 2009. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.523119.

Full text
APA, Harvard, Vancouver, ISO, and other styles
44

Rajalingham, Rishi. "How does the primate ventral visual stream causally support core object recognition?" Thesis, Massachusetts Institute of Technology, 2018. http://hdl.handle.net/1721.1/120625.

Full text
Abstract:
Thesis: Ph. D., Massachusetts Institute of Technology, Department of Brain and Cognitive Sciences, 2018.
Cataloged from PDF version of thesis.
Includes bibliographical references (pages 161-173).
Primates are able to rapidly, accurately and effortlessly perform the computationally difficult visual task of invariant object recognition - the ability to discriminate between different objects in the face of high variation in object viewing parameters and background conditions. This ability is thought to rely on the ventral visual stream, a hierarchy of visual cortical areas culminating in inferior temporal (IT) cortex. In particular, decades of research strongly suggest that the population of neurons in IT supports invariant object recognition behavior. However, direct causal evidence for this decoding hypothesis has been equivocal to date, especially beyond the specific case of face-selective sub-regions of IT. This research aims to directly test the general causal role of IT in invariant object recognition. To do so, we first characterized human and macaque monkey behavior over a large behavioral domain consisting of binary discriminations between images of basic-level objects, establishing behavioral metrics and benchmarks for computational models of this behavior. This work suggests that, in the domain of basic-level core object recognition, humans and monkeys are remarkably similar in their behavioral responses, while leading models of the visual system significantly diverge from primate behavior. We then reversibly inactivated individual, millimeter-scale regions of IT via injection of muscimol while monkeys performed several interleaved binary object discrimination tasks. We found that inactivating different millimeter-scale regions of primate IT resulted in different patterns of object recognition deficits, each predicted by the local region's neuronal selectivity. Our results provide causal evidence that IT directly underlies primate object recognition behavior in a topographically organized manner. Taken together, these results establish quantitative experimental constraints for computational models of the ventral visual stream and object recognition behavior.
by Rishi Rajalingham.
Ph. D.
APA, Harvard, Vancouver, ISO, and other styles
45

Durán, Gabriela. "Effects of concurrent task performance on object processing." To access this resource online via ProQuest Dissertations and Theses @ UTEP, 2009. http://0-proquest.umi.com.lib.utep.edu/login?COPT=REJTPTU0YmImSU5UPTAmVkVSPTI=&clientId=2515.

Full text
APA, Harvard, Vancouver, ISO, and other styles
46

Saifullah, Mohammad. "Exploring Biologically-Inspired Interactive Networks for Object Recognition." Licentiate thesis, Linköpings universitet, Institutionen för datavetenskap, 2011. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-64692.

Full text
Abstract:
This thesis deals with biologically-inspired interactive neural networks for the task of object recognition. Such networks offer an interesting alternative approach to traditional image processing techniques. Although the networks are very powerful classification tools, they are difficult to handle due to their bidirectional interactivity. This is one of the main reasons why these networks do not generalize well to novel objects. Generalization is a very important property for any object recognition system, as it is impractical for a system to learn all instances of an object class before classifying. In this thesis, we have investigated the working of an interactive neural network by fine-tuning different structural and algorithmic parameters. The performance of the networks was evaluated by analyzing the generalization ability of the trained network to novel objects. Furthermore, the interactivity of the network was utilized to simulate focus of attention during object classification. Selective attention is an important visual mechanism for object recognition and provides an efficient way of using the limited computational resources of the human visual system. Unlike most previous work in the field of image processing, in this thesis attention is considered an integral part of object processing. Attention focus, in this work, is computed within the same network and in parallel with object recognition. As a first step, a study of the efficacy of Hebbian learning as a feature extraction method was conducted. In a second study, the receptive field size in the network, which controls the size of the extracted features as well as the number of layers in the network, was varied and analyzed to find its effect on generalization. In a continuation study, a comparison was made between learnt (Hebbian learning) and hard-coded feature detectors. In the last study, attention focus was computed using the interaction between bottom-up and top-down activation flow, with the aim of handling multiple objects in the visual scene. On the basis of the results and analysis of our simulations, we have found that the generalization performance of the bidirectional hierarchical network improves with the addition of a small amount of Hebbian learning to an otherwise error-driven learning. We also conclude that the optimal size of the receptive fields in our network depends on the object of interest in the image; moreover, each receptive field must contain some part of the object in the input image. We have also found that networks using hard-coded feature extraction perform better than networks that use Hebbian learning for developing feature detectors. In the last study, we successfully demonstrated the emergence of visual attention within an interactive network that handles more than one object in the input field. Our simulations demonstrate how bidirectional interactivity directs attention focus towards the required object by using both bottom-up and top-down effects. In general, the findings of this thesis will increase understanding of the working of biologically-inspired interactive networks. Specifically, the studied effects of the structural and algorithmic parameters that are critical for the generalization property will help develop these and similar networks, and lead to improved performance on object recognition tasks.
The results from the attention simulations can be used to increase the ability of networks to deal with multiple objects in an efficient and effective manner.
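The emergence of attention from bidirectional interactivity can be caricatured as a multiplicative interaction between bottom-up activation and a top-down task bias; this toy sketch assumes simple array-valued activations and is not the thesis's network:

```python
import numpy as np

def attention_focus(bottom_up, top_down_bias):
    # Multiplicative interaction between bottom-up activation and a
    # top-down task bias; normalisation lets the favoured object win.
    act = bottom_up * top_down_bias
    return act / (act.sum() + 1e-8)

# Two objects in the input field; the top-down bias selects the second
# even though its bottom-up activation is slightly weaker.
scene = np.array([0.6, 0.5])
print(attention_focus(scene, np.array([0.2, 1.0])))
```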
APA, Harvard, Vancouver, ISO, and other styles
47

Wang, Qian. "Zero-shot visual recognition via latent embedding learning." Thesis, University of Manchester, 2018. https://www.research.manchester.ac.uk/portal/en/theses/zeroshot-visual-recognition-via-latent-embedding-learning(bec510af-6a53-4114-9407-75212e1a08e1).html.

Full text
Abstract:
Traditional supervised visual recognition methods require a great number of annotated examples for each class concerned. The collection and annotation of visual data (e.g., images and videos) can be laborious, tedious and time-consuming when the number of classes involved is very large. In addition, there are situations in which the test instances come from novel classes for which training examples are unavailable in the training stage. These issues can be addressed by zero-shot learning (ZSL), an emerging machine learning technique enabling the recognition of novel classes. The key issue in zero-shot visual recognition is the semantic gap between visual and semantic representations. We address this issue in this thesis from three different perspectives: visual representations, semantic representations and the learning models. We first propose a novel bidirectional latent embedding framework for zero-shot visual recognition. By learning a latent space from the visual representations and labelling information of the training examples, instances of different classes can be mapped into the latent space with both visual and semantic relatedness preserved; hence the semantic gap can be bridged. We conduct experiments on both object and human action recognition benchmarks to validate the effectiveness of the proposed ZSL framework. We then extend ZSL to multi-label scenarios for multi-label zero-shot human action recognition based on weakly annotated video data. We employ a long short-term memory (LSTM) neural network to explore the multiple actions underlying the video data. A joint latent space is learned by two component models (i.e., the visual model and the semantic model) to bridge the semantic gap; the two component embedding models are trained alternately to optimize ranking-based objectives. Extensive experiments are carried out on two multi-label human action datasets to evaluate the proposed framework. Finally, we propose alternative semantic representations for human actions, towards narrowing the semantic gap from the perspective of semantic representation. A simple yet effective solution based on the exploration of web data is investigated to enhance the semantic representations for human actions. The novel semantic representations are shown to benefit zero-shot human action recognition significantly compared to traditional attributes and word vectors. In summary, we propose novel frameworks for zero-shot visual recognition towards narrowing and bridging the semantic gap, and achieve state-of-the-art performance in different settings on multiple benchmarks.
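A linear stand-in for the latent embedding idea: map visual features toward semantic embeddings, then label test instances by the nearest unseen-class prototype. The ridge-regression mapping and all names here are illustrative assumptions; the thesis learns a bidirectional latent space rather than this one-way projection:

```python
import numpy as np

def fit_visual_to_semantic(X, S, lam=1.0):
    # Ridge-regression map W from visual features X (n, d) to the
    # semantic embeddings S (n, k) of the training classes.
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ S)

def zero_shot_predict(x, W, unseen_prototypes, unseen_labels):
    # Project a test instance into semantic space and label it with the
    # nearest unseen-class prototype: no training images of that class.
    z = x @ W
    dists = np.linalg.norm(unseen_prototypes - z, axis=1)
    return unseen_labels[int(dists.argmin())]
```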
APA, Harvard, Vancouver, ISO, and other styles
48

Recktenwald, Eric William. "VISUAL RECOGNITION OF THE STATIONARY ENVIRONMENT IN LEOPARD FROGS." Diss., Temple University Libraries, 2014. http://cdm16002.contentdm.oclc.org/cdm/ref/collection/p245801coll10/id/292229.

Full text
Abstract:
Biology
Ph.D.
Leopard frogs (Rana pipiens) rely on vision to recognize behaviorally meaningful aspects of their environment. The optic tectum has been shown to mediate the frog's ability to recognize and respond to moving prey and looming objects. Nonetheless, atectal frogs are still able to respond appropriately to non-moving aspects of their environment. There appear to be independent visual systems operating in the frog: one system for recognizing moving objects, and another for recognizing stationary objects. Little is known about the neural mechanisms mediating the recognition of stationary objects in frogs. Our laboratory showed that a retino-recipient area in the anterior lateral thalamus--the NB/CG zone--is involved in processing visual information concerning stationary aspects of the environment. This thesis aims to characterize the frog's responses to a range of stationary stimuli, and to elucidate the thalamic visual system that mediates those responses. I tested leopard frogs' responses to different stationary stimuli and found that they respond in stereotypical ways. I discovered that leopard frogs are attracted to dark, stationary, opaque objects, and tested the extent of this attraction under different conditions. I found that frogs' preference to move toward a dark area versus a light source depends on the intensity of the light source relative to the intensity of ambient light. Unilateral lesions applied to the NB/CG zone of the anterior lateral thalamus resulted in temporary deficits in frogs' responses to stationary stimuli presented in the contralateral visual field. Deficits were observed in responses to dark objects, entrances to dark areas, light sources, and gaps between stationary barriers. However, responses to moving prey and looming stimuli were unaffected. Interestingly, these deficits tended to recover after about 6 days in most cases, with recovery times ranging from 2 to 28 days. The NB/CG zone is anatomically and functionally connected to a structure in the posterior thalamus called the "PMDT." The PMDT has no other connections in the brain; thus, I have discovered a "satellite" of the NB/CG zone. Preliminary evidence suggests that the PMDT is another component of the visual system mediating stationary object recognition in the frog.
Temple University--Theses
APA, Harvard, Vancouver, ISO, and other styles
49

Boisard, Olivier. "Optimization and implementation of bio-inspired feature extraction frameworks for visual object recognition." Thesis, Dijon, 2016. http://www.theses.fr/2016DIJOS016/document.

Full text
Abstract:
Industry has growing needs for so-called “intelligent systems”, capable of not only acquiring data, but also of analysing it and making decisions accordingly. Such systems are particularly useful for video-surveillance, in which case alarms must be raised in case of an intrusion. For cost-saving and power-consumption reasons, it is better to perform that processing as close to the sensor as possible. To address that issue, a promising approach is to use bio-inspired frameworks, which consist in applying computational models from biology and the cognitive sciences to industrial applications. The work carried out during this thesis consisted in selecting bio-inspired feature extraction frameworks and optimizing them with the aim of implementing them on a dedicated hardware platform, for computer vision applications. First, we propose a generic algorithm, which may be used in several use-case scenarios, having an acceptable complexity and a low memory footprint. Then, we propose optimizations for a more general framework, based essentially on a simplification of the data coding and on precision degradation in computations, hence easing its implementation on embedded systems, together with a hardware implementation based on those optimizations. Results suggest that while the framework we developed may not be as accurate as the state of the art, it is more generic. Furthermore, the optimizations we proposed for the more complex framework are fully compatible with other optimizations from the literature, and provide encouraging perspectives for future developments. Finally, both contributions have a scope that goes beyond the sole frameworks we studied, and may be applied to other, more widely used frameworks as well.
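The precision-degradation optimization lends itself to a one-function illustration: uniform quantization of feature values to a low bit-width. The bit-width and scaling below are assumptions for illustration, not the thesis's actual coding scheme:

```python
import numpy as np

def quantize(x, bits=4):
    # Uniform quantization to 2**bits levels, then dequantization: a
    # cheap, hardware-friendly approximation of the original values.
    levels = 2 ** bits - 1
    lo, hi = float(x.min()), float(x.max())
    q = np.round((x - lo) / (hi - lo + 1e-12) * levels)
    return q / levels * (hi - lo) + lo
```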
APA, Harvard, Vancouver, ISO, and other styles
50

Farivar-Mohseni, Reza. "Object recognition by integration of information across the dorsal and ventral visual pathways." Thesis, McGill University, 2008. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=21982.

Full text
Abstract:
The brain decomposes visual information into its form and motion components and processes the two aspects largely independently, by way of anatomically distinct pathways that originate early in the visual system and continue ventrally to the occipito-temporal visual areas and dorsally to the occipito-parietal visual areas, respectively. Certain cues of shape, such as 3-D structure-from-motion (SFM), appear to be computed exclusively by dorsal-stream mechanisms, yet these cues can describe complex objects whose recognition depends on mechanisms in the ventral stream. This dissertation discusses theoretical means by which dorsally-computed 3-D cues may provide input to ventral-stream object recognition mechanisms. Psychophysical and neuropsychological data presented here suggest that 3-D SFM cues do indeed empower complex object recognition, including the recognition of unfamiliar faces, and that recognition of shapes defined by 3-D SFM likely requires integration of information across the two pathways. Additionally, neuropsychological data are presented for a dissociation of 3-D SFM processing from 2-D form-from-motion processing. Finally, utilizing functional imaging (fMRI), data are presented to suggest that SFM-defined objects do not engage category-selective areas in the human brain in the same manner as photographs of those objects do. Together these results suggest that visual object recognition may be subserved by mechanisms distributed between the two pathways.
APA, Harvard, Vancouver, ISO, and other styles