Dissertations / Theses on the topic 'Vision, Monocular'

1. Jama, Michal. "Monocular vision based localization and mapping." Diss., Kansas State University, 2011. http://hdl.handle.net/2097/8561.

Abstract:
Doctor of Philosophy, Department of Electrical and Computer Engineering (Balasubramaniam Natarajan, Dale E. Schinstock).
In this dissertation, two applications related to vision-based localization and mapping are considered: (1) improving satellite-navigation-based location estimates by using on-board camera images, and (2) deriving position information from a video stream and using it to aid the autopilot of an unmanned aerial vehicle (UAV). In the first part of this dissertation, a method is presented for analyzing a minimization process called bundle adjustment (BA), used in stereo-imagery-based 3D terrain reconstruction to refine estimates of camera poses (positions and orientations). In particular, imagery obtained with pushbroom cameras is of interest. This work proposes a method to identify cases in which BA does not work as intended, i.e., cases in which the pose estimates returned by the BA are not more accurate than the estimates provided by a satellite navigation system, due to the existence of degrees of freedom (DOF) in the BA. Use of inaccurate pose estimates causes warping and scaling effects in the reconstructed terrain and prevents the terrain from being used in scientific analysis. The main contributions of this part of the work are: 1) the formulation of a method for detecting DOF in the BA; and 2) the identification of two camera geometries, commonly used to obtain stereo imagery, that have DOF. This part also presents results demonstrating that avoiding the DOF can give significant accuracy gains in aerial imagery. The second part of this dissertation proposes a vision-based system for UAV navigation. This is a monocular vision-based simultaneous localization and mapping (SLAM) system, which measures the position and orientation of the camera and builds a map of the environment using the video stream from a single camera. This differs from common SLAM solutions that use sensors that measure depth, such as LIDAR, stereoscopic cameras, or depth cameras. The SLAM solution was built by significantly modifying and extending a recent open-source SLAM solution that is fundamentally different from traditional approaches to the SLAM problem. The modifications made are those needed to provide the position measurements necessary for the navigation solution on a UAV while simultaneously building the map, all while maintaining control of the UAV. The main contributions of this part are: 1) an extension of the map-building algorithm that enables it to be used realistically while controlling a UAV and simultaneously building the map; 2) improved performance of the SLAM algorithm at lower camera frame rates; and 3) the first known demonstration of a monocular SLAM algorithm successfully controlling a UAV while simultaneously building the map. This work demonstrates that a fully autonomous UAV that uses monocular vision for navigation is feasible and can be effective in Global Positioning System denied environments.
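
The DOF-detection idea can be illustrated with a toy numerical sketch (an illustration of the general technique, not Jama's actual formulation): unconstrained directions, or gauge freedoms, of a least-squares problem appear as near-zero singular values of its Jacobian.

```python
import numpy as np

def detect_dof(jacobian, tol=1e-8):
    """Count degrees of freedom (gauge freedoms) of a least-squares
    problem: directions in parameter space that leave the residual
    unchanged appear as near-zero singular values of the Jacobian."""
    s = np.linalg.svd(jacobian, compute_uv=False)
    dof = int(np.sum(s < tol * s.max()))
    return dof, s

# Toy example: two parameters that only ever appear as the sum (a + b),
# so the problem has one unconstrained direction (1 DOF).
J = np.array([[1.0, 1.0],
              [2.0, 2.0],
              [3.0, 3.0]])
dof, s = detect_dof(J)
print(dof)  # -> 1: the solver could slide (a, b) along a - b at no cost
```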

2. Cheda, Diego. "Monocular Depth Cues in Computer Vision Applications." Doctoral thesis, Universitat Autònoma de Barcelona, 2012. http://hdl.handle.net/10803/121644.

Abstract:
Depth perception is a key aspect of human vision. It is a routine and essential visual task that humans perform effortlessly in many daily activities. It has often been associated with stereo vision, but humans have an amazing ability to perceive depth relations even from a single image, by using several monocular cues. In the computer vision field, if image depth information were available, many tasks could be posed from a different perspective for the sake of higher performance and robustness. Nevertheless, given a single image, this possibility is usually discarded, since depth information is usually obtained with three-dimensional reconstruction techniques, which require two or more images of the same scene taken from different viewpoints. Recently, some proposals have shown the feasibility of computing depth information from single images. In essence, the idea is to take advantage of a priori knowledge of the acquisition conditions and the observed scene to estimate depth from monocular pictorial cues. These approaches try to precisely estimate the scene depth maps by employing computationally demanding techniques. However, to assist many computer vision algorithms, it is not necessary to compute a costly and detailed depth map of the image. Indeed, just a rough depth description can be very valuable in many problems. In this thesis, we have demonstrated how coarse depth information can be integrated into different tasks, following holistic and alternative strategies, to obtain more precise and robust results. In that sense, we have proposed a simple but sufficiently reliable technique whereby image scene regions are categorized into discrete depth ranges to build a coarse depth map. Based on this representation, we have explored the potential usefulness of our method in three application domains from novel viewpoints: camera rotation estimation, background estimation, and pedestrian candidate generation. In the first case, we have estimated the rotation of a camera mounted in a moving vehicle using two novel methods that identify distant elements in the image, where the translation component of the image flow field is negligible. In background estimation, we have proposed a novel method to reconstruct the background by penalizing close regions in a cost function which integrates color, motion, and depth terms. Finally, we have exploited the geometric and depth information available in single images to generate pedestrian candidates, significantly reducing the number of generated windows to be further processed by a pedestrian classifier. In all cases, results have shown that our depth-based approaches contribute to better performance.
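
As a purely illustrative sketch of the coarse-depth-map idea (not Cheda's classifier): image regions are bucketed into a few discrete depth ranges. Here a trivially simple cue, vertical image position, stands in for the learned pictorial cues.

```python
import numpy as np

def coarse_depth_map(height, width, n_ranges=3):
    """Toy coarse depth map: assign each pixel row to one of n_ranges
    discrete depth bins (0 = close ... n_ranges-1 = far), using vertical
    image position as a stand-in for learned monocular pictorial cues.
    For a ground-plane scene, lower rows tend to be closer to the camera."""
    rows = np.arange(height)[:, None]             # column vector of row indices
    bins = ((height - 1 - rows) * n_ranges) // height  # bottom rows -> bin 0
    return np.broadcast_to(bins, (height, width))

depth = coarse_depth_map(480, 640)
print(np.unique(depth))   # -> [0 1 2]: three discrete depth ranges
```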

3. Veldman, Kyle John. "Monocular vision for collision avoidance in vehicles." Thesis, Massachusetts Institute of Technology, 2015. http://hdl.handle.net/1721.1/101478.

Abstract:
Thesis: S.B., Massachusetts Institute of Technology, Department of Mechanical Engineering, 2015.
Cataloged from PDF version of thesis.
Includes bibliographical references (page 21).
An experimental study, facilitated by Ford Global Technologies, Inc., of the potential to substitute monocular vision systems for stereovision systems in car automation. The monocular system pairs a camera and passive lens with an active lens. Most active lenses require linear actuating systems to adjust the optical parameters of the system, but this experiment employed an Optotune focus-tunable lens adjusted by a Lorentz actuator for a much more reliable system. Tests were conducted in a lab environment to capture images of environmental objects at different distances from the system and pass those images through an image processing algorithm that applies a high-pass filter to separate in-focus aspects of the image from out-of-focus ones. Although the system is in the early phases of testing, monocular vision shows the ability to replace stereovision systems. However, additional testing must be done to acclimate the apparatus to environmental factors, reduce the processing time, and redesign the system for portability.
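
The focus-separation step can be sketched as follows (a minimal illustration of the high-pass idea; the window size and threshold are arbitrary choices, not values from the thesis):

```python
import numpy as np
from scipy import ndimage

def in_focus_mask(gray, window=15, thresh=25.0):
    """Separate in-focus from out-of-focus regions: high-pass filter the
    image with a Laplacian, then measure local response energy. Sharp
    (in-focus) regions have strong high-frequency content."""
    hp = ndimage.laplace(gray.astype(float))           # high-pass response
    energy = ndimage.uniform_filter(hp ** 2, window)   # local mean squared response
    return energy > thresh

# Synthetic test: sharp noise on the left half, smooth ramp on the right.
img = np.hstack([np.random.randint(0, 255, (64, 64)).astype(float),
                 np.tile(np.linspace(0, 255, 64), (64, 1))])
mask = in_focus_mask(img)
print(mask[:, :64].mean(), mask[:, 64:].mean())  # left ~1.0, right ~0.0
```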

4. Ng, Matthew James. "Corridor Navigation for Monocular Vision Mobile Robots." DigitalCommons@CalPoly, 2018. https://digitalcommons.calpoly.edu/theses/1856.

Abstract:
Monocular vision robots use a single camera to process information about their environment. By analyzing this scene, the robot can determine the best navigation direction. Many modern approaches to robot hallway navigation involve a plethora of sensors to detect features in the environment: laser range finders, inertial measurement units, motor encoders, and cameras. Even when all of these sensors are combined, some data that could be useful for navigation goes unused. To step back and develop a baseline approach, this thesis explores the reliability and capability of using solely a camera for navigation. The basic navigation structure begins by taking frames from the camera and breaking them down to find the most prominent lines. The location where these lines intersect determines the forward direction to drive the robot. To improve the accuracy of navigation, algorithm improvements and additional features from the camera frames are used. These include line intersection weighting to reduce noise from extraneous lines, floor segmentation to improve rotational stability, and person detection.
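
A minimal sketch of the line-intersection scheme (OpenCV-based, with a simplified length-based weighting rather than the thesis's exact weighting):

```python
import cv2
import numpy as np

def steering_point(frame):
    """Estimate a corridor vanishing point: find prominent line segments,
    intersect them pairwise, and average the intersections weighted by
    combined segment length (a simplified intersection weighting that
    suppresses noise from short, extraneous lines)."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 50, 150)
    segs = cv2.HoughLinesP(edges, 1, np.pi / 180, threshold=60,
                           minLineLength=40, maxLineGap=10)
    if segs is None:
        return None
    pts, wts = [], []
    for i in range(len(segs)):
        x1, y1, x2, y2 = segs[i][0]
        for j in range(i + 1, len(segs)):
            x3, y3, x4, y4 = segs[j][0]
            d = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
            if abs(d) < 1e-6:        # parallel segments: no intersection
                continue
            t = ((x1 - x3) * (y3 - y4) - (y1 - y3) * (x3 - x4)) / d
            pts.append((x1 + t * (x2 - x1), y1 + t * (y2 - y1)))
            wts.append(np.hypot(x2 - x1, y2 - y1) + np.hypot(x4 - x3, y4 - y3))
    if not pts:
        return None
    return tuple(np.average(np.array(pts), axis=0, weights=wts))
```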

5. Pereira, Fabio Irigon. "High precision monocular visual odometry." Doctoral thesis, Universidade Federal do Rio Grande do Sul, 2018. http://hdl.handle.net/10183/183233.

Abstract:
Recovering three-dimensional information from two-dimensional images is an important problem in computer vision that finds several applications in our society. Robotics, the entertainment industry, medical diagnosis and prosthetics, and even interplanetary exploration benefit from vision-based 3D estimation. The problem can be divided into two interdependent operations: estimating the camera position and orientation when each image was produced, and estimating the 3D scene structure. This work focuses on computer vision techniques used to estimate the trajectory of a camera-equipped vehicle, a problem known as visual odometry. In order to provide an objective measure of estimation efficiency, and to compare the achieved results to state-of-the-art works in visual odometry, a popular high-precision dataset was selected and used. In the course of this work, new techniques for image feature tracking, camera pose estimation, 3D point position calculation, and scale recovery are proposed. The achieved results outperform the best-ranked results on the chosen dataset.
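
The core pose-estimation step of a feature-based monocular visual odometry pipeline can be sketched with OpenCV (a generic sketch, not Pereira's method; `K` is the camera intrinsic matrix):

```python
import cv2
import numpy as np

def vo_step(img0, img1, K):
    """One monocular VO step: track corners between consecutive frames,
    estimate the essential matrix with RANSAC, and recover the relative
    camera rotation R and unit-scale translation t."""
    p0 = cv2.goodFeaturesToTrack(img0, maxCorners=500,
                                 qualityLevel=0.01, minDistance=7)
    p1, st, _ = cv2.calcOpticalFlowPyrLK(img0, img1, p0, None)
    good0, good1 = p0[st == 1], p1[st == 1]
    E, inliers = cv2.findEssentialMat(good0, good1, K,
                                      method=cv2.RANSAC, threshold=1.0)
    _, R, t, _ = cv2.recoverPose(E, good0, good1, K, mask=inliers)
    return R, t  # ||t|| == 1: absolute scale is unobservable from one camera,
                 # which is why scale recovery is a separate contribution here
```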

6. Goroshin, Rostislav. "Obstacle detection using a monocular camera." Thesis, Atlanta, Ga.: Georgia Institute of Technology, 2008. http://hdl.handle.net/1853/24697.


7. Benoit, Stephen M. "Monocular optical flow for real-time vision systems." Thesis, McGill University, 1996. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=23862.

Abstract:
This thesis introduces a monocular optical flow algorithm that has been shown to perform well at nearly real-time frame rates (4 FPS) on natural image sequences. The system is completely bottom-up, using pixel region-matching techniques. A coordinated gradient descent method is broken down into two stages: pixel region matching error measures are locally minimized, and flow field consistency constraints apply non-linear adaptive diffusion, causing confident measurements to influence their less confident neighbors. Convergence is usually accomplished with one iteration per image frame pair. Temporal integration and Kalman filtering predict upcoming flow fields and figure/ground separation. The algorithm is designed for flexibility: large displacements are tracked as easily as sub-pixel displacements, and higher-level information can feed flow field predictions into the measurement process.
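
The region-matching stage in miniature (an exhaustive SSD search for one patch; the thesis's system instead uses coordinated gradient descent plus adaptive diffusion of confident estimates):

```python
import numpy as np

def match_patch(prev, curr, y, x, patch=7, search=12):
    """Find the displacement of the patch centred at (y, x) in `prev` by
    exhaustive sum-of-squared-differences (SSD) search in `curr`.
    Returns the (dy, dx) minimising the region-matching error."""
    r = patch // 2
    tpl = prev[y - r:y + r + 1, x - r:x + r + 1].astype(float)
    best, best_dydx = np.inf, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            yy, xx = y + dy, x + dx
            if (yy - r < 0 or xx - r < 0 or
                    yy + r >= curr.shape[0] or xx + r >= curr.shape[1]):
                continue                 # candidate window leaves the image
            win = curr[yy - r:yy + r + 1, xx - r:xx + r + 1].astype(float)
            ssd = np.sum((win - tpl) ** 2)
            if ssd < best:
                best, best_dydx = ssd, (dy, dx)
    return best_dydx
```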

8. Li, Wan-chiu (李宏釗). "Localization of a mobile robot by monocular vision." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2001. http://hub.hku.hk/bib/B31226371.


9. Malan, Daniel Francois. "3D tracking between satellites using monocular computer vision." Thesis, Stellenbosch: University of Stellenbosch, 2005. http://hdl.handle.net/10019.1/3081.

Abstract:
Thesis (MScEng (Electrical and Electronic Engineering))--University of Stellenbosch, 2005.
Visually estimating three-dimensional position, orientation, and motion between an observer and a target is an important problem in computer vision. Solutions which compute three-dimensional movement from two-dimensional intensity images usually rely on stereoscopic vision. Some research has also been done on systems utilising a single (monocular) camera. This thesis investigates methods for estimating position and pose from monocular image sequences. The intended future application is visual tracking between satellites flying in close formation. The ideas explored in this thesis build on methods developed for use in camera calibration and structure from motion (SfM). All these methods rely heavily on the use of different variations of the Kalman Filter. After describing the problem from a mathematical perspective, we develop different approaches to solving the estimation problem. The different approaches are successfully tested on simulated as well as real-world image sequences, and their performance analysed.

10. Li, Wan-chiu. "Localization of a mobile robot by monocular vision." Hong Kong: University of Hong Kong, 2001. http://sunzi.lib.hku.hk/hkuto/record.jsp?B23765896.


11. Watanabe, Yoko. "Stochastically optimized monocular vision-based navigation and guidance." Diss., Georgia Institute of Technology, 2007. http://hdl.handle.net/1853/22545.

Abstract:
Thesis (Ph. D.)--Aerospace Engineering, Georgia Institute of Technology, 2008.
Committee Chair: Johnson, Eric; Committee Co-Chair: Calise, Anthony; Committee Member: Prasad, J.V.R.; Committee Member: Tannenbaum, Allen; Committee Member: Tsiotras, Panagiotis.

12. Salama, Gouda Ismail Mohamed. "Monocular and Binocular Visual Tracking." Diss., Virginia Tech, 1999. http://hdl.handle.net/10919/37179.

Abstract:
Visual tracking is one of the most important applications of computer vision. Several tracking systems have been developed which either focus mainly on the tracking of targets moving on a plane, or attempt to reduce the 3-dimensional tracking problem to the tracking of a set of characteristic points of the target. These approaches are seriously handicapped in complex visual situations, particularly those involving significant perspective, textures, repeating patterns, or occlusion. This dissertation describes a new approach to visual tracking for monocular and binocular image sequences, and for both passive and active cameras. The method combines Kalman-type prediction with steepest-descent search for correspondences, using 2-dimensional affine mappings between images. This approach differs significantly from many recent tracking systems, which emphasize the recovery of 3-dimensional motion and/or structure of objects in the scene. We argue that 2-dimensional area-based matching is sufficient in many situations of interest, and we present experimental results with real image sequences to illustrate the efficacy of this approach. Image matching between two images is a simple one-to-one mapping if there is no occlusion; in the presence of occlusion, incorrect matching is inevitable, and few approaches have been developed to address this issue. This dissertation considers the effect of occlusion on tracking a moving object for both monocular and binocular image sequences. The visual tracking system described here attempts to detect occlusion based on the residual error computed by the matching method. If the residual matching error exceeds a user-defined threshold, the tracked object may be occluded by another object. When occlusion is detected, tracking continues with the predicted locations based on Kalman filtering, which serves as a predictor of the target position until it reemerges from the occlusion. Although the method uses constant-image-velocity Kalman filtering, it has been shown to function reasonably well in non-constant-velocity situations. Experimental results show that tracking can be maintained during periods of substantial occlusion. The area-based approach to image matching often involves correlation-based comparisons between images, and this requires the specification of a size for the correlation windows. Accordingly, a new approach based on moment invariants was developed to select window size adaptively. This approach is based on sudden increases or decreases in the first Maitra moment invariant. We applied a robust regression model to smooth the first Maitra moment invariant and make the method robust against noise. This dissertation also considers the effect of spatial quantization on several moment invariants. Of particular interest are the affine moment invariants, which have emerged in recent years as a useful tool for image reconstruction, image registration, and recognition of deformed objects. Traditional analysis assumes moments and moment invariants for images that are defined in the continuous domain. Quantization of the image plane is necessary, because otherwise the image cannot be processed digitally. Image acquisition by a digital system imposes spatial and intensity quantization that, in turn, introduce errors into moment and invariant computations. This dissertation derives expressions for the quantization-induced error in several important cases; although it considers spatial quantization only, this represents an important extension of work by other researchers. A mathematical theory for a visual tracking approach to a moving object is presented in this dissertation. This approach can track a moving object in an image sequence when the camera is passive and when the camera is actively controlled. The algorithm used here is computationally cheap and suitable for real-time implementation. We implemented the proposed method on an active vision system and carried out experiments of monocular and binocular tracking for various kinds of objects in different environments. These experiments demonstrated very good performance using real images in fairly complicated situations.
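
The occlusion-handling logic can be sketched with a constant-image-velocity Kalman filter (a generic sketch consistent with the description above, not Salama's exact filter; the threshold value is an arbitrary placeholder):

```python
import numpy as np

class ConstantVelocityTracker:
    """Constant-image-velocity Kalman filter. When the matcher's residual
    error exceeds `occl_thresh`, the target is declared occluded and the
    filter coasts on its prediction until the target reappears."""
    F = np.array([[1, 0, 1, 0], [0, 1, 0, 1],
                  [0, 0, 1, 0], [0, 0, 0, 1]], float)   # state: x, y, vx, vy
    H = np.array([[1, 0, 0, 0], [0, 1, 0, 0]], float)   # we observe position

    def __init__(self, x0, y0, q=1e-2, r=1.0, occl_thresh=50.0):
        self.x = np.array([x0, y0, 0.0, 0.0])
        self.P = np.eye(4) * 10.0
        self.Q, self.R = np.eye(4) * q, np.eye(2) * r
        self.occl_thresh = occl_thresh

    def step(self, match_pos, match_error):
        # Predict with the constant-velocity model.
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        # Update only if the match is trusted (no occlusion detected).
        if match_error < self.occl_thresh:
            S = self.H @ self.P @ self.H.T + self.R
            K = self.P @ self.H.T @ np.linalg.inv(S)
            self.x = self.x + K @ (np.asarray(match_pos) - self.H @ self.x)
            self.P = (np.eye(4) - K @ self.H) @ self.P
        return self.x[:2]   # predicted (or updated) target position
```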

13. Avanzini, Pierre. "Modélisation et commande d'un convoi de véhicules urbains par vision." PhD thesis, Université Blaise Pascal - Clermont-Ferrand II, 2010. http://tel.archives-ouvertes.fr/tel-00683626.

Abstract:
This thesis concerns the control of a convoy of vehicles, with the societal objective of reducing pollution and congestion in urban environments. The research focuses on the cooperative navigation of a fleet of communicating vehicles based on a global control approach: each vehicle is controlled from information shared by the whole fleet, relying on exact linearization techniques. The development of new navigation capabilities is the subject of the two theoretical contributions of this manuscript, and their implementation constitutes the practical contribution. First, a manual navigation mode is introduced, in which the lead vehicle, guided by an operator, defines and retransmits the trajectory to be followed to the members of the convoy. The trajectory, whose representation evolves as the lead vehicle advances, must be numerically stable so that the follower vehicles servoed onto it can be controlled precisely and without disturbances. To this end, the trajectory is modeled with B-spline curves, and an iterative algorithm was developed to extend it according to an optimization criterion evaluated against the successive positions occupied by the lead vehicle. A parametric analysis finally led to an optimal synthesis of the trajectory in terms of fidelity and stability of the representation. Second, the integration of a monocular vision localization strategy for convoy navigation is considered. The approach relies on a 3D map of the environment built beforehand from a video sequence. However, such a virtual world contains local distortions with respect to the real world, which affects the performance of the convoy control laws. An analysis of the distortions demonstrated that satisfactory navigation performance can be recovered from a set of scale factors estimated locally along the reference trajectory. Several strategies were then developed to estimate these scale factors online, either from odometric data feeding an observer, or from range data integrated into an optimization process. As before, the influence of the parameters was evaluated to identify the best configurations for experimental applications. Finally, the algorithms developed were deployed in experiments, yielding full-scale demonstrators comprising up to four CyCab and RobuCab vehicles. Particular attention was paid to the temporal consistency of the data, which are collected asynchronously within the convoy. The NTP protocol was used to synchronize the vehicles' clocks, and the AROCCAM middleware to timestamp the data and manage control timing. The vehicles' motion model could thus be integrated to obtain an accurate estimate of the convoy's state at the instant the control is evaluated.
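
The B-spline trajectory representation can be sketched with SciPy (fitting only; the thesis's iterative, stability-optimised extension of the spline as the leader advances is not reproduced here):

```python
import numpy as np
from scipy import interpolate

def fit_reference_path(xs, ys, smooth=0.0):
    """Fit a cubic B-spline to the lead vehicle's successive positions.
    Followers can then be servoed onto any curvilinear abscissa u in [0, 1]."""
    tck, _ = interpolate.splprep([xs, ys], s=smooth, k=3)
    return tck

# Leader positions recorded so far (toy data).
xs = np.array([0.0, 1.0, 2.1, 3.0, 4.2, 5.0])
ys = np.array([0.0, 0.4, 0.5, 0.3, 0.6, 1.0])
tck = fit_reference_path(xs, ys)
px, py = interpolate.splev(np.linspace(0, 1, 50), tck)  # densely sampled path
print(px[0], py[0])  # starts at (approximately) the first recorded position
```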

14. Frost, Duncan. "Long range monocular SLAM." Thesis, University of Oxford, 2017. https://ora.ox.ac.uk/objects/uuid:af38cfa6-fc0a-48ab-b919-63c440ae8774.

Abstract:
This thesis explores approaches to two problems in the frame-rate computation of a priori unknown 3D scene structure and camera pose using a single camera, or monocular simultaneous localisation and mapping. The thesis reflects two trends in vision in general and structure from motion in particular: (i) the move away from directly recovered geometry and towards learnt geometry; and (ii) the sparsification of otherwise dense direct methods. The first contributions mitigate scale drift. Beyond the inevitable accumulation of random error, monocular SLAM accumulates error via the depth/speed scaling ambiguity. Three solutions are investigated. The first detects objects of known class and size using fixed descriptors, and incorporates their measurements in the 3D map. Experiments using databases with ground truth show that metric accuracy can be restored over kilometre distances, and similar gains are made using a hand-held camera. Our second method avoids explicit feature choice, instead employing a deep convolutional neural network to yield depth priors. Relative depths are learnt well, but absolute depths less so, and recourse to database-wide scaling is investigated. The third approach uses a novel trained network to infer speed from imagery. The second part of the thesis develops sparsified direct methods for monocular SLAM. The first contribution is a novel camera tracker operating directly using affine image warping, but on patches around sparse corners. Camera pose is recovered with an accuracy at least equal to the state of the art, while requiring only half the computational time. The second introduces a least squares adjustment to sparsified direct map refinement, again using patches from sparse corners. The accuracy of its 3D structure estimation is compared with that from the widely used method of depth filtering. It is found empirically that the new method's accuracy is often higher than that of its filtering counterpart, but that the method is more troubled by occlusion.
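
The known-object scale correction amounts to a simple ratio; a worked sketch (the numbers are illustrative, not from the thesis):

```python
def scale_correction(known_size_m, mapped_size_m):
    """If a detected object of known real size appears in the monocular
    map with a different size, the ratio is the factor by which the map's
    scale has drifted; applying it restores metric accuracy."""
    return known_size_m / mapped_size_m

# The map renders a known 1.5 m tall object as 1.2 m -> map is 25% too small.
s = scale_correction(1.5, 1.2)
print(s)   # 1.25: multiply map point coordinates and translations by s
```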

15. Cheng, Kelvin. "Direct interaction with large displays through monocular computer vision." PhD thesis, University of Sydney, 2009. http://hdl.handle.net/2123/5331.

Abstract:
Large displays are everywhere, and have been shown to provide higher productivity gain and user satisfaction compared to traditional desktop monitors. The computer mouse remains the most common input tool for users to interact with these larger displays. Much effort has gone into making this interaction more natural and more intuitive for the user. The use of computer vision for this purpose has been well researched, as it provides freedom and mobility to the user and allows interaction at a distance. Interaction that relies on monocular computer vision, however, has not been well researched, particularly when used for depth information recovery. This thesis aims to investigate the feasibility of using monocular computer vision to allow bare-hand interaction with large display systems from a distance. By taking into account the location of the user and the interaction area available, a dynamic virtual touchscreen can be estimated between the display and the user. In the process, theories and techniques that make interaction with a computer display as easy as pointing at real-world objects are explored. Studies were conducted to investigate the way humans naturally point at objects with their hands and to examine the inadequacies of existing pointing systems. Models that underpin the pointing strategy used in many previous interactive systems were formalized. A proof-of-concept prototype was built and evaluated through various user studies. Results from this thesis suggest that it is possible to allow natural user interaction with large displays using low-cost monocular computer vision. Furthermore, the models developed and lessons learnt in this research can assist designers in developing more accurate and natural interactive systems that make use of humans' natural pointing behaviours.
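
One common way to model such pointing (a geometric sketch, not necessarily the thesis's exact model) is to intersect the eye-to-fingertip ray with the display plane:

```python
import numpy as np

def pointing_target(eye, fingertip, plane_pt, plane_n):
    """Intersect the eye->fingertip pointing ray with the display plane
    (given by a point on it and its normal). Returns the 3D point the
    user is pointing at, or None if the ray is parallel to the display."""
    d = fingertip - eye                      # pointing direction
    denom = d @ plane_n
    if abs(denom) < 1e-9:
        return None
    t = ((plane_pt - eye) @ plane_n) / denom
    return eye + t * d

# Display plane z = 0; user 2 m away; fingertip 0.5 m in front of the eye.
eye = np.array([0.3, 1.6, 2.0])
tip = np.array([0.35, 1.5, 1.5])
hit = pointing_target(eye, tip, np.zeros(3), np.array([0.0, 0.0, 1.0]))
print(hit)   # [0.5, 1.2, 0.0]: where the pointing ray meets the display
```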

16. Kurdziel, Michael Scott. "A monocular color vision system for road intersection detection." Online version of thesis, 2008. http://hdl.handle.net/1850/6208.


17. Mienie, Dewald. "Autonomous docking for a satellite pair using monocular vision." Thesis, Stellenbosch: University of Stellenbosch, 2009. http://hdl.handle.net/10019.1/2382.

Abstract:
Thesis (MEng (Electrical and Electronic Engineering))--University of Stellenbosch, 2009.
Autonomous rendezvous and docking is seen as an enabling technology. It allows, among other things, the construction of larger space platforms in orbit and also provides a means for the in-orbit servicing of space vehicles. In this thesis a docking sequence is proposed and tested in both simulation and practice, which also required the design and construction of a test platform. A model hovercraft is used to emulate the chaser satellite in a two-dimensional plane, as it moves relatively frictionlessly. The hovercraft is equipped with a single camera (monocular vision) that is used as the main sensor to estimate the target's pose (relative position and orientation). An imitation of a target satellite was made and equipped with light markers that are used by the chaser's camera sensor. The positions of the target's lights in the image are used to determine the target's pose using a modified version of Malan's Extended Kalman Filter [20]. This information is then used during the docking sequence. This thesis successfully demonstrated the autonomous and reliable identification of the target's lights in the image, and the autonomous docking of a satellite pair using monocular camera vision, in both simulation and emulation.
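
Pose from light markers in miniature: a perspective-n-point solve with OpenCV (a sketch only; the thesis instead feeds the marker measurements to a modified EKF, and the marker layout below is hypothetical):

```python
import cv2
import numpy as np

# Hypothetical layout: four light markers on the target's face plane (metres).
MARKERS_3D = np.array([[-0.2, -0.2, 0.0],
                       [ 0.2, -0.2, 0.0],
                       [ 0.2,  0.2, 0.0],
                       [-0.2,  0.2, 0.0]], dtype=np.float64)

def target_pose(marker_pixels, K):
    """Estimate the camera-relative pose of the docking target from the
    image positions of its four light markers (ordered as in MARKERS_3D).
    K is the camera intrinsic matrix; lens distortion is ignored here."""
    ok, rvec, tvec = cv2.solvePnP(MARKERS_3D,
                                  np.asarray(marker_pixels, dtype=np.float64),
                                  K, None)
    if not ok:
        return None
    R, _ = cv2.Rodrigues(rvec)  # rotation: target frame -> camera frame
    return R, tvec              # tvec: target origin expressed in camera frame
```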

18. Cheng, Kelvin. "Direct interaction with large displays through monocular computer vision." University of Sydney, 2008. http://ses.library.usyd.edu.au/handle/2123/5331.

Abstract:
Thesis (Ph. D.)--University of Sydney, 2009.
Title from title screen (viewed November 5, 2009). Submitted in fulfilment of the requirements for the degree of Doctor of Philosophy to the School of Information Technologies in the Faculty of Engineering & Information Technologies. Degree awarded 2009; thesis submitted 2008. Includes bibliographical references. Also available in print form.

19. Schlachtman, Matthew Paul. "Monocular Vision and Image Correlation to Accomplish Autonomous Localization." DigitalCommons@CalPoly, 2010. https://digitalcommons.calpoly.edu/theses/320.

Abstract:
For autonomous navigation, robots and vehicles must have accurate estimates of their current state (i.e., location and orientation) within an inertial coordinate frame. If a map is given a priori, the process of determining this state is known as localization. When operating in the outdoors, localization is often assumed to be a solved problem when GPS measurements are available. However, in urban canyons and other areas where GPS accuracy is decreased, additional techniques with other sensors and filtering are required. This thesis aims to provide one such technique based on monocular vision. First, the system requires that a map be generated, consisting of a set of geo-referenced video images. This map is generated offline, before autonomous navigation is required. When an autonomous vehicle is later deployed, it is equipped with an on-board camera. As the vehicle moves and obtains images, it compares its current images with images from the pre-generated map. To conduct this comparison, a method known as image correlation, developed at Johns Hopkins University by Rob Thompson, Daniel Gianola and Christopher Eberl, is used. The output from this comparison is used within a particle filter to provide an estimate of vehicle location. Experimentation demonstrates the particle filter's ability to successfully localize the vehicle within a small map that consists of a short section of road. Notably, no initial assumption of vehicle location within this map is required.
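
One particle-filter iteration of this kind can be sketched as follows (a generic sketch: `correlation_score` stands in for the Johns Hopkins image-correlation routine, and the motion model and resampling rule are illustrative choices):

```python
import numpy as np

def pf_update(particles, weights, correlation_score, motion_noise=0.5):
    """One particle-filter step for 1D along-road localisation.
    `correlation_score(x)` compares the current camera image against the
    geo-referenced map image nearest to position x and returns a similarity."""
    # Motion update: advance particles with a noisy forward-motion prior.
    particles = particles + np.random.normal(1.0, motion_noise, len(particles))
    # Measurement update: reweight by image-correlation similarity.
    weights = weights * np.array([correlation_score(x) for x in particles])
    weights = weights / weights.sum()
    # Resample when the effective particle count collapses.
    if 1.0 / np.sum(weights ** 2) < len(particles) / 2:
        idx = np.random.choice(len(particles), len(particles), p=weights)
        particles = particles[idx]
        weights = np.full(len(particles), 1.0 / len(particles))
    return particles, weights

estimate = lambda p, w: np.sum(p * w)   # posterior mean as the vehicle position
```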

20. Karlsson, Samuel. "Monocular vision-based obstacle avoidance for Micro Aerial Vehicles." Thesis, Luleå tekniska universitet, Institutionen för system- och rymdteknik, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:ltu:diva-80906.

Abstract:
Micro Aerial Vehicles (MAVs) are gaining attention in numerous applications, as these platforms are cheap and can perform complex maneuvers. Moreover, most commercially available MAVs are equipped with a mono-camera. There is currently increasing interest in deploying autonomous mono-camera MAVs with obstacle avoidance capabilities in various complex application areas. Some of these application areas contain moving obstacles as well as stationary ones, which makes collision avoidance more challenging. This master's thesis set out to investigate the possibility of avoiding moving and stationary obstacles with a single camera as the only sensor gathering information from the surrounding environment. One concept for autonomous obstacle avoidance is to predict the time to near-collision with a Convolutional Neural Network (CNN) architecture that uses the video feed from a mono-camera. The heading of the MAV is then regulated to maximize the time to collision, resulting in an avoidance maneuver. Another interesting perspective arises when multiple dynamic obstacles in the environment yield multiple time predictions for different parts of the Field of View (FoV); the method then maximizes time to collision by choosing the part with the largest predicted time. However, this is a complicated task, and this thesis provides an overview of it while discussing the challenges and possible future directions; one of the main reasons was that the available data set was not reliable and did not provide enough information for the CNN to produce acceptable predictions. This thesis also looks into another approach for avoiding collisions, using the object detection method You Only Look Once (YOLO) with the mono-camera video feed. YOLO is a state-of-the-art network that can detect objects and produce bounding boxes in real time. Because of YOLO's high success rate and speed, it was chosen for this thesis. When YOLO detects an obstacle, it reports where in the image the object is: the obstacle's pixel coordinates. By utilizing the image's FoV and trigonometry, pixel coordinates can be transformed to an angle, assuming the lens does not distort the image (see the sketch below). This position information can then be used to avoid obstacles. The method is evaluated in the simulation environment Gazebo and verified experimentally with the commercially available MAV Parrot Bebop 2. The obtained results show the efficiency of the method; in particular, the proposed method is capable of avoiding dynamic and stationary obstacles. Future work will be the evaluation of this method in more complex environments with multiple dynamic obstacles, for autonomous navigation of a team of MAVs. A video of the experiments can be viewed at: https://youtu.be/g_zL6eVqgVM.
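
The pixel-to-angle conversion described above, assuming an undistorted pinhole image (the FoV value in the example is illustrative, not a Bebop 2 specification):

```python
import math

def pixel_to_bearing(u, image_width, horizontal_fov_deg):
    """Convert a pixel column u (0 = left edge), e.g. the centre of a YOLO
    bounding box, into a bearing angle relative to the camera axis for an
    undistorted pinhole image. Negative = obstacle left of centre."""
    # Focal length in pixels from the horizontal field of view.
    f = (image_width / 2) / math.tan(math.radians(horizontal_fov_deg) / 2)
    return math.degrees(math.atan((u - image_width / 2) / f))

# Example with an 856-pixel-wide image and an assumed 80 degree FoV:
print(pixel_to_bearing(856 / 2, 856, 80.0))  # 0.0  -> obstacle dead ahead
print(pixel_to_bearing(856, 856, 80.0))      # 40.0 -> at the right edge of FoV
```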

21. Magree, Daniel Paul. "Monocular vision-aided inertial navigation for unmanned aerial vehicles." Diss., Georgia Institute of Technology, 2015. http://hdl.handle.net/1853/53892.

Abstract:
The reliance of unmanned aerial vehicles (UAVs) on GPS and other external navigation aids has become a limiting factor for many missions. UAVs are now physically able to fly in many enclosed or obstructed environments, due to the shrinking size and weight of electronics and other systems. These environments, such as urban canyons or enclosed areas, often degrade or deny external signals. Furthermore, many of the most valuable potential missions for UAVs are in hostile or disaster areas, where navigation infrastructure could be damaged, denied, or actively used against the vehicle. It is clear that developing alternative, independent navigation techniques will increase the operating envelope of UAVs and make them more useful. This thesis presents work in the development of reliable monocular vision-aided inertial navigation for UAVs. The work focuses on developing a stable and accurate navigation solution in a variety of realistic conditions. First, a vision-aided inertial navigation algorithm is developed which assumes uncorrelated feature and vehicle states. Flight test results on an 80 kg UAV are presented, which demonstrate that it is possible to bound the horizontal drift with vision aiding. Additionally, a novel implementation method is developed for integration with a variety of navigation systems. Finally, a vision-aided navigation algorithm is derived within a Bierman-Thornton factored extended Kalman filter (BTEKF) framework, using fully correlated vehicle and feature states. This algorithm improves consistency and accuracy by 2 to 3 orders of magnitude over the previous implementation, both in simulation and in flight testing. Flight test results of the BTEKF on large (80 kg) and small (600 g) vehicles show accurate navigation over numerous tests.

22. Spencer, Lisa. "Real-Time Monocular Vision-Based Tracking for Interactive Augmented Reality." Doctoral diss., University of Central Florida, 2006. http://digital.library.ucf.edu/cdm/ref/collection/ETD/id/4289.

Abstract:
The need for real-time video analysis is rapidly increasing in today's world. The decreasing cost of powerful processors and the proliferation of affordable cameras, combined with needs for security, methods for searching the growing collection of video data, and an appetite for high-tech entertainment, have produced an environment where video processing is utilized for a wide variety of applications. Tracking is an element in many of these applications, for purposes like detecting anomalous behavior, classifying video clips, and measuring athletic performance. In this dissertation we focus on augmented reality, but the methods and conclusions are applicable to a wide variety of other areas. In particular, our work deals with achieving real-time performance while tracking with augmented reality systems using a minimum set of commercial hardware. We have built prototypes that use both existing technologies and new algorithms we have developed. While performance improvements would be possible with additional hardware, such as multiple cameras or parallel processors, we have concentrated on getting the most performance with the least equipment. Tracking is a broad research area, but an essential component of an augmented reality system; tracking of some sort is needed to determine the location of scene augmentation. First, we investigated the effects of illumination on the pixel values recorded by a color video camera, and used the results to track a simple solid-colored object in our first augmented reality application. Our second augmented reality application tracks complex non-rigid objects, namely human faces. In the color experiment, we studied the effects of illumination on the color values recorded by a real camera. Human perception is important for many applications, but our focus is on the RGB values available to tracking algorithms. Since the lighting in most environments where video monitoring is done is close to white (e.g., fluorescent lights in an office, incandescent lights in a home, or direct and indirect sunlight outside), we looked at the response to "white" light sources as the intensity varied. The red, green, and blue values recorded by the camera can be converted to a number of other color spaces which have been shown to be invariant to various lighting conditions, including view angle, light angle, light intensity, or light color, using models of the physical properties of reflection. Our experiments show how well these derived quantities actually remained constant with real materials, real lights, and real cameras, while still retaining the ability to discriminate between different colors. This color experiment enabled us to find color spaces that were more invariant to changes in illumination intensity than the ones traditionally used. The first augmented reality application tracks a solid-colored rectangle and replaces the rectangle with an image, so it appears that the subject is holding a picture instead. Tracking this simple shape is both easy and hard: easy because of the single color and a shape that can be represented by four points or four lines, and hard because there are fewer features available and the color is affected by illumination changes. Many algorithms for tracking fixed shapes do not run in real time or require rich feature sets. We have created a tracking method for simple solid-colored objects that uses color and edge information and is fast enough for real-time operation.
We also demonstrate a fast deinterlacing method to avoid "tearing" of fast-moving edges when recorded by an interlaced camera, and optimization techniques that usually achieved a speedup of about 10 over an implementation that already used optimized image processing library routines. Human faces are complex objects that differ between individuals and undergo non-rigid transformations. Our second augmented reality application detects faces, determines their initial pose, and then tracks changes in real time. The results are displayed as virtual objects overlaid on the real video image. We used existing algorithms for motion detection and face detection. We present a novel method for determining the initial face pose in real time using symmetry. Our face tracking uses existing point tracking methods as well as extensions to Active Appearance Models (AAMs). We also give a new method for integrating detection and tracking data, leveraging the temporal coherence in video data to mitigate false positive detections. While many face tracking applications assume exactly one face is in the image, our techniques can handle any number of faces. The color experiment, along with the two augmented reality applications, provides improvements in understanding the effects of illumination intensity changes on recorded colors, as well as better real-time methods for detection and tracking of solid shapes and human faces for augmented reality. These techniques can be applied to other real-time video analysis tasks, such as surveillance and video analysis.
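
The simplest of the illumination-invariant color spaces mentioned above is normalised rg chromaticity, which cancels a common intensity scaling (a generic sketch, not the dissertation's full set of invariants):

```python
import numpy as np

def rgb_chromaticity(rgb):
    """Normalised rg chromaticity: r = R/(R+G+B), g = G/(R+G+B).
    If illumination intensity scales all three channels by the same
    factor k, the factor cancels, so (r, g) is intensity-invariant."""
    rgb = np.asarray(rgb, dtype=float)
    s = rgb.sum(axis=-1, keepdims=True)
    return (rgb / np.maximum(s, 1e-9))[..., :2]

pixel = np.array([120.0, 80.0, 40.0])
print(rgb_chromaticity(pixel))         # [0.5  0.333...]
print(rgb_chromaticity(0.5 * pixel))   # same chromaticity at half intensity
```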

23. Murali, Vidya N. "Autonomous navigation and mapping using monocular low-resolution grayscale vision." Clemson University, 2008. http://etd.lib.clemson.edu/documents/1219852130/.


24. Kang, Changkoo. "Small UAV Trajectory Prediction and Avoidance using Monocular Computer Vision." Thesis, Virginia Tech, 2017. http://hdl.handle.net/10919/79950.

Abstract:
Small unmanned aircraft systems (UAS) must be able to detect and avoid conflicting traffic, an especially challenging task when the threat is another small UAS. Collision avoidance requires trajectory prediction and the performance of a collision avoidance system can be improved by extending the prediction horizon. In this thesis, an algorithm for predicting the trajectory of a small, fixed-wing UAS using an estimate of its orientation and for maneuvering around the threat, if necessary, is developed. A computer vision algorithm locates specific feature points of a threat aircraft in an image and the POSIT algorithm uses these feature points to estimate the pose (position and attitude) of the threat. A sequence of pose estimates is then used to predict the trajectory of the threat aircraft and to avoid colliding with it. To assess the algorithm's performance, the predictions are compared with predictions based solely on position estimates for a variety of encounter scenarios. Simulation and experimental results indicate that trajectory prediction using orientation estimates provides quicker response to a change in the threat aircraft trajectory and results in better prediction and avoidance performance.

25. Gampher, John Eric. "Perception of motion-in-depth: Induced motion effects on monocular and binocular cues." PhD diss., University of Alabama at Birmingham, 2008. https://www.mhsl.uab.edu/dt/2009r/gampher.pdf.

Abstract:
Thesis (Ph. D.)--University of Alabama at Birmingham, 2008.
Title from PDF title page (viewed Mar. 30, 2010). Additional advisors: Franklin R. Amthor, James E. Cox, Timothy J. Gawne, Rosalyn E. Weller. Includes bibliographical references (p. 104-114).

26. Agarwal, Saurav. "Monocular vision based indoor simultaneous localisation and mapping for quadrotor platform." Thesis, Cranfield University, 2012. http://dspace.lib.cranfield.ac.uk/handle/1826/7210.

Abstract:
An autonomous robot acting in an unknown dynamic environment requires a detailed understanding of its surroundings. This information is provided by mapping algorithms, which are necessary to build a sensory representation of the environment and the vehicle states. This helps the robot avoid collisions with complex obstacles and localize in six degrees of freedom, i.e., x, y, z, roll, pitch, and yaw angle. This process, wherein a robot builds a sensory representation of the environment while estimating its own position and orientation in relation to those sensory landmarks, is known as Simultaneous Localisation and Mapping (SLAM). A common means of gauging environments is the laser scanner, which enables mobile robots to scan objects in a non-contact way. The use of laser scanners for SLAM has been studied and successfully implemented. In this project, sensor fusion combining laser scanning and real-time image processing is investigated. Hence, this project deals with the implementation of a visual SLAM algorithm, followed by the design and development of a quadrotor platform equipped with a camera, a low-range laser scanner, and an on-board PC for autonomous navigation and mapping of unstructured indoor environments. This report presents a thorough account of the work done within the scope of this project. It presents a brief summary of related work in the domain of vision-based navigation and mapping before presenting a real-time monocular vision-based SLAM algorithm. A C++ implementation of the visual SLAM algorithm based on the Extended Kalman Filter is described. This is followed by the design and development of the quadrotor platform. First, the baseline specifications are described, followed by component selection, dynamics modelling, simulation, and control. The autonomous navigation algorithm is presented along with simulation results, which show its suitability for real-time application in dynamic environments. Finally, the complete system architecture is described, along with flight test results.

27. Tournier, Glenn P. (Glenn Paul). "Six degrees of freedom estimation using monocular vision and moiré patterns." Thesis, Massachusetts Institute of Technology, 2006. http://hdl.handle.net/1721.1/37951.

Abstract:
Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Aeronautics and Astronautics, 2006.
Includes bibliographical references (p. 105-107).
We present the vision-based estimation of the position and orientation of an object using a single camera, relative to a novel target that incorporates moiré patterns. The objective is to acquire the six-degree-of-freedom estimate that is essential for the operation of vehicles in close proximity to other craft and landing platforms. The target contains markers to determine relative orientation and to locate two sets of orthogonal moiré patterns at two different frequencies. A camera is mounted on a small vehicle with the target in the field of view. An algorithm processes the images, extracting the attitude and position information of the camera relative to the target using geometry and four single-point discrete Fourier transforms (DFTs) on the moiré patterns. Manual and autonomous movement tests were conducted to determine the accuracy of the system relative to ground truth locations obtained through an external indoor positioning system. Position estimation with accompanying control techniques has been implemented, including hovering, static platform landings, and dynamic platform landings, to demonstrate the algorithm's ability to provide accurate information to precisely control the vehicle. The results confirm the moiré target system's feasibility as a viable option for low-cost relative navigation for indoor and outdoor operations, including landing on static and dynamic surfaces.
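
The single-point DFT step can be sketched as follows (an illustration of the principle, not Tournier's exact processing chain): projecting a scanline across the moiré fringes onto a complex exponential at the fringes' known spatial frequency yields a phase that shifts with lateral displacement.

```python
import numpy as np

def moire_phase(scanline, cycles):
    """Single-point DFT: project a scanline across the moiré pattern onto
    the complex exponential at its known spatial frequency (`cycles` per
    scanline) and return the phase. Lateral camera motion shifts the
    fringes, so the phase change between frames measures displacement."""
    n = len(scanline)
    k = np.exp(-2j * np.pi * cycles * np.arange(n) / n)
    return np.angle(np.dot(scanline, k))

# Toy fringes with a known phase offset of 0.5 rad:
x = np.arange(256)
fringe = 1 + np.cos(2 * np.pi * 4 * x / 256 + 0.5)
print(moire_phase(fringe, 4))   # ~0.5: the fringe phase is recovered
```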

28. Mercado-Ravell, Diego Alberto. "Autonomous navigation and teleoperation of unmanned aerial vehicles using monocular vision." Thesis, Compiègne, 2015. http://www.theses.fr/2015COMP2239/document.

Abstract:
The present document addresses, theoretically and experimentally, the most relevant topics for Unmanned Aerial Vehicles (UAVs) in autonomous and semi-autonomous navigation. In keeping with the multidisciplinary nature of the problems studied, a wide range of techniques and theories are covered in the fields of robotics, automatic control, computer science, computer vision and embedded systems, among others. As part of this thesis, two different experimental platforms were developed in order to explore and evaluate various theories and techniques of interest for autonomous navigation. The first prototype is a quadrotor specially designed for outdoor applications, fully developed in our lab. The second testbed is composed of an inexpensive commercial quadrotor of the AR.Drone type, wirelessly connected to a ground station equipped with the Robot Operating System (ROS), and is specially intended for testing computer vision algorithms and automatic control strategies in an easy, fast and safe way. In addition, this work provides a study of data fusion techniques aimed at enhancing the UAV pose estimation provided by commonly used sensors. Two strategies are evaluated in particular: an Extended Kalman Filter (EKF) and a Particle Filter (PF). Both estimators are adapted for the system under consideration, taking into account noisy measurements of the UAV position, velocity and orientation. Simulations show the performance of the developed algorithms when noise from real GPS (Global Positioning System) measurements is added. Safe and accurate navigation for either autonomous trajectory tracking or haptic teleoperation of quadrotors is presented as well. A second-order Sliding Mode (2-SM) control algorithm is used to track trajectories while avoiding frontal collisions in autonomous flight. The time-scale separation of the translational and rotational dynamics allows us to design position controllers by giving desired references in the roll and pitch angles, which is suitable for quadrotors equipped with an internal attitude controller. The 2-SM control adds robustness to the closed-loop system, and a Lyapunov-based analysis proves the system's stability. Vision algorithms are employed to estimate the pose of the vehicle using only monocular SLAM (Simultaneous Localization and Mapping) fused with inertial measurements. Distance to potential obstacles is detected and computed using the sparse depth map from the vision algorithm. For teleoperation tests, a haptic device is employed to feed back information to the pilot about possible collisions by exerting opposing forces. The proposed strategies are successfully tested in real-time experiments using a low-cost commercial quadrotor. Also achieved in this work is the conception and development of a Micro Aerial Vehicle (MAV) able to safely interact with human users by following them autonomously. Once a face is detected by means of a Haar cascade classifier, it is tracked by applying a Kalman Filter (KF), and an estimate of the relative position with respect to the face is obtained at a high rate. A linear Proportional-Derivative (PD) controller regulates the UAV's position in order to keep a constant distance to the face, also employing the extra information available from the UAV's embedded sensors. Several experiments were carried out under different conditions, showing good performance even in disadvantageous scenarios such as outdoor flight, and proving robust against illumination changes, wind perturbations, image noise and the presence of several faces in the same image. Finally, this thesis deals with the problem of implementing a safe and fast transportation system using a quadrotor UAV with a cable-suspended load. The objective is to transport the load from one place to another quickly and with minimum swing in the cable.
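The face-following behaviour described in this abstract reduces to a linear PD loop on the estimated face distance. Below is a minimal sketch of that idea in Python; the gains, the 1.5 m setpoint, the pinhole constants and the estimate_distance helper are illustrative assumptions, not values taken from the thesis.

```python
import numpy as np

class PDController:
    """Minimal linear PD regulator: u = Kp*e + Kd*de/dt."""
    def __init__(self, kp, kd, dt):
        self.kp, self.kd, self.dt = kp, kd, dt
        self.prev_error = 0.0

    def update(self, error):
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.kd * derivative

# Hypothetical setup: keep the UAV 1.5 m away from the tracked face.
DESIRED_DISTANCE = 1.5                            # metres (assumed setpoint)
ctrl = PDController(kp=0.8, kd=0.2, dt=1 / 30.0)  # assumed 30 Hz video loop

def estimate_distance(face_width_px, focal_px=600.0, face_width_m=0.16):
    """Pinhole approximation: distance = f * real_width / pixel_width."""
    return focal_px * face_width_m / face_width_px

for face_width_px in [50.0, 60.0, 70.0, 64.0]:    # simulated detections
    distance = estimate_distance(face_width_px)
    error = distance - DESIRED_DISTANCE
    forward_velocity = ctrl.update(error)          # positive: move towards face
    print(f"d={distance:.2f} m, v_cmd={forward_velocity:+.2f} m/s")
```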
APA, Harvard, Vancouver, ISO, and other styles
29

Ferrera, Maxime. "Monocular Visual-Inertial-Pressure fusion for Underwater localization and 3D mapping." Thesis, Montpellier, 2019. http://www.theses.fr/2019MONTS089.

Full text
Abstract:
This thesis addresses the problem of real-time 3D localization and mapping in underwater environments. In the underwater archaeology field, Remotely Operated Vehicles (ROVs) are used to conduct deep-sea surveys and excavations. Providing both accurate localization and mapping information in real-time is crucial for manual or automated piloting of the robots. While many localization solutions already exist for underwater robots, most of them rely on very accurate sensors, such as Doppler velocity logs or fiber-optic gyroscopes, which are very expensive and may be too bulky for small ROVs. Acoustic positioning systems are also commonly used for underwater positioning, but they provide low-frequency measurements with limited accuracy. In this thesis, we study the use of low-cost sensors for accurate underwater localization. Our study investigates the use of a monocular camera, a pressure sensor and a low-cost MEMS-IMU as the only means of performing localization and mapping in the context of underwater archaeology. We have conducted an evaluation of different feature-tracking methods on images affected by typical disturbances met in an underwater context. From the results obtained with this evaluation, we have developed a monocular Visual SLAM (Simultaneous Localization and Mapping) method, robust to the specific disturbances of underwater environments. Then, we propose an extension of this method to tightly integrate the measurements of a pressure sensor and an IMU in the SLAM algorithm. The final method provides a very accurate localization and runs in real-time. In addition, an online dense 3D reconstruction module, compliant with a monocular setup, is also proposed. Two lightweight and compact prototypes of this system have been designed and used to record datasets that have been publicly released. Furthermore, these prototypes have been successfully used to test and validate the proposed localization and mapping algorithms in real-case scenarios.
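One reason a pressure sensor is attractive in this setting is that it gives an absolute depth measurement that monocular vision alone cannot provide. The sketch below shows the hydrostatic conversion and a simple loose fusion step; the water density, the fusion weight and the function names are assumptions for illustration, not the thesis's tightly-coupled formulation.

```python
RHO_SEA_WATER = 1025.0   # kg/m^3 (assumed density)
G = 9.81                 # m/s^2
P_ATM = 101325.0         # Pa at the surface

def pressure_to_depth(pressure_pa):
    """Hydrostatic relation: depth = (P - P_atm) / (rho * g)."""
    return (pressure_pa - P_ATM) / (RHO_SEA_WATER * G)

def fuse_depth(z_slam, z_pressure, weight=0.9):
    """Loose fusion: pull the (scale-drifting) visual depth towards
    the absolute pressure depth. `weight` is a hypothetical gain."""
    return (1 - weight) * z_slam + weight * z_pressure

# Simulated reading 2 bar above atmosphere -> roughly 19.9 m depth.
z_p = pressure_to_depth(P_ATM + 2.0e5)
print(fuse_depth(z_slam=18.2, z_pressure=z_p))
```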
APA, Harvard, Vancouver, ISO, and other styles
30

Xie, Bingqian. "Lane Departure and Front Collision Warning System Using Monocular and Stereo Vision." Digital WPI, 2015. https://digitalcommons.wpi.edu/etd-theses/274.

Full text
Abstract:
Driving assistance systems such as lane departure and front collision warning have attracted great attention for their promising use in road driving. Thus, this research focuses on implementing lane departure and front collision warning at the same time. For the system to be useful in real situations, it is critical that the whole process runs in near real-time. We therefore chose the Hough Transform as the main algorithm for detecting lanes on the road: it is a fast and robust algorithm, which makes it possible to process as many frames per second as possible. The Hough Transform provides the lane boundary information, from which we decide whether the car is departing from its lane based on the car's position within the lane. We then exploit the symmetry of the leading vehicle to detect it, and combine this with the Camshift tracking algorithm to bridge gaps when detection fails. Finally, we introduce camera calibration, stereo calibration, and how to compute real-world distance from the depth map.
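As a rough illustration of the pipeline this abstract outlines, the sketch below runs Canny edge detection followed by OpenCV's probabilistic Hough transform inside a region of interest; all thresholds and the ROI geometry are illustrative assumptions, not the thesis's tuned values.

```python
import cv2
import numpy as np

def detect_lane_lines(frame_bgr):
    """Edge map + probabilistic Hough transform, as in classic
    lane-detection pipelines. Thresholds are illustrative only."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 50, 150)
    # Keep only a trapezoidal region of interest in front of the car.
    h, w = edges.shape
    mask = np.zeros_like(edges)
    roi = np.array([[(0, h), (w // 2 - 50, h // 2),
                     (w // 2 + 50, h // 2), (w, h)]], dtype=np.int32)
    cv2.fillPoly(mask, roi, 255)
    edges = cv2.bitwise_and(edges, mask)
    # Probabilistic Hough: fast enough to run on every frame.
    return cv2.HoughLinesP(edges, rho=1, theta=np.pi / 180,
                           threshold=40, minLineLength=40, maxLineGap=100)

# Synthetic test frame with two painted "lane markings".
frame = np.zeros((240, 320, 3), dtype=np.uint8)
cv2.line(frame, (60, 240), (150, 120), (255, 255, 255), 4)
cv2.line(frame, (260, 240), (170, 120), (255, 255, 255), 4)
lines = detect_lane_lines(frame)
print(0 if lines is None else len(lines), "line segments found")
```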
APA, Harvard, Vancouver, ISO, and other styles
31

Randell, Charles James. "3D underwater monocular machine vision from 2D images in an attenuating medium." Thesis, National Library of Canada = Bibliothèque nationale du Canada, 1997. http://www.collectionscanada.ca/obj/s4/f2/dsk2/ftp02/NQ32764.pdf.

Full text
APA, Harvard, Vancouver, ISO, and other styles
32

Swart, Andre Dewald. "Monocular vision assisted autonomous landing of a helicopter on a moving deck." Thesis, Stellenbosch : Stellenbosch University, 2013. http://hdl.handle.net/10019.1/80134.

Full text
Abstract:
Thesis (MScEng)--Stellenbosch University, 2013.
ENGLISH ABSTRACT: The landing phase of any helicopter is the most critical part of the whole flight envelope, particularly on a moving flight deck. The flight deck is usually located at the stern of the ship, which gives rise to large heave motions. This thesis focuses on the three fundamental components required for a successful landing: accurate, relative state estimation between the helicopter and the flight deck; a prediction horizon to forecast suitable landing opportunities; and excellent control to safely unite the helicopter with the flight deck. A monocular-vision sensor node was developed to provide accurate, relative position and attitude information of the flight deck. The flight deck is identified by a distinct, geometric pattern. The relative states are combined with the onboard, kinematic state estimates of the helicopter to provide an inertial estimate of the flight deck states. Onboard motion prediction is executed to forecast a possible safe landing time, which is conveyed to the landing controller. Camera pose-estimation tests and hardware-in-the-loop simulations proved the system developed in this thesis viable for flight tests. The practical flight tests confirmed the success of the monocular-vision sensor node.
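The relative-pose step described above, recovering the deck's position and attitude from a known geometric pattern, can be illustrated with OpenCV's solvePnP; the pattern layout, intrinsics and detected corner pixels below are hypothetical stand-ins, not the thesis's actual pattern or calibration.

```python
import cv2
import numpy as np

# Known 3-D layout of a (hypothetical) square deck pattern, in metres,
# expressed in the flight-deck frame with Z = 0 on the deck plane.
PATTERN_3D = np.array([[-0.5, -0.5, 0.0],
                       [ 0.5, -0.5, 0.0],
                       [ 0.5,  0.5, 0.0],
                       [-0.5,  0.5, 0.0]], dtype=np.float64)

K = np.array([[700.0, 0.0, 320.0],      # assumed camera intrinsics
              [0.0, 700.0, 240.0],
              [0.0, 0.0, 1.0]])
dist = np.zeros(5)                      # assume negligible lens distortion

def deck_pose(corners_px):
    """Relative deck pose from the four detected pattern corners."""
    ok, rvec, tvec = cv2.solvePnP(PATTERN_3D, corners_px, K, dist)
    return rvec, tvec  # rotation (Rodrigues vector) and translation

# Corners as they might be detected in one frame (hypothetical pixels).
corners = np.array([[250.0, 300.0], [390.0, 300.0],
                    [400.0, 180.0], [240.0, 180.0]])
rvec, tvec = deck_pose(corners)
print("deck at", tvec.ravel(), "m in the camera frame")
```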
APA, Harvard, Vancouver, ISO, and other styles
33

Diskin, Yakov. "Dense 3D Point Cloud Representation of a Scene Using Uncalibrated Monocular Vision." University of Dayton / OhioLINK, 2013. http://rave.ohiolink.edu/etdc/view?acc_num=dayton1366386933.

Full text
APA, Harvard, Vancouver, ISO, and other styles
34

Katramados, Ioannis. "Real-time object detection using monocular vision for low-cost automotive sensing systems." Thesis, Cranfield University, 2013. http://dspace.lib.cranfield.ac.uk/handle/1826/10386.

Full text
Abstract:
This work addresses the problem of real-time object detection in automotive environments using monocular vision. The focus is on real-time feature detection, tracking, depth estimation using monocular vision and, finally, object detection by fusing visual saliency and depth information. Firstly, a novel feature detection approach is proposed for extracting stable and dense features even in images with very low signal-to-noise ratio. This methodology is based on image gradients, which are redefined to take account of noise as part of their mathematical model. Each gradient is based on a vector connecting a negative to a positive intensity centroid, where both centroids are symmetric about the centre of the area for which the gradient is calculated. Multiple gradient vectors define a feature, with its strength being proportional to the underlying gradient vector magnitude. The evaluation of the Dense Gradient Features (DeGraF) shows superior performance over other contemporary detectors in terms of keypoint density, tracking accuracy, illumination invariance, rotation invariance, noise resistance and detection time. The DeGraF features form the basis for two new approaches that perform dense 3D reconstruction from a single vehicle-mounted camera. The first approach tracks DeGraF features in real-time while performing image stabilisation with minimal computational cost. This means that, despite camera vibration, the algorithm can accurately predict the real-world coordinates of each image pixel in real-time by comparing each motion vector to the ego-motion vector of the vehicle. The performance of this approach has been compared to different 3D reconstruction methods in order to determine their accuracy, depth-map density, noise resistance and computational complexity. The second approach proposes the use of local frequency analysis of gradient features for estimating relative depth. This novel method is based on the fact that DeGraF gradients can accurately measure local image variance with subpixel accuracy. It is shown that the local frequency with which the centroid oscillates around the gradient window centre is proportional to the depth of each gradient centroid in the real world. The lower computational complexity of this methodology comes at the expense of depth-map accuracy as the camera velocity increases, but it is at least five times faster than the other evaluated approaches. This work also proposes a novel technique for deriving visual saliency maps by using Division of Gaussians (DIVoG). In this context, saliency maps express how different each image pixel is from its surrounding pixels across multiple pyramid levels. This approach is shown to be both fast and accurate when evaluated against other state-of-the-art approaches. Subsequently, the saliency information is combined with depth information to identify salient regions close to the host vehicle. The fused map allows faster detection of high-risk areas where obstacles are likely to exist. As a result, existing object detection algorithms, such as the Histogram of Oriented Gradients (HOG), can execute at least five times faster. In conclusion, through a step-wise approach, computationally expensive algorithms have been optimised or replaced by novel methodologies to produce a fast object detection system that is aligned with the requirements of the automotive domain.
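A loose reading of the Division of Gaussians (DIVoG) idea, expressing how much each pixel differs from its blurred surround across pyramid levels, might look like the following sketch; the kernel sizes, level count and accumulation scheme are assumptions, not the thesis's implementation.

```python
import cv2
import numpy as np

def divog_saliency(gray, levels=3, eps=1e-6):
    """Division-of-Gaussians style saliency: at each pyramid level a
    pixel is salient if it differs from its blurred surround, measured
    here as the ratio image / surround. An interpretation of the DIVoG
    idea, not the original implementation."""
    img = gray.astype(np.float32) + eps
    acc = np.zeros_like(img)
    level = img
    for _ in range(levels):
        surround = cv2.GaussianBlur(level, (9, 9), 0) + eps
        ratio = level / surround
        # Deviations from 1.0 in either direction count as salient.
        sal = np.abs(ratio - 1.0)
        acc += cv2.resize(sal, (img.shape[1], img.shape[0]))
        level = cv2.pyrDown(level)
    return cv2.normalize(acc, None, 0, 1, cv2.NORM_MINMAX)

# A bright blob on a dark background should dominate the map.
test = np.zeros((256, 256), dtype=np.uint8)
cv2.circle(test, (128, 128), 20, 255, -1)
saliency = divog_saliency(test)
print("peak saliency at", np.unravel_index(saliency.argmax(), saliency.shape))
```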
APA, Harvard, Vancouver, ISO, and other styles
35

Garg, Ravi. "Dense motion capture of deformable surfaces from monocular video." Thesis, Queen Mary, University of London, 2013. http://qmro.qmul.ac.uk/xmlui/handle/123456789/8823.

Full text
Abstract:
Accurate motion capture of deformable objects from monocular video sequences is a challenging Computer Vision problem with immense applicability to domains ranging from virtual reality and animation to image-guided surgery. Existing dense motion capture methods rely on expensive setups with multiple calibrated cameras, structured light, active markers or prior scene knowledge learned from a large 3D dataset. In this thesis, we propose an end-to-end pipeline for 3D reconstruction of deformable scenes from a monocular video sequence. Our method relies on a two-step pipeline in which temporally consistent video registration is followed by a dense non-rigid structure-from-motion approach. We present a data-driven method to reconstruct non-rigid smooth surfaces densely, using only a single video as input, without the need for any prior models or shape templates. We focus on the well-explored low-rank prior for deformable shape reconstruction and propose its convex relaxation to introduce the first variational energy minimisation approach to non-rigid structure from motion. To achieve realistic dense reconstruction of sparsely textured surfaces, we incorporate an edge-preserving spatial smoothness prior into the low-rank factorisation framework and design a single variational energy to address the non-rigid structure-from-motion problem. We also discuss the importance of long-term 2D trajectories for several vision problems and explain how subspace constraints can be used to exploit the redundancy present in the motion of real scenes for dense video registration. To that end, we adopt a variational optimisation approach to design a robust multi-frame video registration algorithm that combines a robust subspace prior with a total variation spatial regulariser. Throughout this thesis, we advocate the use of GPU-portable and scalable energy minimisation algorithms to progress towards practical dense non-rigid 3D motion capture from a single video in the presence of occlusions and illumination changes.
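The low-rank prior at the core of this approach can be illustrated with a plain truncated SVD: 2D point tracks stacked into a measurement matrix built from K basis shapes have rank at most 3K, so a rank-3K approximation recovers them through noise. This generic factorisation sketch is not the thesis's variational solver.

```python
import numpy as np

rng = np.random.default_rng(0)

# Measurement matrix W: 2F x P (F frames, P tracked points), built here
# from K basis shapes so that its rank is (at most) 3K by construction.
F, P, K = 30, 100, 2
basis = rng.normal(size=(3 * K, P))            # K basis shapes, stacked
coeffs = rng.normal(size=(2 * F, 3 * K))       # per-frame pose/weights
W = coeffs @ basis + 0.01 * rng.normal(size=(2 * F, P))  # plus noise

# Low-rank approximation: keep only the first 3K singular values.
U, s, Vt = np.linalg.svd(W, full_matrices=False)
r = 3 * K
W_lowrank = U[:, :r] @ np.diag(s[:r]) @ Vt[:r]

residual = np.linalg.norm(W - W_lowrank) / np.linalg.norm(W)
print(f"rank-{r} approximation, relative residual {residual:.4f}")
```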
APA, Harvard, Vancouver, ISO, and other styles
36

Nassir, Cesar. "Domain-Independent Moving Object Depth Estimation using Monocular Camera." Thesis, KTH, Robotik, perception och lärande, RPL, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-233519.

Full text
Abstract:
Today, automotive companies across the world strive to create vehicles with fully autonomous capabilities. There are many benefits to developing autonomous vehicles, such as reduced traffic congestion, increased safety and reduced pollution. To achieve that goal there are many challenges ahead, one of which is visual perception. The ability to estimate depth from a 2D image has been shown to be a key component for 3D recognition, reconstruction and segmentation. Estimating depth in an image from a monocular camera is an ill-posed problem, since the mapping from colour intensity to depth value is ambiguous. Depth estimation from stereo images has come far compared to monocular depth estimation, and it was initially what depth estimation relied on. However, being able to exploit monocular cues is necessary for scenarios where stereo depth estimation is not possible. We present a novel CNN, BiNet, inspired by ENet, to tackle depth estimation of moving objects using only a monocular camera in real-time. It performs better than ENet on the Cityscapes dataset while adding only a small overhead in complexity.
APA, Harvard, Vancouver, ISO, and other styles
37

Repo, T. (Tapio). "Modeling of structured 3-D environments from monocular image sequences." Doctoral thesis, University of Oulu, 2002. http://urn.fi/urn:isbn:9514268571.

Full text
Abstract:
The purpose of this research has been to show, with applications, that polyhedral scenes can be modeled in real time with a single video camera. Sometimes this can be done very efficiently without any special image processing hardware. The developed vision sensor estimates its three-dimensional position with respect to the environment and models the environment simultaneously. Estimates become recursively more accurate as objects are approached and observed from different viewpoints. The modeling process starts by extracting interesting tokens, such as lines and corners, from the first image. Those features are then tracked in subsequent image frames; some previously taught patterns can also be used in tracking. Only a few features per image are extracted, so that processing can be done at video frame rate. Newly appearing features can also be added to the environment structure. Kalman filtering is used in estimation. The parameters in motion estimation are location and orientation and their first derivatives. The environment is considered a rigid object with respect to the camera. The environment structure consists of the 3-D coordinates of the tracked features. The initial model lacks depth information. Relational depth is obtained by exploiting the fact that closer points move faster on the image plane than more distant ones during translational motion. Additional information is needed to obtain absolute coordinates. Special attention has been paid to modeling uncertainties: measurements with high uncertainty get less weight when updating the motion and environment model. The rigidity assumption is exploited by using thin pencil-shaped regions for the initial model structure uncertainties. By continuously observing motion uncertainties, the performance of the modeler can be monitored. In contrast to the usual solution, the estimations are done in separate state vectors, which allows motion and 3-D structure to be estimated asynchronously. In addition to yielding a more distributed solution, this technique provides an efficient failure detection mechanism. Several trackers can estimate motion simultaneously, and only those with the most confident estimates are allowed to update the common environment model. Tests showed that motion with six degrees of freedom can be estimated in an unknown environment while the 3-D structure of the environment is estimated simultaneously. The achieved accuracies were millimeters at a distance of 1-2 meters in tests with simple toy scenes and with more demanding industrial pallet scenes. This is sufficient for manipulating objects when the modeler is used to provide visual feedback.
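The relational-depth cue mentioned above has a simple worked form: under a sideways camera translation t, a point at depth Z moves roughly d = f * t / Z pixels on the image plane, so slower features are farther away. A tiny numeric sketch, with assumed focal length and translation:

```python
import numpy as np

# Under pure sideways translation t of the camera, a point at depth Z
# moves on the image plane by approximately d = f * t / Z pixels, so
# relative depth can be recovered as Z = f * t / d (up to scale if the
# translation t is unknown).
f = 600.0        # assumed focal length in pixels
t = 0.10         # assumed camera translation between frames, metres

displacements = np.array([30.0, 15.0, 7.5])   # tracked feature motion (px)
depths = f * t / displacements                # metric only if f, t known
print(depths)    # -> [2. 4. 8.] m: slower features are farther away
```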
APA, Harvard, Vancouver, ISO, and other styles
38

Sköld, Jonas. "Estimating 3D-trajectories from Monocular Video Sequences." Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-172919.

Full text
Abstract:
Tracking a moving object and reconstructing its trajectory can be done with a stereo camera system, since the two cameras enable depth vision. However, such a system would not work if one of the cameras fails to detect the object. If that happens, it would be beneficial if the system could still use the functioning camera to make an approximate trajectory reconstruction. In this study, I have investigated how past observations from a stereo system can be used to recreate trajectories when video from only one of the cameras is available. Several approaches have been implemented and tested, with varying results. The best method was found to be a nearest-neighbor search optimized by a Kalman filter. On a test set of 10000 golf shots, the algorithm was able to produce estimates that differed on average by around 3.5 meters from the correct trajectory, with better results for trajectories originating close to the camera.
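A toy version of the reported best method, a nearest-neighbor search steered by a Kalman-filter prediction, is sketched below; the state layout, database, gains and the blend step standing in for the full KF update are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Database of past (3-D position, velocity) samples from stereo tracking.
database = rng.normal(size=(10000, 6))

def nearest_neighbor(query, db):
    """Plain NN search; steering it with a Kalman prediction keeps the
    search near a dynamically feasible state."""
    return db[np.argmin(np.linalg.norm(db - query, axis=1))]

# One constant-velocity Kalman prediction step (toy 6-D state).
dt = 1 / 50.0
A = np.eye(6)
A[:3, 3:] = dt * np.eye(3)                     # x' = x + v*dt
state = np.array([0.0, 0.0, 1.0, 10.0, 0.0, 5.0])
predicted = A @ state                          # KF prediction
match = nearest_neighbor(predicted, database)  # NN search around it
# Blend prediction and matched sample (a stand-in for the KF update).
state = 0.7 * predicted + 0.3 * match
print(state)
```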
APA, Harvard, Vancouver, ISO, and other styles
39

Ekström, Marcus. "Road Surface Preview Estimation Using a Monocular Camera." Thesis, Linköpings universitet, Datorseende, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-151873.

Full text
Abstract:
Recently, sensors such as radars and cameras have been widely used in automobiles, especially in Advanced Driver-Assistance Systems (ADAS), to collect information about the vehicle's surroundings. Stereo cameras are very popular, as they can be used passively to construct a 3D representation of the scene in front of the car. This has allowed the development of several ADAS algorithms that need 3D information to perform their tasks. One interesting application is Road Surface Preview (RSP), where the task is to estimate the road height along the future path of the vehicle. An active suspension control unit can then use this information to regulate the suspension, improving driving comfort, extending the durability of the vehicle and warning the driver about potential risks on the road surface. Stereo cameras have been successfully used in RSP and have demonstrated very good performance. However, their main disadvantages are their high production cost and high power consumption, which limits installing several ADAS features in economy-class vehicles. A less expensive alternative is the monocular camera, which has significantly lower cost and power consumption. Therefore, this thesis investigates the possibility of solving the Road Surface Preview task using a monocular camera. We try two different approaches: structure-from-motion and Convolutional Neural Networks. The proposed methods are evaluated against the stereo-based system. Experiments show that both structure-from-motion and CNNs have good potential for solving the problem, but they are not yet reliable enough to be a complete solution to the RSP task and to be used in an active suspension control unit.
APA, Harvard, Vancouver, ISO, and other styles
40

Schultz, Kevin P. "Exploration of the crosslinks between saccadic and vergence eye movement pathways using motor and visual perturbations." Thesis, Birmingham, Ala. : University of Alabama at Birmingham, 2010. https://www.mhsl.uab.edu/dt/2010p/schultz.pdf.

Full text
APA, Harvard, Vancouver, ISO, and other styles
41

Bayle, Elodie. "Entre fusion et rivalité binoculaires : impact des caractéristiques des stimuli visuels lors de l’utilisation d’un système de réalité augmentée semi-transparent monoculaire." Electronic Thesis or Diss., université Paris-Saclay, 2021. http://www.theses.fr/2021UPASG029.

Full text
Abstract:
Monocular augmented reality devices are used in the aeronautical field to enhance pilots' vision by providing access to essential information such as flight symbology. They are lighter and more adjustable than their binocular counterparts, can be integrated into any aircraft, and allow information to be retained regardless of gaze direction. However, they generate a particular percept, since a monocular virtual image is superimposed on the real binocular environment: different information is projected to corresponding regions of the two eyes, creating an interocular conflict. The goal of this thesis is to evaluate the impact of stimulus characteristics on the performance of tasks carried out with this type of system, in order to optimize its use. Two psychophysical studies and an ecological study in a flight simulator were carried out. All of them showed good comfort during exposure to the interocular conflict. Performance was evaluated according to the characteristics of the binocular background, the display of the monocular image and the characteristics of the events to be detected. The choice of the presenting eye is not insignificant, given the differences between the performances achieved with the monocular display on each of the two eyes. Our results from the three studies also show that, as with two fusible or two dichoptic images, performance depends on the visual stimuli. They therefore suggest that an adaptive symbology should be considered, which cannot be reduced to the brightness adjustment currently available to pilots.
APA, Harvard, Vancouver, ISO, and other styles
42

Spaenlehauer, Ariane. "Decentralized monocular-inertial multi-UAV SLAM system." Thesis, Compiègne, 2019. http://www.theses.fr/2019COMP2494.

Full text
Abstract:
In this thesis, we provide a scheme for the localization of a fleet of autonomous UAVs (Unmanned Aerial Vehicles) within a Technological System-of-Systems architecture. Specifically, we aim for a fleet of autonomous UAVs to localize themselves and to obtain a map of an unknown environment using a minimal set of sensors on each UAV: a front-facing monocular camera and an Inertial Measurement Unit. This is a critically important problem for applications such as the exploration of unknown areas, or search and rescue missions. The choices made in designing such a system are supported by an extensive study of the scientific literature on two broad fronts: first, multi-robot systems performing localization, mapping, navigation and exploration; and second, monocular, real-time and inertial-monocular SLAM (Simultaneous Localization and Mapping) algorithms. Processing monocular camera frames suffers the drawback of lacking metric estimates, as the depth dimension is lost when the frames are captured by the camera. Although this is usually not a critical problem for single-robot systems, accurate metric estimates are required for multi-robot systems, and this requirement becomes critical if the system is designed for control, navigation and exploration purposes. In this thesis, we provide a novel approach to make the outputs of monocular SLAM algorithms metric through a loosely-coupled fusion scheme using the inertial measurements. This work also explores a design for a fleet of UAVs that localizes each robot with minimal requirements: no a priori knowledge about the environment is needed, and no information about the positions or the times at which the UAVs take off and land is required. Moreover, the system presented in this thesis handles aggressive UAV trajectories with dramatic changes in speed and altitude. In multi-robot systems, the question of coordinate frames requires more attention than in single-robot systems. In many studies, the coordinate frame problem is simplified to representing the fleet and expressing the measurements in a global coordinate frame. However, this kind of hypothesis implies either the use of additional sensors to measure the transformations to the global coordinate frame, or additional experimental constraints, for example on the starting positions of the robots. Our system does not require absolute measurements such as GNSS positioning or knowledge about the coordinate frame of each UAV. As each UAV of the fleet estimates its location and produces a map in its own coordinate frame, the relations between those coordinate frames are found by our scheme. For that purpose, we extend the well-known concept of loop-closures in single-robot SLAM approaches to multi-robot systems. In this research work, we also provide an overview of the new effects arising from this extended definition of loop-closures, in comparison with the loop-closure schemes found in single-robot SLAM algorithms. In addition to the coordinate frame problem, we provide experimental results on improving the location estimate of a fleet by considering places visited by several UAVs. By searching for similar places in each UAV's imagery, using the 2-D information encapsulated in images of the same scenery seen from different viewpoints together with the 3-D map locally estimated by each UAV, we add new constraints to the SLAM problem, which is the main mechanism used to improve the UAV location estimates.
We included experiments to assess the accuracy of the inter-UAV location estimation. The system was tested using datasets with measurements recorded on board UAVs, under conditions similar to those we target.
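The loosely-coupled fusion used to make monocular SLAM metric can be illustrated by a least-squares scale fit between visual and inertially-derived displacements; the window size, noise levels and function name below are assumptions, not the thesis's estimator.

```python
import numpy as np

def estimate_scale(vis_disp, imu_disp):
    """Least-squares scale s minimising ||s * vis - imu||^2 over a
    window of displacement vectors (loosely-coupled fusion sketch)."""
    vis = np.asarray(vis_disp).ravel()
    imu = np.asarray(imu_disp).ravel()
    return float(vis @ imu) / float(vis @ vis)

# Hypothetical per-frame displacements: visual odometry is consistent
# but 2.5x too small; double-integrated IMU gives noisy metric values.
rng = np.random.default_rng(2)
true = rng.normal(size=(50, 3))
vis = true / 2.5
imu = true + 0.05 * rng.normal(size=true.shape)
print("estimated scale:", estimate_scale(vis, imu))   # approx. 2.5
```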
APA, Harvard, Vancouver, ISO, and other styles
43

Herdtweck, Christian [Verfasser], and Heinrich [Akademischer Betreuer] Bülthoff. "Learning Data-Driven Representations for Robust Monocular Computer Vision Applications / Christian Herdtweck ; Betreuer: Heinrich Bülthoff." Tübingen : Universitätsbibliothek Tübingen, 2014. http://d-nb.info/1162897317/34.

Full text
APA, Harvard, Vancouver, ISO, and other styles
44

Formankiewicz, Monika Anna. "The psychophysics of lustre and the use of monocular filters to treat colour vision deficiencies." Thesis, University of Cambridge, 2005. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.615264.

Full text
APA, Harvard, Vancouver, ISO, and other styles
45

Munn, Susan M. "3D head motion, point-of-regard and encoded gaze fixations in real scenes : next-generation portable video-based monocular eye tracking /." Online version of thesis, 2009. http://hdl.handle.net/1850/11206.

Full text
APA, Harvard, Vancouver, ISO, and other styles
46

Barra, Roberto José Giordano. "Combinação de visão monocular e sonares esparsos para a localização de robôs móveis." Universidade de São Paulo, 2007. http://www.teses.usp.br/teses/disponiveis/3/3141/tde-13072007-164026/.

Full text
Abstract:
A key component of a mobile robot system is the ability to localize itself accurately, which involves estimating its pose with respect to some global representation of space. The general specification of a sensor-based localization approach starts with an initial estimate of the robot's pose and uses sensor data in conjunction with a map to produce a refined pose estimate with increased confidence about the true pose of the robot. One of the main difficulties is that sensor data is corrupted by measurement errors. These errors can arise from noise, quantization, digitalization artifacts, wheel slippage and other such sources. Different sensors measure different physical properties, which are corrupted by different sources of measurement error. The use of data from multiple sensors provides redundant and complementary information that can be processed to obtain a combined estimate, aiming at an increase in the confidence of the final pose estimate. In this work we propose ELViS, a system that estimates the localization of a mobile robot equipped with odometers, a video camera and a frontal semi-ring of 8 sonar sensors, and that operates successfully in stationary and structured indoor environments. It is assumed that the robot navigates on flat surfaces and that straight lines can be identified in the environment images acquired by the camera. To increase the selectivity of the landmarks and reduce the computational complexity of data processing and matching to the map, environment features are represented in the map using minimalist models. This allows the use of ELViS in a large number of applications where tight budget or execution-time constraints exist. ELViS has been implemented and tested using two estimators based on the Kalman Filter. The results, obtained with real robots and in a series of simulation runs, indicate promising directions.
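The combined-estimate idea described above, weighting redundant sensor readings to increase confidence, is textbook inverse-variance fusion, sketched below; this illustrates the principle only, not ELViS's actual Kalman formulation, and all numbers are hypothetical.

```python
import numpy as np

def fuse(estimates, variances):
    """Inverse-variance weighted fusion of redundant estimates of the
    same quantity: the fused variance is always smaller than any input
    variance, i.e. confidence increases."""
    w = 1.0 / np.asarray(variances, dtype=float)
    fused = (w * np.asarray(estimates, dtype=float)).sum() / w.sum()
    return fused, 1.0 / w.sum()

# Hypothetical x-position of the robot from three sources (m, m^2).
x, var = fuse(estimates=[2.10, 2.35, 2.22],     # odometry, vision, sonar
              variances=[0.04, 0.01, 0.09])
print(f"fused x = {x:.3f} m, variance = {var:.4f}")
```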
APA, Harvard, Vancouver, ISO, and other styles
47

Aguilar-Gonzalez, Abiel. "Monocular-SLAM dense mapping algorithm and hardware architecture for FPGA acceleration." Thesis, Université Clermont Auvergne‎ (2017-2020), 2019. http://www.theses.fr/2019CLFAC055.

Full text
Abstract:
Simultaneous Localization and Mapping (SLAM) is the problem of constructing a 3D map while simultaneously keeping track of the agent's location within the map. In recent years, work has focused on systems that use a single moving camera as the only sensing mechanism (monocular SLAM). This choice is motivated by the fact that, nowadays, it is possible to find inexpensive commercial cameras that are smaller and lighter than other sensors previously used, and they provide visual environmental information that can be exploited to create complex 3D maps while camera poses are simultaneously estimated. Unfortunately, previous monocular SLAM systems are based on optimization techniques that limit their performance in real-time embedded applications. To solve this problem, in this work we propose a new monocular SLAM formulation based on the hypothesis that it is possible to reach high efficiency for embedded applications by increasing the density of the point cloud map (and therefore the 3D map density and the overall positioning and mapping) and by reformulating the feature-tracking/feature-matching process to achieve high performance on embedded hardware architectures such as FPGA or CUDA. In order to increase the point cloud map density, we propose new feature-tracking/feature-matching and depth-from-motion algorithms that are extensions of the stereo matching problem. Then, two different hardware architectures (based on FPGA and CUDA, respectively), fully compliant with real-time embedded applications, are presented. Experimental results show that it is possible to obtain accurate camera pose estimations. Compared to previous monocular systems, we rank 5th in the KITTI benchmark suite, with a higher processing speed (we are the fastest algorithm in the benchmark) and more than ten times the point cloud density of previous approaches.
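The stereo-matching primitive that the proposed feature matching extends can be sketched as a sum-of-absolute-differences search along a scanline, which is exactly the kind of regular, data-parallel loop that maps well to FPGA or CUDA pipelines; the patch size and disparity range below are assumed values.

```python
import numpy as np

def sad_match(left, right, y, x, patch=5, max_disp=32):
    """Sum-of-absolute-differences search along one scanline: the
    basic primitive behind stereo-style matching. Returns the best
    disparity for the left-image patch centred at (y, x)."""
    r = patch // 2
    ref = left[y - r:y + r + 1, x - r:x + r + 1].astype(np.int32)
    best, best_d = None, 0
    for d in range(0, min(max_disp, x - r)):
        cand = right[y - r:y + r + 1, x - d - r:x - d + r + 1].astype(np.int32)
        cost = np.abs(ref - cand).sum()
        if best is None or cost < best:
            best, best_d = cost, d
    return best_d

# Synthetic pair: the right image is the left shifted 7 px leftwards.
rng = np.random.default_rng(3)
left = rng.integers(0, 255, size=(64, 128), dtype=np.uint8)
right = np.roll(left, -7, axis=1)
print(sad_match(left, right, y=32, x=64))   # -> 7
```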
APA, Harvard, Vancouver, ISO, and other styles
48

Spencer, Justina. "Peeping in, peering out : monocularity and early modern vision." Thesis, University of Oxford, 2014. https://ora.ox.ac.uk/objects/uuid:b8854565-ce57-4c83-9cdb-64249d171142.

Full text
Abstract:
One of the central theoretical tenets of linear perspective is that it is based upon the idea of a monocular observer. Our lived perception, also referred to in the Renaissance as perspectiva naturalis, is always rooted in binocular vision; however, the guidelines for perspectiva artificialis often imply a single peeping eye as a starting point. In the early modern period, a number of rare art forms and instruments followed the prescriptive character of linear perspective to ludic ends. By focusing on this special class of what I would call 'monocular art forms', I will analyse the extent to which the perspectival method has been successfully applied in material form beyond classic two-dimensional painting. This special class of objects includes anamorphosis, peep-boxes, catoptrics, dioptric perspective tubes, and perspective instruments. It is my intention to draw attention to the different ways traditional perspectival paintings, exceptional cases such as perspective boxes and anamorphoses, and optical devices were encountered in the early modern period. In this thesis I examine the specific sites of each case study in depth so as to describe the various contexts - aristocratic, intellectual, religious - in which these items circulated. In Chapter 1 I discuss a special class of perspective and anamorphic designs that confined their illusions to a peepshow. Chapter 2 examines one of the most consummate applications of the monocular principle of perspective: seventeenth-century Dutch perspective boxes. In Chapter 3, monocular catoptric designs are studied in light of the vogue for mirror cabinets in the seventeenth century. Chapter 4 examines the innovative techniques of drawing machines and their collection in early modern courts through close study of the 'perspectograph'.
APA, Harvard, Vancouver, ISO, and other styles
49

Kalghatgi, Roshan Satish. "Reconstruction techniques for fixed 3-D lines and fixed 3-D points using the relative pose of one or two cameras." Thesis, Georgia Institute of Technology, 2012. http://hdl.handle.net/1853/43590.

Full text
Abstract:
In general, stereovision can be defined as a two-part problem. The first part is the correspondence problem: determining the image point in each image of a set of images that corresponds to the same physical point P. We will call this set of image points N. The second part is the reconstruction problem: once a set of image points N corresponding to point P has been determined, N is used to extract three-dimensional information about point P. This master's thesis presents three novel solutions to the reconstruction problem: two techniques for detecting the location of a 3-D point, and one for detecting a line expressed in a three-dimensional coordinate system. These techniques are tested and validated using a unique 3-D finger detection algorithm. The techniques presented are unique because of their simplicity and because they do not require the cameras to be placed in specific locations or orientations, or to have specific alignments. On the contrary, it will be shown that the techniques presented in this thesis allow the two cameras to assume almost any relative pose, provided that the object of interest is within their field of view. The relative pose of the cameras at a given instant in time, along with basic equations from the perspective image model, is used to form a system of equations that, when solved, reveals the 3-D coordinates of a particular fixed point of interest or the three-dimensional equation of a fixed line of interest. Finally, it will be shown that a single moving camera can successfully perform the same line and point detection accomplished by two cameras, by altering the pose of the camera. The results presented in this work are beneficial to any typical stereovision application because of their computational ease in comparison to other point and line reconstruction techniques. More importantly, this work allows a single moving camera to perceive three-dimensional position information, which effectively removes the two-camera constraint for a stereo vision system. When used with other monocular cues such as texture or color, the work presented in this thesis could be as accurate as binocular stereo vision at interpreting three-dimensional information. Thus, this work could potentially increase the three-dimensional perception of a robot that normally uses one camera, such as an eye-in-hand robot or a snake-like robot.
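The reconstruction step described here, solving a system built from the perspective image model and the two camera poses, is commonly implemented as linear (DLT) triangulation; the sketch below uses assumed intrinsics and poses, and is one standard way to solve such a system rather than necessarily the thesis's exact formulation.

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation: each view contributes two rows of
    the homogeneous system A X = 0; the solution is the right singular
    vector of A with the smallest singular value."""
    A = np.vstack([x1[0] * P1[2] - P1[0],
                   x1[1] * P1[2] - P1[1],
                   x2[0] * P2[2] - P2[0],
                   x2[1] * P2[2] - P2[1]])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]

K = np.array([[500.0, 0, 320], [0, 500.0, 240], [0, 0, 1]])
# Two arbitrary relative poses: identity, and a 0.2 m baseline along x.
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-0.2], [0.0], [0.0]])])

X_true = np.array([0.3, -0.1, 2.0])
x1 = P1 @ np.append(X_true, 1); x1 = x1[:2] / x1[2]
x2 = P2 @ np.append(X_true, 1); x2 = x2[:2] / x2[2]
print(triangulate(P1, P2, x1, x2))   # recovers [0.3, -0.1, 2.0]
```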
APA, Harvard, Vancouver, ISO, and other styles
50

Almanza-Ojeda, Dora Luz. "Détection et suivi d'objets mobiles perçus depuis un capteur visuel embarqué." Phd thesis, Université Paul Sabatier - Toulouse III, 2011. http://tel.archives-ouvertes.fr/tel-01017785.

Full text
Abstract:
This thesis deals with the detection and tracking of moving objects in a dynamic environment, using a camera mounted on a mobile robot. This subject remains a significant challenge because only monocular vision is used to solve it. Moving objects must be detected in the scene by analysing their apparent motion in the images, while excluding the camera's own motion. In a first stage, we propose a spatio-temporal analysis of the image sequence based on sparse optical flow. The a-contrario clustering method groups the dynamic points without any prior information on the number of clusters to form and without parameter tuning. The success of this method relies on accumulating enough data to properly characterize the position and velocity of the points. We call the time needed to acquire the images required to properly characterize the points the tracking time. We developed a probabilistic map to find the regions of the image with the highest probability of containing a moving object. This map enables the active selection of new points near previously detected regions, allowing the size of these regions to grow. In the second stage, we implement an iterative approach that performs detection, clustering and tracking on image sequences acquired from a static camera, indoors and outdoors. An object is represented by an active contour that is updated so that the initial model remains inside the contour. Finally, we present experimental results on images acquired from a camera mounted on a mobile robot moving in an outdoor environment with rigid and non-rigid moving objects. We show that the method can be used to detect obstacles during navigation in an a priori unknown environment, first at low speeds, then at more realistic speeds after compensating for the robot's ego-motion in the images.
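The first stage described above, sparse optical flow followed by grouping of dynamic points, can be approximated with OpenCV's pyramidal Lucas-Kanade tracker; the synthetic frames and the fixed speed threshold standing in for a-contrario clustering are illustrative assumptions.

```python
import cv2
import numpy as np

# Two synthetic frames: static texture plus one square moving 5 px right.
rng = np.random.default_rng(4)
f0 = rng.integers(0, 255, size=(240, 320), dtype=np.uint8)
f1 = f0.copy()
f0[100:140, 100:140] = 255
f1[100:140, 105:145] = 255

pts = cv2.goodFeaturesToTrack(f0, maxCorners=200, qualityLevel=0.01,
                              minDistance=7)
nxt, status, _ = cv2.calcOpticalFlowPyrLK(f0, f1, pts, None)

flow = (nxt - pts).reshape(-1, 2)[status.ravel() == 1]
speed = np.linalg.norm(flow, axis=1)
# Crude stand-in for a-contrario clustering: dynamic points are those
# whose apparent motion clearly exceeds the (here zero) ego-motion.
print("dynamic points:", int((speed > 2.0).sum()),
      "static points:", int((speed <= 2.0).sum()))
```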
APA, Harvard, Vancouver, ISO, and other styles