
Dissertations / Theses on the topic 'RGBZ'



Consult the top 50 dissertations / theses for your research on the topic 'RGBZ'.




1

Can, Chi. "Compact and efficient method of RGB to RGBW data conversion for OLED microdisplays." Thesis, University of Edinburgh, 2012. http://hdl.handle.net/1842/9512.

Abstract:
Colour Electronic Information Displays (EIDs) typically consist of pixels that are made up of red, green and blue (RGB) subpixels. A recent technology, the Organic Light Emitting Diode (OLED), offers the potential to create a superior EID. OLED is already suitable for use in small displays and microdisplays for personal electronics products. OLED microdisplays, in particular, exhibit lower power consumption than equivalent direct-view panels, thus enabling microdisplay-based personal display systems such as electronic viewfinders and video glasses to achieve the longest possible battery life. In many EIDs the light source is white and colour filters are used, at the expense of much absorbed light, to create the RGB light in the subpixels. Hence the concept has recently emerged of adding a white (W) subpixel to form an RGBW pixel. The advantages can include lower power, higher luminance and, in the case of emissive displays, longer lifetime. One key to realizing the improved performance of RGBW EIDs is a suitable method of data conversion from standard RGB input signal formats to RGBW output signal formats. An OLED microdisplay built on a Complementary Metal–Oxide–Semiconductor (CMOS) active-matrix back-plane exhibits low power consumption; this device architecture also gives the OLED microdisplay the potential to realize the concept of a low-power Display System on a Chip (DSoC). In realizing the performance potential of DSoC on an RGBW OLED microdisplay, there is a trade-off between the system resources used to perform the data conversion and the image quality achieved. A compact and efficient method of RGB-to-RGBW data conversion is introduced to meet the requirement of “minimum system resources with indistinguishable visual side-effects” that is appropriate for an OLED microdisplay. In this context, the terms “compact” and “efficient” mean that the data conversion functionality (i) is capable of insertion into the signal path, (ii) is capable of integration on the OLED microdisplay back-plane, i.e., is small, and (iii) consumes minimal power. The image quality produced by the algorithm is first simulated on a software platform, followed by an optical analysis of the output of the algorithm implemented on a real-time hardware platform. The optical analysis shows good preservation of colour fidelity in the image on the microdisplay, so the proposed RGB-to-RGBW data conversion algorithm delivers sufficiently high image quality whilst remaining compact and efficient enough to meet the development requirements of the RGBW OLED microdisplay with the DSoC approach.
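As a point of reference for the conversion this abstract discusses, the sketch below shows the simplest textbook RGB-to-RGBW mapping: extract the common white component and subtract it from the colour channels. This is a generic, hedged baseline assuming linear 8-bit channel values; it is not the thesis' actual compact algorithm, which must also fit hardware constraints.

```python
def rgb_to_rgbw(r, g, b):
    """Baseline RGB->RGBW conversion with W = min(R, G, B).

    A minimal illustrative sketch (assumes linear 8-bit values);
    production algorithms trade image quality against gate count
    and power, which is the trade-off the thesis studies.
    """
    w = min(r, g, b)                  # shared white component
    return r - w, g - w, b - w, w

# A desaturated orange keeps its hue while the white subpixel
# carries the shared luminance:
print(rgb_to_rgbw(200, 150, 100))    # -> (100, 50, 0, 100)
```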
2

Melbouci, Kathia. "Contributions au RGBD-SLAM." Thesis, Université Clermont Auvergne‎ (2017-2020), 2017. http://www.theses.fr/2017CLFAC006/document.

Abstract:
To guarantee autonomous and safe navigation for a mobile robot, the processing performed for its localization must run online and be accurate enough to let the robot carry out high-level navigation and obstacle-avoidance tasks. For several years, authors of visual SLAM (Simultaneous Localization And Mapping) systems have sought the best speed/accuracy trade-off. Most existing monocular visual SLAM solutions rely on a sparse, feature-based representation of the environment: by tracking salient image points across many video frames, both the 3D positions of the features and the motion of the camera can be estimated. The visual SLAM community has concentrated its efforts on increasing the number of tracked features and on adjusting the 3D map, in order to improve the estimates of the camera trajectory and of the 3D feature positions. However, visual SLAM suffers from drift due to the accumulation of errors, and in monocular visual SLAM the camera position is only known up to a scale factor, which can be fixed initially but drifts over time. To cope with these limitations, this thesis centres on the following problem: integrating additional information into a monocular visual SLAM algorithm in order to better constrain the camera trajectory and the 3D reconstruction, without degrading the computational performance of the initial algorithm and without the absence of the added constraints causing the algorithm to fail. For these reasons, we chose to integrate the depth information provided by a 3D sensor (e.g. Microsoft Kinect) and geometric information about the scene structure. The first contribution of this thesis is a modification of the monocular visual SLAM algorithm proposed by Mouragnon et al. (2006b) to take into account the depth measurements provided by a 3D sensor, in particular through a bundle adjustment that combines, in a simple manner, visual and depth information. The second contribution is a new cost function for the same bundle adjustment that adds, on top of the constraints on point depths, geometric constraints of membership to the planes of the scene. The proposed solutions were validated on synthetic and real sequences depicting varied environments, and compared with recent state-of-the-art methods. The results show that the developed constraints significantly improve the accuracy of the SLAM localization; moreover, the proposed solutions are easy to deploy and inexpensive in computation time.
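To make the depth-augmented bundle adjustment idea concrete, here is a minimal sketch of a per-camera residual that stacks the classic reprojection error with a depth-consistency term. All names, shapes and the scalar weighting are illustrative assumptions, not the thesis' exact formulation; such a residual would typically be minimised with a sparse nonlinear least-squares solver.

```python
import numpy as np

def ba_residuals(points_3d, cam_R, cam_t, K, obs_uv, obs_depth, w_depth=1.0):
    """Residuals for one camera: reprojection plus a depth term (a sketch).

    points_3d : (N, 3) map points, cam_R/cam_t : world-to-camera pose,
    K : 3x3 intrinsics, obs_uv : (N, 2) pixel observations,
    obs_depth : (N,) sensor depths for the same points.
    """
    pts_cam = points_3d @ cam_R.T + cam_t            # points in camera frame
    proj = pts_cam @ K.T
    uv = proj[:, :2] / proj[:, 2:3]                  # pinhole projection
    r_visual = (uv - obs_uv).ravel()                 # classic reprojection error
    r_depth = w_depth * (pts_cam[:, 2] - obs_depth)  # depth consistency
    return np.concatenate([r_visual, r_depth])
```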
3

Quiroga, Sepúlveda Julián. "Scene Flow Estimation from RGBD Images." Thesis, Grenoble, 2014. http://www.theses.fr/2014GRENM057/document.

Abstract:
This thesis addresses the problem of reliably recovering a 3D motion field, or scene flow, from a temporal pair of RGBD images. We propose a semi-rigid estimation framework for the robust computation of scene flow, taking advantage of color and depth information, and an alternating variational minimization framework for recovering the rigid and non-rigid components of the 3D motion field. Previous attempts to estimate scene flow from RGBD images either extended optical flow approaches without fully exploiting the depth data, or formulated the estimation in 3D space while disregarding the semi-rigidity of real scenes. We demonstrate that scene flow can be robustly and accurately computed in the image domain by solving for 3D motions consistent with color and depth, encouraging an adjustable combination of local and piecewise rigidity. Additionally, we show that solving for the 3D motion field can be seen as a specific case of a more general estimation problem of a 6D field of rigid motions; accordingly, we formulate scene flow estimation as the search for an optimal field of twist motions, achieving state-of-the-art results.
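For intuition, a generic depth-augmented variational energy of the kind this abstract alludes to couples a colour-constancy term, a depth-consistency term and a rigidity regulariser. The functional below is only a hedged illustration of that structure, not the thesis' exact formulation:

```latex
% v = (v_x, v_y, v_z): 3D motion at pixel x;  w(x; v): its induced 2D flow;
% rho: robust penalty;  lambda_Z, lambda_R: weights (all illustrative).
E(v) = \int_\Omega \Big[ \rho\big( I_2(w(x;v)) - I_1(x) \big)        % colour constancy
     + \lambda_Z\, \rho\big( Z_2(w(x;v)) - Z_1(x) - v_z(x) \big)     % depth consistency
     + \lambda_R\, R(\nabla v)(x) \Big]\, dx                         % (semi-)rigid regulariser
```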
4

Amamra, A. "Robust 3D registration and tracking with RGBD sensors." Thesis, Cranfield University, 2015. http://dspace.lib.cranfield.ac.uk/handle/1826/9291.

Abstract:
This thesis investigates the use of cheap RGBD sensors for rigid-body tracking and 3D multiview registration in augmented and virtual reality applications. RGBD sensors can be used as an affordable substitute for the more sophisticated, but expensive, conventional laser-based scanning and tracking solutions. Nevertheless, the low-cost sensing technology behind them has several drawbacks, such as limited range, significant noise and instability. To deal with these issues, an innovative adaptation of the Kalman filtering scheme is first proposed to improve the precision, smoothness and robustness of the raw RGBD outputs; it also extends the native capabilities of the sensor to capture further targets. The mathematical foundations of this adaptation are explained in detail, and its corrective effect is validated with real tracking as well as 3D reconstruction experiments. A Graphics Processing Unit (GPU) implementation with several optimisation levels is also proposed in order to ensure real-time responsiveness. After extensive experimentation with RGBD cameras, a significant difference in accuracy was noticed between newer and ageing sensors, a decay that could not be corrected by conventional calibration; thus, a novel method for correcting worn RGBD sensors is also proposed. Another contribution is an algorithm for background/foreground segmentation of RGBD images: background subtraction is performed on the colour and depth images separately, and the resulting foreground regions are then fused for a more robust detection. The three previous contributions are used in a novel approach to multiview vehicle tracking for mixed-reality needs. The vehicle's position is determined in two stages: the first is a sensor-wise robust filtering algorithm able to handle the uncertainties in the system and measurement models, resulting in multiple position estimates; the second merges the independent estimates using a set of optimal weighting coefficients. The outcome of the fusion is used to determine the vehicle's orientation in the scene. Finally, a novel recursive filtering approach for sparse registration is proposed. Unlike ordinary state-of-the-art alignment algorithms, the proposed method has four advantages not available altogether in any previous solution: it deals with the inherent noise contaminating the sensory data; it is robust to uncertainties in feature localisation; it combines the advantages of both the L2 and L-infinity norms for higher performance and prevention of local minima; and it provides an estimated rigid-body transformation along with its error covariance. This 3D registration scheme is validated in various challenging scenarios with both synthetic and real RGBD data.
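The colour/depth segmentation idea described above can be pictured with a few lines of array code: subtract a background model per modality, then fuse the two foreground masks. The thresholds and OR-style fusion rule below are illustrative assumptions; the thesis fuses the per-modality foreground regions for a more robust detection.

```python
import numpy as np

def fused_foreground(rgb, depth, bg_rgb, bg_depth, tau_rgb=30.0, tau_d=0.08):
    """Sketch of colour+depth background subtraction with late fusion.

    rgb: (H, W, 3) uint8 frame, depth: (H, W) metres; bg_* are the
    corresponding background models (e.g. temporal medians).
    """
    # Per-pixel colour distance to the background model.
    fg_rgb = np.linalg.norm(rgb.astype(float) - bg_rgb, axis=2) > tau_rgb
    # Depth deviation, ignoring invalid (zero) readings.
    valid = (depth > 0) & (bg_depth > 0)
    fg_d = valid & (np.abs(depth - bg_depth) > tau_d)
    # Depth recovers camouflaged objects whose colour matches the
    # background; colour recovers objects at the background's depth.
    return fg_rgb | fg_d
```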
5

Coufal, Miroslav. "Modulární RGB LED displej." Master's thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2013. http://www.nusl.cz/ntk/nusl-220135.

Abstract:
The aim of this master's thesis was to design an RGB LED display with an Ethernet interface. I created a display module controlled by an ATmega2560-16AU microcontroller; these modules can be interconnected via the RS485 serial standard. The Ethernet connection is provided by a plug-in interface based on the programmable Rabbit RCM 3200 module. I documented the design and tested the resulting device.
6

Svensson, Niclas. "Structure from Motion with Unstructured RGBD Data." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-302553.

Abstract:
This thesis covers the topic of depth-assisted Structure from Motion (SfM). In classic SfM, the goal is to reconstruct a 3D scene using only a set of unstructured RGB images. This thesis adds the depth dimension to the problem formulation and consequently creates a system that can receive a set of RGBD images. The problem is addressed by modifying an already existing SfM pipeline, in particular its Bundle Adjustment (BA) process. Comparisons between the modified framework and the baseline framework lead to conclusions about the impact of the modifications. The results show mainly two things. First, the accuracy of the framework is increased in most situations; the difference is most significant when the captured scene is covered only from a small sector, although noisy data can cause the modified pipeline to perform slightly worse. Second, the run time of the framework is significantly reduced. A discussion of how to modify other parts of the pipeline is given in the conclusion of the report.
7

Coen, Paul Dixon. "Human Activity Recognition and Prediction using RGBD Data." OpenSIUC, 2019. https://opensiuc.lib.siu.edu/theses/2562.

Abstract:
Being able to predict and recognize human activities is essential for us to communicate effectively with other humans during our day-to-day activities. A system able to do this has a number of appealing applications, from assistive robotics to health care and preventative medicine. Previous work in supervised video-based human activity prediction and detection fails to capture the richness of the spatiotemporal data that these activities generate. Convolutional Long Short-Term Memory (Convolutional LSTM) networks are a useful tool for analyzing this type of data, showing good results in many other areas. This thesis focuses on utilizing RGB-D data to improve human activity prediction and recognition, and introduces a modified Convolutional LSTM network to do so. Experiments are performed on the network and compared to other models in use as well as the current state-of-the-art system. We show that our proposed model for human activity prediction and recognition outperforms the current state-of-the-art models on the CAD-120 dataset without being given bounding frames or ground truths about objects.
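As a hedged sketch of the kind of model this abstract names, the snippet below stacks Keras ConvLSTM2D layers over a sequence of RGB-D frames. Layer sizes, the 4-channel input and the classification head are illustrative assumptions, not the thesis' modified architecture.

```python
import tensorflow as tf

def build_convlstm(frames=16, h=64, w=64, channels=4, n_classes=10):
    """Toy ConvLSTM video classifier over RGB-D clips (a sketch)."""
    return tf.keras.Sequential([
        tf.keras.layers.Input(shape=(frames, h, w, channels)),  # RGB + depth
        tf.keras.layers.ConvLSTM2D(32, kernel_size=3, padding="same",
                                   return_sequences=True),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.ConvLSTM2D(32, kernel_size=3, padding="same"),
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(n_classes, activation="softmax"),
    ])

model = build_convlstm()
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.summary()
```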
8

Vríčan, Peter. "Světelné efekty pomocí RGB budiče." Master's thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2013. http://www.nusl.cz/ntk/nusl-219969.

Abstract:
The purpose of this thesis is the implementation of an RGB LED driver using the ON Semiconductor NCV7430 circuit. The main objective was to design a circuit solution for temperature compensation of the driver. The thesis describes the driver and its functions and aims to eliminate the thermal effects caused by heating of the circuit by its surroundings. It discusses thermal stabilization of the circuit over the temperature range of -40 to 80 °C so that the RGB diode lights with a constant colour. The thesis presents various options for this stabilization and a method for evaluating the obtained application parameters. It also covers the design of equipment for implementing light effects, which demonstrates the options and features of the RGB driver.
9

Möckelind, Christoffer. "Improving deep monocular depth predictions using dense narrow field of view depth images." Thesis, KTH, Robotik, perception och lärande, RPL, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-235660.

Abstract:
In this work we study a depth prediction problem in which a narrow-field-of-view depth image and a wide-field-of-view RGB image are provided to a deep network tasked with predicting the depth for the entire RGB image. We show that providing a narrow-field-of-view depth image improves the results for the area outside the provided depth, compared to an earlier approach that utilizes only a single RGB image for depth prediction. We also show that larger depth maps provide a greater advantage than smaller ones, and that the accuracy of the model decreases with distance from the provided depth. Further, we investigate several architectures and study the effect of adding noise and lowering the resolution of the provided depth image. Our results show that models provided with low-resolution, noisy depth perform on par with models provided with unaltered depth.
10

El, Ahmar Wassim. "Head and Shoulder Detection using CNN and RGBD Data." Thesis, Université d'Ottawa / University of Ottawa, 2019. http://hdl.handle.net/10393/39448.

Abstract:
Alex Krizhevsky and his colleagues changed the world of machine vision and image processing in 2012 when their deep learning model, named AlexNet, won the ImageNet Large Scale Visual Recognition Challenge with a more than 10.8% lower error rate than their closest competitor. Ever since, deep learning approaches have been an area of extensive research for the tasks of object detection, classification, pose estimation, etc. This thesis presents a comprehensive analysis of different deep learning models and architectures that have delivered state-of-the-art performances in various machine vision tasks; these models are compared to each other and their strengths and weaknesses are highlighted. We introduce a new approach for human head and shoulder detection from RGB-D data based on a combination of image processing and deep learning. Candidate head-top locations (CHL) are generated by a fast and accurate image processing algorithm that operates on depth data; we propose enhancements to the CHL algorithm that make it three times faster. Different deep learning models are then evaluated for the tasks of classification and detection on the candidate head-top locations, to regress the head bounding boxes and detect shoulder keypoints. We propose three small models based on convolutional neural networks for this problem and highlight experimental results for their different architectures. We also compare the performance of our model to MobileNet. Finally, we show the differences between three types of inputs to CNN models: RGB images; a 3-channel representation generated from depth data (depth map, multi-order depth template, and height difference map, or DMH); and a 4-channel input composed of RGB+D data.
11

Li, Zhaoyang. "Monitoring urban sprawl using RGB images." Thesis, Högskolan i Gävle, Akademin för teknik och miljö, 2011. http://urn.kb.se/resolve?urn=urn:nbn:se:hig:diva-9276.

12

SOARES, ANA CRISTINA COSME. "RGB PHOTOELASTICITY APPLIED TO GLASS COMPONENTS." PONTIFÍCIA UNIVERSIDADE CATÓLICA DO RIO DE JANEIRO, 2000. http://www.maxwell.vrac.puc-rio.br/Busca_etds.php?strSecao=resultado&nrSeq=2812@1.

Abstract:
Photoelasticity is a powerful tool for analyzing stresses in two- and three-dimensional problems. In recent years the technique has attracted renewed interest from the scientific community owing to the adoption of modern digital image acquisition and processing techniques. One of the most promising approaches is RGB photoelasticity, which quantifies colour as a unique combination of the intensity levels of the red, green and blue components; it is a powerful methodology, yet easy to understand and to apply in industrial environments. A traditional application of photoelasticity is the inspection of residual stresses in glass components following the ASTM F218 standard, which establishes two procedures for determining the isochromatic fringe order: one quantitative and one qualitative. Glass has very low birefringence, so its optical response is basically in grey tones. For the qualitative procedure, ASTM F218 recommends the use of a full-wave retardation plate; with this simple step the optical response changes from grey tones to colours around the red-to-blue transition, which greatly simplifies the birefringence analysis by an operator. The application of RGB photoelasticity to glass components transforms this qualitative method into a quantitative one: the colour ceases to be an abstract parameter and becomes a number. This work analyzes the procedures needed to apply RGB photoelasticity in the glass industry. ASTM F218 recommends inserting the full-wave plate with its principal direction aligned with the principal direction of each analyzed point; the difference between the results obtained by following the standard and by a simplified procedure, in which a single plate position is used to analyze all points, was investigated. The method was then applied to two cases, a stem and a lamp bulb. In both cases, one component considered acceptable and one considered unacceptable by the manufacturer were analyzed, in order to reveal their residual stress states through RGB photoelasticity.
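The quantitative step described above, turning colour into a fringe-order number, reduces in its simplest form to a nearest-colour search against a calibration table of known fringe orders. The sketch below illustrates only that core idea; the table values are hypothetical, and a real calibration would be measured on a specimen of known stress.

```python
import numpy as np

def fringe_order(rgb_pixel, calib_rgb, calib_orders):
    """Nearest-RGB lookup of the isochromatic fringe order (a sketch)."""
    d = np.linalg.norm(calib_rgb - np.asarray(rgb_pixel, float), axis=1)
    return calib_orders[np.argmin(d)]

# Hypothetical three-entry calibration, for illustration only.
calib_rgb = np.array([[120.0, 120, 120], [180, 60, 60], [60, 60, 180]])
calib_orders = np.array([0.0, 0.5, 1.0])
print(fringe_order((70, 65, 170), calib_rgb, calib_orders))  # -> 1.0
```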
13

Cai, Ziyun. "Feature learning for RGB-D data." Thesis, University of Sheffield, 2017. http://etheses.whiterose.ac.uk/18370/.

Abstract:
RGB-D data has turned out to be a very useful representation for solving fundamental computer vision problems. It combines the advantages of colour images, which provide appearance information about an object, with those of depth images, which are immune to variations in colour, illumination, rotation angle and scale. With the invention of the low-cost Microsoft Kinect sensor, initially intended for gaming and later a popular device for computer vision, high-quality RGB-D data can be acquired easily. RGB-D images and video can facilitate a wide range of application areas, such as computer vision, robotics, construction and medical imaging. Furthermore, how to fuse RGB information and depth information remains an open problem in computer vision: it is not enough to simply concatenate RGB data and depth data together, and more powerful fusion algorithms are still needed. In this thesis, to explore more advantages of RGB-D data, we use popular RGB-D datasets for deep feature learning evaluation, hyper-parameter optimization, local multi-modal feature learning, RGB-D data fusion, and recognizing RGB information from RGB-D images:

i) With the success of Deep Neural Networks in computer vision, deep features from fused RGB-D data can be shown to give better results than RGB data alone. However, different deep learning algorithms perform differently on different RGB-D datasets. Through large-scale experiments comprehensively evaluating the performance of deep feature learning models for RGB-D image/video classification, we conclude that RGB-D fusion methods using CNNs always outperform the other selected methods (DBNs, SDAE and LSTM). On the other hand, since an LSTM can learn from experience to classify, process and predict time series, it achieved better performance than DBN and SDAE in video classification tasks.

ii) Hyper-parameter optimization can help researchers quickly choose an initial set of hyper-parameters for a new classification task, thus reducing the number of trials over the hyper-parameter space. We present a simple and efficient framework for improving the efficiency and accuracy of hyper-parameter optimization by considering the classification complexity of a particular dataset, and verify it on three real-world RGB-D datasets. The analysis of the experiments confirms that our framework provides deeper insight into the relationship between classification tasks and hyper-parameter optimization, allowing an accurate initial set of hyper-parameters to be chosen quickly for a new task.

iii) We propose a new Convolutional Neural Network (CNN)-based local multi-modal feature learning framework for RGB-D scene classification. This method effectively captures much of the local structure of RGB-D scene images and automatically learns a fusion strategy for the object-level recognition step, instead of simply training a classifier on top of features extracted from both modalities. Experiments conducted on two popular datasets show that our method with local multi-modal CNNs greatly outperforms state-of-the-art approaches and has the potential to improve RGB-D scene understanding. An extended evaluation shows that a CNN trained on a scene-centric dataset achieves an improvement on scene benchmarks compared to a network trained on an object-centric dataset.

iv) We propose a novel method for RGB-D data fusion: raw RGB-D data are projected into a complex space, and features are then jointly extracted from the fused RGB-D images. Besides three observations about fusion methods, the experimental results show that our method achieves competitive performance against the classical SIFT.

v) We propose a novel method called adaptive Visual-Depth Embedding (aVDE), which first learns the compact shared latent space between two representations of labeled RGB and depth modalities in the source domain; this shared latent space then helps transfer the depth information to the unlabeled target dataset. Finally, aVDE matches features and reweights instances jointly across the shared latent space and the projected target domain for an adaptive classifier. This method exploits the additional depth information in the source domain while simultaneously reducing the domain mismatch between the source and target domains. On two real-world image datasets, the experimental results show that the proposed method significantly outperforms the state-of-the-art methods.
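A baseline against which learned fusion strategies like the one in contribution iii) are usually compared is simple two-stream late fusion: one CNN per modality, with features concatenated before the classifier. The sketch below shows only that baseline; the backbones, sizes and concatenation rule are illustrative assumptions.

```python
import tensorflow as tf

def two_stream_rgbd(n_classes=51):
    """Two-stream RGB/depth CNN with late fusion by concatenation."""
    def stream():
        return tf.keras.Sequential([
            tf.keras.layers.Conv2D(32, 3, activation="relu"),
            tf.keras.layers.MaxPooling2D(),
            tf.keras.layers.Conv2D(64, 3, activation="relu"),
            tf.keras.layers.GlobalAveragePooling2D(),
        ])
    rgb_in = tf.keras.Input(shape=(128, 128, 3))
    d_in = tf.keras.Input(shape=(128, 128, 1))
    feats = tf.keras.layers.Concatenate()([stream()(rgb_in), stream()(d_in)])
    out = tf.keras.layers.Dense(n_classes, activation="softmax")(feats)
    return tf.keras.Model([rgb_in, d_in], out)
```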
14

Madeja, Jiří. "Vývoj RGB kamery s vysokým rozlišením." Master's thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2017. http://www.nusl.cz/ntk/nusl-319299.

Abstract:
This thesis deals with the selection of a suitable image sensor for a camera that captures plants at high resolution, and with the design of a suitable circuit for connecting the selected sensor (SONY IMX253) to the Avnet MicroZed development board. It discusses the individual parameters of image sensors by which a suitable sensor is chosen, explains the selection process, and describes the parameters of the chosen sensor in more detail. The design of the electronics and printed circuit boards is addressed with respect to the requirements of high-speed circuits and of sensitive, specific components such as the image sensor. The configuration and programming of the Xilinx Zynq device is outlined, and finally a simplified theoretical verification of the functionality of the designed module is carried out.
15

Kendrick, Connah. "Markerless facial motion capture : deep learning approaches on RGBD data." Thesis, Manchester Metropolitan University, 2018. http://e-space.mmu.ac.uk/622357/.

Abstract:
Facial expressions are a series of fast, complex and interconnected movements that cause an array of skin deformations, such as stretching, compressing and folding. Identifying expressions is a natural process in human vision but, due to the diversity of faces, poses many challenges for computer vision. Research in markerless facial motion capture using a single Red Green Blue (RGB) camera has gained popularity due to the wide availability of the data, for example from mobile phones. The motivation behind this work is that much of the existing work attempts to infer 3-Dimensional (3D) data from 2-Dimensional (2D) images; in motion capture, for instance, multiple 2D cameras are calibrated to allow some depth prediction. The inclusion of Red Green Blue Depth (RGBD) sensors, which give ground-truth depth data, could instead provide a better understanding of the human face and of how expressions are visualised. The aim of this thesis is to investigate and develop novel methods of markerless facial motion capture, with a focus on including RGBD data to provide 3D information. The contributions are: a tool to aid the annotation of 3D facial landmarks; a novel neural network that demonstrates the ability to predict 2D and 3D landmarks by merging RGBD data; a working application that demonstrates a complex deep learning network on portable handheld devices; a review of existing neural network methods for denoising fine detail in depth maps; and a network for the complete analysis of facial landmarks and expressions in 3D. The 3D annotator was developed to overcome the difficulty of identifying features in existing 3D modelling software. The technique of predicting 2D and 3D landmarks with auxiliary information allowed highly accurate 3D landmarking without the need for full model generation, and outperformed other recent landmarking techniques. The networks running on handheld devices show, as a proof of concept, that even without much optimisation a complex task can be performed in near real time. Denoising Time of Flight (ToF) depth maps proved much more complex than traditional RGB denoising, and an array of techniques was reviewed and applied to the task. The full facial analysis showed that training neural networks on a wide range of related tasks for auxiliary information allows a deep understanding of the overall task. Research on facial processing is vast, but many new problems and challenges remain. While RGB cameras are widely used, high-accuracy and cost-effective depth-sensing devices are becoming available; these devices allow a better understanding of facial features and expressions, and by merging them with RGB data, facial landmarking and expression intensity recognition can be improved.
16

Dong, Wei S. M. Massachusetts Institute of Technology. "Innovative color management methods for RGB printing." Thesis, Massachusetts Institute of Technology, 2006. http://hdl.handle.net/1721.1/38292.

Abstract:
Thesis (M. Eng.), Massachusetts Institute of Technology, Dept. of Mechanical Engineering, 2006. Includes bibliographical references (leaf 50).
Re-calibrating a printer in response to systematic changes is measurement- and labor-intensive. In this study, a fast correction method with cycle-to-cycle control was proposed. The process includes two steps: the creation of a look-up table using a characterization data set, and image color compensation in conjunction with the Windows printing architecture. Several types of correction models for determining printer characterization were proposed and evaluated, including polynomial models and neural network models. The most successful of these methods was the quadratic spline interpolation model, which removed most of the errors introduced by the changes of colorant and printing substrate. A significant reduction in error was realized by incorporating this technique into the color management program.
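The LUT-plus-interpolation pipeline the abstract describes can be pictured with a few lines of SciPy: a measured characterization maps device colours to corrected colours, and new pixels are corrected by interpolating in that table. The toy identity-with-cast LUT and the linear interpolation below are assumptions for illustration; the thesis' best model used quadratic splines.

```python
import numpy as np
from scipy.interpolate import RegularGridInterpolator

grid = np.linspace(0.0, 1.0, 9)                     # 9x9x9 sample lattice
R, G, B = np.meshgrid(grid, grid, grid, indexing="ij")
# Hypothetical characterization: mild red loss, mild blue gain.
lut = np.stack([R * 0.97, G, np.clip(B * 1.03, 0.0, 1.0)], axis=-1)

correct = RegularGridInterpolator((grid, grid, grid), lut)
pixel = np.array([[0.5, 0.2, 0.8]])                 # device RGB in [0, 1]
print(correct(pixel))                               # corrected RGB
```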
17

Vanderpuije, Curtis N. "Innovative color management methods for RGB printing." Thesis, Massachusetts Institute of Technology, 2006. http://hdl.handle.net/1721.1/38285.

Abstract:
Thesis (M. Eng.), Massachusetts Institute of Technology, Dept. of Mechanical Engineering, 2006. Includes bibliographical references (leaf 57).
The demand for printing excellent-quality images has increased tremendously, in parallel with the growth spurts in the digital camera market. Printing good-quality images consistently, however, remains a difficult and/or expensive venture despite the numerous advances in color technology and printing. To alleviate these issues, a color-compensating software solution was developed that utilizes the unique Kikuze calibration chart to improve printer output. The solution integrates with the Windows printing process at the operating-system level through a UNIDRV plug-in. The plug-in retrieves the data within the print stream and passes it to the color compensation engine, which corrects the color data by mapping input and output colors obtained via a B-spline interpolation algorithm. The rendered image is re-introduced into the print stream for final printing. The prototype achieved successful results and, after a few refinements, can be packaged with commercial printers.
18

Zemánek, Petr. "Modulární RGB LED displej s rozhraním Ethernet." Master's thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2014. http://www.nusl.cz/ntk/nusl-220700.

Abstract:
This thesis deals with the electronic circuit and PCB of a modular RGB LED display with an Ethernet interface. First, the author describes the RGB colour model, the features of RGB LED displays and ways of controlling them. The next chapter contains a short description of the Ethernet interface, the UDP and TCP protocols and the lwIP TCP/IP stack. The last theoretical chapter is an introduction to ARM Cortex-M3 and Cortex-M4 based microcontrollers. The following chapter deals with the hardware design of the modular RGB LED display. The device is designed to be modular: individual modules can be combined to create a larger display. Data from the Ethernet interface are displayed on the RGB LED matrix, whose resolution is 32 × 32 (1024 diodes); the refresh frequency is 100 Hz, the colour depth is High color (16 bits) and the scanning is 1/16 (two rows are driven at the same time). The next chapter describes the firmware for the RGB LED display and all its logical parts, including a web page. The author also created a PC application that sends pictures over the UDP protocol to individual modules.
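For a sense of what "sending pictures over UDP" to such a module involves, here is a hedged sketch that packs a 32 × 32 frame into the 16-bit RGB565 "high colour" format and ships it in one datagram. The packet layout, port and address are illustrative assumptions, not the actual protocol of this display.

```python
import socket

WIDTH, HEIGHT, PORT = 32, 32, 5000   # assumed module parameters

def rgb565(r, g, b):
    """Pack 8-bit RGB into a 16-bit RGB565 word."""
    return ((r & 0xF8) << 8) | ((g & 0xFC) << 3) | (b >> 3)

frame = bytearray()
for _ in range(WIDTH * HEIGHT):
    frame += rgb565(255, 64, 0).to_bytes(2, "big")   # solid orange test

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.sendto(bytes(frame), ("192.168.1.50", PORT))    # module's IP (assumed)
```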
19

Oliver, Moll Javier. "PERSON RE-IDENTIFICATION USING RGB-DEPTH CAMERAS." Doctoral thesis, Universitat Politècnica de València, 2015. http://hdl.handle.net/10251/59227.

Abstract:
The presence of surveillance systems in our lives has drastically increased during the last years. Camera networks can be seen in almost every crowded public and private place, generating huge amounts of data with valuable information. The automatic analysis of these data plays an important role in extracting relevant information from the scene. In particular, person re-identification is a prominent topic that has become of great interest, especially for the fields of security and marketing. However, factors such as changes in illumination conditions, variations in person pose, occlusions or the presence of outliers make this topic really challenging. Fortunately, the recent introduction of new technologies such as depth cameras opens new paradigms in the image processing field and brings new possibilities. This thesis proposes a new complete framework to tackle the problem of person re-identification using commercial RGB-depth cameras. The work includes the analysis and evaluation of new approaches for the modules of segmentation, tracking, description and matching; to evaluate the contributions, a public dataset for person re-identification using RGB-depth cameras has been created. RGB-depth cameras provide accurate 3D point clouds with colour information. Based on the analysis of the depth information, a novel algorithm for person segmentation is proposed and evaluated; this method accurately segments any person in the scene and naturally copes with occlusions and connected people. The segmentation mask of a person generates a 3D person cloud, which can be easily tracked over time based on proximity. The accumulation of all the person point clouds over time generates a set of high-dimensional colour features, named raw features, that provide useful information about the person's appearance. In this thesis, we propose a family of methods to extract relevant information from the raw features in different ways. The first approach compacts the raw features into a single colour vector, named Bodyprint, that provides a good generalisation of the person's appearance over time. Second, we introduce the concept of the 3D Bodyprint, an extension of the Bodyprint descriptor that includes the angular distribution of the colour features. Third, we characterise the person's appearance as a bag of colour features that are independently generated over time; this descriptor is named Bag of Appearances because of its similarity to the concept of Bag of Words. Finally, we use different probabilistic latent variable models to reduce the feature vectors from a statistical perspective. The evaluation of the methods demonstrates that our proposals outperform the state of the art.
Oliver Moll, J. (2015). PERSON RE-IDENTIFICATION USING RGB-DEPTH CAMERAS [Tesis doctoral no publicada]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/59227
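As a rough picture of what a Bodyprint-style descriptor computes, the sketch below averages the colours of a person's 3D points inside horizontal slices along the body height. The bin count, slicing axis and normalisation are assumptions for illustration, not the thesis' exact definition.

```python
import numpy as np

def bodyprint(points_xyz, colors_rgb, n_bins=80):
    """Slice a person cloud by height and average colours per slice."""
    z = points_xyz[:, 2]                       # height coordinate (assumed)
    edges = np.linspace(z.min(), z.max(), n_bins + 1)
    idx = np.clip(np.digitize(z, edges) - 1, 0, n_bins - 1)
    desc = np.zeros((n_bins, 3))
    for i in range(n_bins):
        sel = idx == i
        if sel.any():
            desc[i] = colors_rgb[sel].mean(axis=0)
    return desc.ravel() / 255.0                # compact appearance signature
```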
20

Kacete, Amine. "Unconstrained Gaze Estimation Using RGB-D Camera." Thesis, CentraleSupélec, 2016. http://www.theses.fr/2016SUPL0012/document.

Abstract:
In this thesis, we tackle the automatic gaze estimation problem in unconstrained user environments. This work belongs to the computer vision research field applied to the perception of humans and their behaviour. Many existing industrial solutions are commercialized and provide acceptable accuracy in gaze estimation, but they often use complex hardware, such as arrays of infrared cameras (embedded in a head-mounted or remote system), making them intrusive, heavily constrained by the user's environment and inappropriate for large-scale public use. We focus on estimating gaze using cheap, low-resolution and non-intrusive devices like the Kinect sensor, and develop new methods to address challenging conditions such as head pose changes, illumination conditions and large user-sensor distances. In this work we investigated different gaze estimation paradigms. We first developed two automatic gaze estimation systems following two classical approaches: a feature-based and a semi-appearance-based approach. The major limitation of such paradigms lies in their way of designing gaze systems, which assumes total independence between the eye-appearance and head-pose blocks. To overcome this limitation, we converged to a novel paradigm that unifies the two previous components and builds a global gaze manifold; we explored two global approaches across the experiments, using synthetic and real RGB-D gaze samples respectively.
21

Cimmino, Martin. "RGB-D Object Recognition for Deep Robotic Learning." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2017. http://amslaurea.unibo.it/14537/.

Abstract:
In recent years, the success of Deep Learning techniques in a wide variety of problems, both in computer vision and in natural language processing, has contributed to the application of deep artificial neural networks to robotic systems. Thanks to RGB-D sensors, which acquire the depth information of a real-world scene, robotic systems are increasingly overcoming some of the common challenges of robot vision. In the context of RGB-D object recognition, a fundamental task for several robotic applications, given a CNN as the learning model and an RGB-D dataset, one often asks which depth-preprocessing strategy yields the best classification accuracy. Another crucial question is whether the depth information will increase the classifier's accuracy substantially or not. This thesis sets out to answer these key questions. In particular, we discuss and compare the results obtained with three strategies for preprocessing the depth information, each leading to a specific training scenario; these scenarios are evaluated on the CORe50 RGB-D dataset. Finally, the thesis shows that, in the context of object recognition, using depth information significantly improves classification accuracy. Our analysis indicates that the precision and completeness of the depth information, and possibly its segmentation strategy, play a fundamental role. Moreover, we show that training a CNN from scratch (rather than fine-tuning) can yield notable improvements in accuracy.
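One common depth-preprocessing strategy of the family compared in this thesis is to normalise metric depth and replicate it into three channels so that a CNN pretrained on RGB can consume it. The sketch below shows only that generic recipe; the clipping range and near-is-bright encoding are illustrative assumptions, not the thesis' chosen strategy.

```python
import numpy as np

def depth_to_3ch(depth, d_min=0.3, d_max=3.0):
    """Normalise a metric depth map and replicate it to 3 channels."""
    d = np.clip(depth, d_min, d_max)
    d = (d - d_min) / (d_max - d_min)           # -> [0, 1]
    d8 = (255 * (1.0 - d)).astype(np.uint8)     # near = bright (assumed)
    return np.repeat(d8[..., None], 3, axis=2)  # HxWx3 pseudo-image
```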
APA, Harvard, Vancouver, ISO, and other styles
22

Biba, Panagiota. "New waviness measurement system using RGB LED lights." Thesis, Högskolan i Halmstad, Akademin för informationsteknologi, 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:hh:diva-28314.

Full text
Abstract:
Due to the rapid technological developments in the car industry and the high quality demands of customers, manufacturers and researchers focus on the reduction of surface roughness, making use of various surface topography measurement systems. This master thesis focuses on the development of a waviness measurement system (WMS) at Volvo Cars, where light from different heights and angles illuminates the surface of an extended object in order to acquire images with different intensities due to shadowing effects and reflection. With this, surface irregularities and imperfections can be detected in both polished and unpolished surfaces, for improving the car panels in the manufacturing process. The initial WMS idea was to illuminate the surface at different heights from the four corners of a dark room using 20 flash lights and a camera positioned exactly on top of the surface in the middle of the room. The first light goes on and an image is acquired; this procedure continues for all flash lights and takes 19 s. The acquired images were evaluated by a Matlab application. In the new WMS, the flash lights are replaced by 32 RGB COB LED lights, controlled via the DMX512 protocol. The system runs in 9 s, half the time of the old WMS. The LabView and Matlab code was adjusted to the new parameters and devices. In the end, measurements were taken with different surfaces, exposure times and light colors. Details of the new devices and software are analyzed in this thesis.
APA, Harvard, Vancouver, ISO, and other styles
23

Fioraio, Nicola <1987&gt. "Scene Reconstruction And Understanding By RGB-D Sensors." Doctoral thesis, Alma Mater Studiorum - Università di Bologna, 2015. http://amsdottorato.unibo.it/6941/.

Full text
Abstract:
This thesis investigates interactive scene reconstruction and understanding using RGB-D data only. Indeed, we believe that, in the near future, depth cameras will remain a cheap and low-power 3D sensing alternative suitable for mobile devices too. Therefore, our contributions build on top of state-of-the-art approaches to achieve advances in three main challenging scenarios, namely mobile mapping, large scale surface reconstruction and semantic modeling. First, we will describe an effective approach dealing with Simultaneous Localization And Mapping (SLAM) on platforms with limited resources, such as a tablet device. Unlike previous methods, dense reconstruction is achieved by reprojection of RGB-D frames, while local consistency is maintained by deploying relative bundle adjustment principles. We will show quantitative results comparing our technique to the state-of-the-art as well as detailed reconstructions of various environments ranging from rooms to small apartments. Then, we will address large scale surface modeling from depth maps exploiting parallel GPU computing. We will develop a real-time camera tracking method based on the popular KinectFusion system and an online surface alignment technique capable of counteracting drift errors and closing small loops. We will show very high quality meshes outperforming existing methods on publicly available datasets as well as on data recorded with our RGB-D camera even in complete darkness. Finally, we will move to our Semantic Bundle Adjustment framework to effectively combine object detection and SLAM in a unified system. Though the mathematical framework we will describe is not restricted to a particular sensing technology, in the experimental section we will refer, again, only to RGB-D sensing. We will discuss successful implementations of our algorithm showing the benefit of joint object detection, camera tracking and environment mapping.
APA, Harvard, Vancouver, ISO, and other styles
24

Watson, Owen. "Full 3D Reconstruction From Multiple RGB-D Cameras." Scholar Commons, 2013. http://scholarcommons.usf.edu/etd/4607.

Full text
Abstract:
This thesis describes a novel procedure for achieving full 3D reconstruction from multiple RGB-D cameras configured such that the amount of overlap between views is low. Overlap is used to describe the portion of a scene that is common to a pair of views, and is considered low when at most 50% of the scene is common. Compatible systems are configured such that, interpreting cameras as nodes and overlap as edges, a connected undirected graph can be constructed. The fundamental goal of the proposed procedure is to calibrate a given system of cameras. Calibration is the process of finding the transformation from each camera's point of view to the reconstructed scene's global coordinate system. The procedure focuses on maintaining the accuracy of reconstruction once the system is calibrated. RGB-D cameras gained popularity from their ability to generate dense 3D images; however, individually these cameras cannot provide full 3D images because of factors like occlusions and a limited field of view. In order to successfully combine views, there must exist common features that can be matched, or prior heuristics pertaining to the environment that can be used to infer alignment. Intuitively, corresponding features exist in overlapping regions of views; combining data from pairs of overlapping views provides a more complete 3D reconstructed scene. A calibrated system of cameras is susceptible to misalignment. Re-calibration of the entire system is expensive, and is unnecessary if only a small number of cameras became misaligned; correcting misalignment is a much more practical approach for maintaining calibration accuracy over extended periods of time. The presented procedure begins by identifying the necessary overlapping pairs of cameras for calibration. These pairs form a spanning tree in which overlap is maximized; this tree is referred to as the alignment tree. Each pair is aligned by a two-phase procedure that transforms the data from the coordinate system of the camera at the lower level in the alignment tree to that of the higher. The transformation between each pair is catalogued and used for the reconstruction of incoming frames from the cameras. Once calibrated, cameras are assumed to be independent and their successive frames are compared to detect motion. The catalogued transformations are updated whenever motion is detected, essentially correcting misalignment. At the end of the calibration process, the reconstructed scene generated from the combined data contains relative alignment accuracy throughout all regions. Using the proposed algorithm, reconstruction accuracy of over 90% was achieved for systems calibrated with an angle of 45 degrees or more between the cameras. Once calibrated, the cameras can observe and reconstruct a scene on every frame. This relies on the assumption that the cameras are fixed; however, in practice this cannot be guaranteed. Systems maintained over 90% reconstruction accuracy during operation with induced misalignment, and the procedure maintained the calibration-time reconstruction accuracy during execution for up to an hour. The fundamental contribution of this work is the novel concept of using overlap as a means of expressing how a group of cameras is connected. Building a spanning tree representation of the given system of cameras provides a useful structure for uniquely expressing the relationship between the cameras. A calibration procedure that is effective with low-overlapping views is also contributed, as is a procedure to maintain reconstruction accuracy over time in a mostly static environment.
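The alignment tree described above — cameras as nodes, overlap fractions as edge weights, and the spanning tree that maximizes total overlap — can be sketched with networkx; the overlap values below are illustrative, not taken from the thesis.

    import networkx as nx

    # Cameras as nodes; edge weights are the fraction of the scene shared
    # by each pair of views (illustrative values).
    G = nx.Graph()
    G.add_weighted_edges_from([
        ("cam0", "cam1", 0.45), ("cam1", "cam2", 0.30),
        ("cam0", "cam2", 0.20), ("cam2", "cam3", 0.40),
    ])

    # The alignment tree: the spanning tree in which total overlap is maximal.
    alignment_tree = nx.maximum_spanning_tree(G)
    print(sorted(alignment_tree.edges(data="weight")))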
APA, Harvard, Vancouver, ISO, and other styles
25

Byttner, Wolf. "Classifying RGB Images with multi-colour Persistent Homology." Thesis, Linköpings universitet, Matematiska institutionen, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-157641.

Full text
Abstract:
In Image Classification, pictures of the same type of object can have very different pixel values. Traditional norm-based metrics therefore fail to identify objects in the same category. Topology is a branch of mathematics that deals with homeomorphic spaces, by discarding length. With topology, we can discover patterns in the image that are invariant to rotation, translation and warping. Persistent Homology is a new approach in Applied Topology that studies the presence of continuous regions and holes in an image. It has been used successfully for image segmentation and classification [12]. However, current approaches in image classification require a grayscale image to generate the persistence modules. This means information encoded in colour channels is lost. This thesis investigates whether the red, green and blue colour channels of an RGB image hold additional information that could help algorithms classify pictures. We apply two recent methods, one by Adams [2] and the other by Hofer [25], on the CUB-200-2011 birds dataset [40] and find that Hofer's method produces significant results. Additionally, a modified method based on Hofer's that uses the RGB colour channels produces significantly better results than the baseline, with over 48% of images correctly classified, compared to 44%, and with a more significant improvement at lower resolutions. This indicates that colour channels do provide significant new information, and that generating one persistence module per colour channel is a viable approach to RGB image classification.
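Generating one persistence module per colour channel, as in the modified method, can be sketched with the GUDHI library's cubical complexes, assuming a sublevel-set filtration of pixel intensities (the thesis's exact filtration may differ).

    import numpy as np
    import gudhi

    def per_channel_persistence(rgb):
        """One persistence diagram per colour channel of an H x W x 3 image,
        via a sublevel-set cubical filtration of the pixel intensities."""
        diagrams = []
        for c in range(3):
            cc = gudhi.CubicalComplex(top_dimensional_cells=rgb[..., c].astype(float))
            diagrams.append(cc.persistence())  # list of (dim, (birth, death))
        return diagrams

    diagrams = per_channel_persistence(np.random.rand(32, 32, 3))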
APA, Harvard, Vancouver, ISO, and other styles
26

Liu, Shuang. "3D facial performance capture from monocular RGB video." Thesis, Bournemouth University, 2018. http://eprints.bournemouth.ac.uk/30227/.

Full text
Abstract:
3D facial performance capture is an essential technique for animation production in feature films, video gaming, human-computer interaction, VR/AR asset creation and digital heritage, all of which have a huge impact on our daily life. Traditionally, dedicated hardware such as depth sensors, laser scanners and camera arrays has been developed to acquire depth information for such purposes. However, such sophisticated instruments can only be operated by trained professionals. In recent years, the widespread availability of mobile devices, and the increased interest of casual untrained users in applications such as image and video editing and virtual and facial model creation, have sparked interest in 3D facial reconstruction from 2D RGB input. Due to depth ambiguity and facial appearance variation, 3D facial performance capture and modelling from 2D images are inherently ill-posed problems. However, with strong prior knowledge of the human face, it is possible to accurately infer the true 3D facial shape and performance from multiple observations captured at different viewing angles. Various 3D-from-2D methods have been proposed and proven to work well in controlled environments. Nevertheless, there are still many unexplored issues in uncontrolled in-the-wild environments. To achieve the same level of performance as in controlled environments, new techniques are needed to handle interfering factors in uncontrolled environments such as varying illumination, partial occlusion and facial variation not captured by prior knowledge. This thesis addresses existing challenges and proposes novel methods involving 2D landmark detection, 3D facial reconstruction and 3D performance tracking, which are validated through theoretical research and experimental studies. 3D facial performance tracking is a multidisciplinary problem involving many areas such as computer vision, computer graphics and machine learning. To deal with the large variations within a single image, we present new machine learning techniques for facial landmark detection, based on our observation of the facial features in challenging scenarios, to increase robustness. To take advantage of the evidence aggregated from multiple observations, we present new robust and efficient optimisation techniques that impose consistency constraints that help filter out outliers. To exploit person-specific model generation and the temporal and spatial coherence in continuous video input, we present new methods to improve the performance via optimisation. In order to track the 3D facial performance, the fundamental prerequisite for good results is an accurate underlying 3D model of the actor. In this thesis, we present new methods targeted at 3D facial geometry reconstruction, which are more efficient than existing generic 3D geometry reconstruction methods. Evaluation and validation were obtained and analysed from substantial experiments, which show that the proposed methods in this thesis outperform the state-of-the-art methods and enable us to generate high quality results with fewer constraints.
APA, Harvard, Vancouver, ISO, and other styles
27

Masse, Jean-Thomas. "Capture de mouvements humains par capteurs RGB-D." Thesis, Toulouse 3, 2015. http://www.theses.fr/2015TOU30361/document.

Full text
Abstract:
L'arrivée simultanée de capteurs de profondeur et couleur, et d'algorithmes de détection de squelettes super-temps-réel a conduit à un regain de la recherche sur la capture de mouvements humains. Cette fonctionnalité constitue un point clé de la communication Homme-Machine. Mais le contexte d'application de ces dernières avancées est l'interaction volontaire et fronto-parallèle, ce qui permet certaines approximations et requiert un positionnement spécifique des capteurs. Dans cette thèse, nous présentons une approche multi-capteurs, conçue pour améliorer la robustesse et la précision du positionnement des articulations de l'homme, et fondée sur un processus de lissage trajectoriel par intégration temporelle, et le filtrage des squelettes détectés par chaque capteur. L'approche est testée sur une base de données nouvelle acquise spécifiquement, avec une méthodologie d'étalonnage adaptée spécialement. Un début d'extension à la perception jointe avec du contexte, ici des objets, est proposée<br>Simultaneous apparition of depth and color sensors and super-realtime skeleton detection algorithms led to a surge of new research in Human Motion Capture. This feature is a key part of Human-Machine Interaction. But the applicative context of those new technologies is voluntary, fronto-parallel interaction with the sensor, which allowed the designers certain approximations and requires a specific sensor placement. In this thesis, we present a multi-sensor approach, designed to improve robustness and accuracy of a human's joints positionning, and based on a trajectory smoothing process by temporal integration, and filtering of the skeletons detected in each sensor. The approach has been tested on a new specially constituted database, with a specifically adapted calibration methodology. We also began extending the approach to context-based improvements, with object perception being proposed
APA, Harvard, Vancouver, ISO, and other styles
28

Calderon, Olle. "Genomskinlig touchsensor för pålitlig styrning av RGB-lysdioder." Thesis, KTH, Skolan för informations- och kommunikationsteknik (ICT), 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-210668.

Full text
Abstract:
Many electronic products of today utilize some form of touch technology. Looking at everything from smartphone screens to ticket vending machines, it is obvious that the number of applications is large and the demand is huge. Touch technologies generally require no force to use, which reduces mechanical wear-and-tear and thus increases their lifespan. In this thesis, a touch system was constructed to control RGB LEDs. The sensor surface was made from a white, semi-clear plastic, through which the LEDs' light should be visible. Since the plastic both needed to transmit visible light and act as a touch surface, a problem arose: how do you construct a transparent touch sensor that can control RGB LEDs in a reliable way? Firstly, this thesis describes and discusses many of the different available touch technologies and their strengths and weaknesses. From this information, a specific sensor technology was chosen, from which a prototype of the transparent touch sensor was built. The sensor prototype was a capacitive sensor, made from a thin metallic mesh, placed on the back of the plastic surface. Using an embedded system, based on a differential-capacitance touch IC and a microcontroller, the capacitance of the sensor was measured and converted into signals which controlled the LEDs. In order to ensure the sensor's reliability, the environmental factors which affected the sensor had to be determined and handled. To do this, measurements were performed on the sensor to see how its capacitance changed with environmental changes. It was concluded that moisture, temperature and frequency had negligible effect on the sensor's dielectric properties. However, it was discovered that proximity to ground greatly affected the sensor, and that the sensor was significantly dependent on its enclosure and grounding.
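The reliability concern described above — a capacitive reading that drifts with the environment — is commonly handled by tracking a slow baseline and flagging a touch only on large deviations; a minimal sketch, with illustrative thresholds:

    class TouchChannel:
        """Minimal touch detection on one capacitive channel: track a slow
        baseline and report a touch when the reading departs from it by more
        than a threshold. The baseline adapts only while the channel is idle,
        so slow environmental drift is absorbed without masking touches."""

        def __init__(self, threshold=30, drift=0.01):
            self.baseline = None
            self.threshold = threshold
            self.drift = drift  # baseline adaptation rate per sample

        def update(self, reading):
            if self.baseline is None:
                self.baseline = float(reading)
            touched = abs(reading - self.baseline) > self.threshold
            if not touched:
                self.baseline += self.drift * (reading - self.baseline)
            return touched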
APA, Harvard, Vancouver, ISO, and other styles
29

ADORF, JULIUS. "Motion Segmentation of RGB-D Videosvia Trajectory Clustering." Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2014. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-153943.

Full text
Abstract:
Motion segmentation of RGB-D videos can be a first step towards object reconstruction in dynamic scenes. The objective in this thesis is to find an efficient motion segmentation method that can deal with a moving camera. To this end, we adopt a feature-based approach where keypoints in the images are tracked over time. The variation in the observed pairwise 3-d distances is used to determine which of the points move similarly. We then employ spectral clustering to group trajectories into clusters with similar motion, thereby obtaining a sparse segmentation of the dynamic objects in the scene. The results on twenty scenes from real-world datasets and simulations show that while the method needs more sophistication to segment all of them, several dynamic scenes have been successfully segmented at a processing speed of multiple frames per second.
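The grouping step — keypoints whose pairwise 3-D distances barely vary over time are assumed to move together — can be sketched as follows; the affinity construction and parameters are illustrative, not the thesis's exact choices.

    import numpy as np
    from sklearn.cluster import SpectralClustering

    def cluster_trajectories(traj, n_clusters=2, sigma=0.05):
        """traj: (T, N, 3) array of N keypoint trajectories over T frames.
        Points whose pairwise distances vary little are grouped together."""
        d = np.linalg.norm(traj[:, :, None, :] - traj[:, None, :, :], axis=-1)
        variation = d.std(axis=0)                     # (N, N) distance variation
        affinity = np.exp(-(variation / sigma) ** 2)  # high for rigid pairs
        return SpectralClustering(n_clusters=n_clusters,
                                  affinity="precomputed").fit_predict(affinity)

    labels = cluster_trajectories(np.random.rand(30, 12, 3))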
APA, Harvard, Vancouver, ISO, and other styles
30

Widebäck, West Nikolaus. "Multiple Session 3D Reconstruction using RGB-D Cameras." Thesis, Linköpings universitet, Datorseende, 2014. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-112799.

Full text
Abstract:
In this thesis we study the problem of multi-session dense rgb-d slam for 3D reconstruction. Multi-session reconstruction can allow users to capture parts of an object that could not easily be captured in one session, due for instance to poor accessibility or user mistakes. We first present a thorough overview of single-session dense rgb-d slam and describe the multi-session problem as a loosening of the incremental camera movement and static scene assumptions commonly held in the single-session case. We then implement and evaluate several variations on a system for doing two-session reconstruction as an extension to a single-session dense rgb-d slam system. The extension from one to several sessions is divided into registering separate sessions into a single reference frame, re-optimizing the camera trajectories, and fusing together the data to generate a final 3D model. Registration is done by matching reconstructed models from the separate sessions using one of two adaptations of a 3D object detection pipeline. The registration pipelines are evaluated with many different sub-steps on a challenging dataset and it is found that robust registration can be achieved using the proposed methods on scenes without degenerate shape symmetry. In particular we find that using plane matches between two sessions as constraints for as much as possible of the registration pipeline improves results. Several different strategies for re-optimizing camera trajectories using data from both sessions are implemented and evaluated. The re-optimization strategies are based on re-tracking the camera poses from all sessions together, and then optionally optimizing over the full problem as represented on a pose-graph. The camera tracking is done by incrementally building and tracking against a tsdf volume, from which a final 3D mesh model is extracted. The whole system is qualitatively evaluated against a realistic dataset for multi-session reconstruction. It is concluded that the overall approach is successful in reconstructing objects from several sessions, but that other fine-grained registration methods would be required in order to achieve multi-session reconstructions that are indistinguishable from single-session results in terms of reconstruction quality.
APA, Harvard, Vancouver, ISO, and other styles
31

Pires, David da Silva. "Estimação de movimento a partir de imagens RGBD usando homomorfismo entre grafos." Universidade de São Paulo, 2012. http://www.teses.usp.br/teses/disponiveis/45/45134/tde-13022014-152114/.

Full text
Abstract:
Depth-sensing devices have arisen recently, allowing real-time capture of scene texture and depth. As a result, many computer vision techniques, previously applied only to textures, can now be reformulated using additional properties like the geometry. At the same time that these algorithms, making use of this new technology, can be accelerated or made more robust, new interesting challenges and problems to be confronted are appearing. Examples of such devices include the one from the 4D Video Project, from IMPA, and Kinect (TM) from Microsoft. These devices offer the so-called RGBD images, with three color channels and an additional depth channel. The research described in this thesis presents a new non-supervised approach to estimate motion from videos composed of RGBD images. This is an intermediary step necessary to identify the rigid components of an articulated object. Our method uses the technique of inexact graph matching (homomorphism) to find groups of pixels (patches) that move in the same direction in subsequent video frames. In order to choose the best matching for each patch, we minimize a cost function that accounts for distances in both the RGB color space and the XYZ (tridimensional world coordinates) space. The methodological contribution consists in the manipulation of the depth data given by the new capture devices, such that these data become components of the feature vector that represents each patch in the graphs to be matched. Our method does not use reference frames for initialization and can be applied to any video that contains piecewise parametric motion. For patches whose dimensions cause a relative decrease in image resolution, our application runs in real time. In order to validate the proposed methodology, we present results involving several object classes with different kinds of movement, such as videos of walking people, the motion of an arm and a couple of samba dancers. We also present the advances obtained in modeling an object-oriented 4D video system, which guides the development of several applications to be developed as future work.
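The cost minimized when matching a patch between frames combines a distance in RGB colour space with a distance in XYZ world space; a minimal sketch, where the weight lam is an assumed parameter, not a value from the thesis:

    import numpy as np

    def patch_cost(rgb_a, xyz_a, rgb_b, xyz_b, lam=0.5):
        """Cost of matching patch a to patch b, mixing a colour distance
        (mean RGB of the patch) with a geometric distance (mean XYZ world
        coordinates of the patch)."""
        colour_term = np.linalg.norm(np.asarray(rgb_a, float) - np.asarray(rgb_b, float))
        geometric_term = np.linalg.norm(np.asarray(xyz_a, float) - np.asarray(xyz_b, float))
        return lam * colour_term + (1 - lam) * geometric_term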
APA, Harvard, Vancouver, ISO, and other styles
32

Sousa, Alexandre Martins Ferreira de. "Superfície mágica: criando superfícies interativas por meio de câmeras RGBD e projetores." Universidade de São Paulo, 2015. http://www.teses.usp.br/teses/disponiveis/45/45134/tde-23122015-104315/.

Full text
Abstract:
Ubiquitous computing is a concept where computing is thought to be omnipresent, effectively "invisible", so that humans and computers are brought together in a seamless way. The progress of hardware and software technologies makes it compelling to investigate innovative possibilities of interaction with computers. In this work, we explore novel ways of interaction that are inspired by the acts of drawing, grasping and gesturing. In order to test them, we have developed new RGBD camera-based algorithms for object detection, classification and tracking. This allows the conception of an interactive installation that uses portable and low cost equipment. In order to evaluate the proposed ways of interaction, we have developed the Magic Surface, a system that transforms a regular surface (such as a wall or a tabletop) into a multitouch interactive space. The Magic Surface detects the touch of hand fingers, colored pens and an eraser. It also supports the usage of a magic wand for 3D interaction. The Magic Surface can run applications, allowing the transformation of a regular surface into an interactive drawing area, a map explorer, a 3D simulator for navigation in virtual environments, among other possibilities. Areas of application range from education to interactive art and entertainment. The setup of our prototype includes a Microsoft Kinect sensor, a video projector and a personal computer.
APA, Harvard, Vancouver, ISO, and other styles
33

Rafi, Nazari Mina. "Denoising and Demosaicking of Color Images." Thesis, Université d'Ottawa / University of Ottawa, 2017. http://hdl.handle.net/10393/35802.

Full text
Abstract:
Most digital cameras capture images through Color Filter Arrays (CFA), and reconstruct the full color image from the CFA image. Each CFA pixel only captures one primary color component at each pixel location; the other primary components are estimated using information from neighboring pixels. During the demosaicking algorithm, the unknown color components are estimated at each pixel location. Most demosaicking algorithms use the RGB Bayer CFA pattern with red, green and blue filters. Some other CFAs contain four color filters; the additional filter is a panchromatic/white filter, and it usually receives the full light spectrum. In this research, we studied and compared different four-channel CFAs with a panchromatic/white filter, and compared them with three-channel CFAs. An appropriate demosaicking algorithm has been developed for each CFA. The most well-known three-channel CFA is Bayer; the Fujifilm X-Trans pattern has been studied in this work as another three-channel CFA with a different structure. Three different four-channel CFAs have been discussed in this research: RGBW-Kodak, RGBW-Bayer and RGBW-5×5. The structure and the number of filters for each color are different for these CFAs. Since the Least-Square Luma-Chroma Demultiplexing method is a state-of-the-art demosaicking method for the Bayer CFA, we designed the Least-Square method for RGBW CFAs. The effect of noise on different CFA patterns is discussed for four-channel CFAs. The Kodak database has been used to evaluate our non-adaptive and adaptive demosaicking methods as well as the algorithms optimized with the least-square method. The captured values of the white (panchromatic/clear) filters in RGBW CFAs have been estimated using the red, green and blue filter values. Sets of optimized coefficients have been proposed to estimate the white filter values accurately. The results have been validated using the actual white values of a hyperspectral image dataset. A new denoising-demosaicking method for the RGBW-Bayer CFA has been presented in this research. The algorithm has been tested on the Kodak dataset using the estimated values of the white filters and on a hyperspectral image dataset using the actual values of the white filters, and the results have been compared. The results in both cases have been compared with previous works on the RGB-Bayer CFA, and they show that the proposed algorithm using the RGBW-Bayer CFA performs better than the RGB-Bayer CFA in the presence of noise.
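Estimating the white (panchromatic) value from the red, green and blue values amounts to fitting a set of coefficients by least squares; a sketch with illustrative names (the thesis derives its optimized coefficients from real and hyperspectral data, not from the synthetic check below):

    import numpy as np

    def fit_white_coefficients(rgb, white):
        """rgb: (N, 3) pixel samples; white: (N,) measured panchromatic values.
        Returns (a, b, c) minimizing ||rgb @ coeffs - white||^2."""
        coeffs, *_ = np.linalg.lstsq(rgb, white, rcond=None)
        return coeffs

    # Synthetic check: W generated as 0.3R + 0.5G + 0.2B is recovered.
    rgb = np.random.rand(1000, 3)
    w = rgb @ np.array([0.3, 0.5, 0.2])
    print(fit_white_coefficients(rgb, w))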
APA, Harvard, Vancouver, ISO, and other styles
34

Villa, Giacomo Maria. "Multi object detection and tracking using RGB-D cameras." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2021.

Find full text
Abstract:
This thesis presents the development of a multi-object detection and tracking method for a low-light dynamic environment. After a brief introduction on the history of tracking and a general description of the procedure, the following chapters survey the methods most commonly used to solve the tracking problem, giving greater weight to those considered for our specific problem. After this introduction, a description of the dynamic environment in which our target objects have to be tracked is provided. This leads to the presentation of the approach used to achieve the goal of tracking bumper cars inside an interactive carousel.
APA, Harvard, Vancouver, ISO, and other styles
35

Hesse, Nikolas [Verfasser], and Ulrich [Akademischer Betreuer] Hofmann. "Unobtrusive medical infant motion analysis from RGB-D data." Freiburg : Universität, 2019. http://d-nb.info/121195675X/34.

Full text
APA, Harvard, Vancouver, ISO, and other styles
36

Brunetto, Nicholas. "Ricostruzione 3d da immagini rgb-d su piattaforma mobile." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2014. http://amslaurea.unibo.it/7337/.

Full text
Abstract:
A port to the Android mobile platform of a SLAM (Simultaneous Localization And Mapping) system called SlamDunk is proposed. The port addresses issues of performance and of the quality of the resulting 3D reconstructions, and then proposes the solution deemed optimal.
APA, Harvard, Vancouver, ISO, and other styles
37

Sengupta, Agniva. "Visual tracking of deformable objects with RGB-D camera." Thesis, Rennes 1, 2020. http://www.theses.fr/2020REN1S069.

Full text
Abstract:
Le suivi d'objets déformable à partir d’informations visuelles à de nombreuses applications dans le domaine de la robotique, de l'animation ou de la simulation. Dans cette thèse, nous proposons de nouvelles approches pour le suivi d'objets rigides et non rigides à l'aide d'une caméra RGB-D. Cette thèse comporte quatre contributions principales. La première contribution est une nouvelle approche de suivi d'objets dans des images RGB-D qui utilise des erreurs basées sur la profondeur et la photométrie pour suivre et localiser des formes complexes en utilisant leur modèle 3D grossier. La seconde contribution porte sur une méthode de suivi d'objets non rigides reposant sur une approche par éléments finis (FEM) pour suivre et caractériser les déformations. La troisième contribution est une approche de suivi de la déformation qui minimise une combinaison d'erreurs géométriques et photométriques tout en utilisant la FEM comme modèle de déformation. Finalement, la quatrième contribution consiste à estimer les propriétés d'élasticité d'un objet analysant ses déformations toujours à l'aide d'une caméra RGB-D. Une fois les paramètres d'élasticité estimés, la même méthodologie peut être réutilisée pour caractériser les forces de contact<br>Tracking soft objects using visual information has immense applications in the field of robotics, computer graphics and automation. In this thesis, we propose multiple new approaches for tracking both rigid and non-rigid objects using a RGB-D camera. There are four main contributions of this thesis. The first contribution is a rigid object tracking method which utilizes depth and photometry based errors for tracking complex shapes using their coarse, 3D template. The second contribution is a non-rigid object tracking method which uses co-rotational FEM to track deforming objects by regulating the virtual forces acting on the surface of a physics based model of the object. The third contribution is a deformation tracking approach which minimizes a combination of geometric and photometric error while utilizing co-rotation FEM as the deformation model. The fourth contribution involves estimating the elasticity properties of a deforming object while tracking their deformation using RGB-D camera. Once the elasticity parameters have been estimated, the same methodology can be re-utilized for tracking contact forces on the surface of deforming objects
APA, Harvard, Vancouver, ISO, and other styles
38

Hammond, Patrick Douglas. "Deep Synthetic Noise Generation for RGB-D Data Augmentation." BYU ScholarsArchive, 2019. https://scholarsarchive.byu.edu/etd/7516.

Full text
Abstract:
Considerable effort has been devoted to finding reliable methods of correcting noisy RGB-D images captured with unreliable depth-sensing technologies. Supervised neural networks have been shown to be capable of RGB-D image correction, but require copious amounts of carefully-corrected ground-truth data to train effectively. Data collection is laborious and time-intensive, especially for large datasets, and generation of ground-truth training data tends to be subject to human error. It might be possible to train an effective method on a relatively smaller dataset using synthetically damaged depth data as input to the network, but this requires some understanding of the latent noise distribution of the respective camera. It is possible to augment datasets to a certain degree using naive noise generation, such as random dropout or Gaussian noise, but these tend to generalize poorly to real data. A superior method would imitate real camera noise to damage input depth images realistically, so that the network is able to learn to correct the appropriate depth-noise distribution. We propose a novel noise-generating CNN capable of producing realistic noise customized to a variety of different depth-noise distributions. In order to demonstrate the effects of synthetic augmentation, we also contribute a large novel RGB-D dataset captured with the Intel RealSense D415 and D435 depth cameras. This dataset pairs many examples of noisy depth images with automatically completed RGB-D images, which we use as a proxy for ground-truth data. We further provide an automated depth-denoising pipeline which may be used to produce proxy ground-truth data for novel datasets. We train a modified sparse-to-dense depth-completion network on splits of varying size from our dataset to determine reasonable baselines for improvement. We determine through these tests that adding more noisy depth frames to each RGB-D image in the training set has a nearly identical impact on depth-completion training as gathering more ground-truth data. We leverage these findings to produce additional synthetic noisy depth images for each RGB-D image in our baseline training sets using our noise-generating CNN. Through use of our augmentation method, it is possible to achieve greater than 50% error reduction on supervised depth-completion training, even for small datasets.
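The naive augmentations the thesis contrasts with its learned noise — random dropout and Gaussian noise on the depth channel — look like the following sketch (parameters illustrative):

    import numpy as np

    def naive_depth_noise(depth, dropout_p=0.05, sigma=10.0, rng=None):
        """Baseline synthetic damage: Gaussian noise on the depth values plus
        randomly dropped pixels, which read as zero (missing depth)."""
        rng = np.random.default_rng() if rng is None else rng
        noisy = depth + rng.normal(0.0, sigma, depth.shape)
        noisy[rng.random(depth.shape) < dropout_p] = 0.0
        return noisy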
APA, Harvard, Vancouver, ISO, and other styles
39

Dale, Ashley S. "3D Object Detection Using Virtual Environment Assisted Deep Network Training." Thesis, 2020. http://hdl.handle.net/1805/24756.

Full text
Abstract:
Indiana University-Purdue University Indianapolis (IUPUI). An RGBZ synthetic dataset consisting of five object classes in a variety of virtual environments and orientations was combined with a small sample of real-world image data and used to train the Mask R-CNN (MR-CNN) architecture in a variety of configurations. When the MR-CNN architecture was initialized with MS COCO weights and the heads were trained with a mix of synthetic data and real-world data, F1 scores improved in four of the five classes: the average maximum F1-score over all classes and all epochs for the networks trained with synthetic data is F1* = 0.91, compared to F1 = 0.89 for the networks trained exclusively with real data, and the standard deviation of the maximum mean F1-score is σ* = 0.015 for the synthetically trained networks, compared to σ = 0.020 for the networks trained exclusively with real data. Various backgrounds in synthetic data were shown to have negligible impact on F1 scores, opening the door to abstract backgrounds and minimizing the need for intensive synthetic data fabrication. When the MR-CNN architecture was initialized with MS COCO weights and depth data was included in the training data, the network was shown to rely heavily on the initial convolutional input to feed features into the network, the image depth channel was shown to influence mask generation, and the image color channels were shown to influence object classification. A set of latent variables for a subset of the synthetic dataset was generated with a Variational Autoencoder and then analyzed using Principal Component Analysis and Uniform Manifold Approximation and Projection (UMAP). The UMAP analysis showed no meaningful distinction between real-world and synthetic data, and a small bias towards clustering based on image background.
APA, Harvard, Vancouver, ISO, and other styles
40

Bibi, Adel. "Advances in RGB and RGBD Generic Object Trackers." Thesis, 2016. http://hdl.handle.net/10754/609455.

Full text
Abstract:
Visual object tracking is a classical and very popular problem in computer vision, with a plethora of applications such as vehicle navigation, human-computer interfaces, human motion analysis, surveillance, auto-control systems and many more. Given the initial state of a target in the first frame, the goal of tracking is to predict the states of the target over time, where the states describe a bounding box covering the target. Despite the numerous object tracking methods that have been proposed in recent years [1-4], most of these trackers suffer a degradation in performance mainly because of several challenges that include illumination changes, motion blur, complex motion, out-of-plane rotation, and partial or full occlusion, with occlusion usually being the most significant factor in degrading the majority of trackers, if not all of them. This thesis is devoted to the advancement of generic object trackers, tackling different challenges through different proposed methods. The work presented proposes four new state-of-the-art trackers. One of them is a 3D-based tracker in a particle filter framework, in which both synchronization and registration of the RGB and depth streams are adjusted automatically; the other three are works on correlation filters that achieve state-of-the-art accuracy while maintaining reasonable speeds.
APA, Harvard, Vancouver, ISO, and other styles
41

Huang, Chi-Wen, and 黃季雯. "Tone characteristic description and color gamut variation of RGB/RGBW OLED display under ambient lighting conditions." Thesis, 2013. http://ndltd.ncl.edu.tw/handle/54296669083305330876.

Full text
Abstract:
碩士<br>國立臺灣科技大學<br>光電工程研究所<br>101<br>There are three topics discussed in this thesis. The first topic majorly studied OLED (Organic Light-Emitting Diode) display characteristic under various ambient lighting conditions by using olorimeter. After the measurement, the linear piecewise formula in conjunction with the flare correction is used to describe TRC(tone response curve) under various ambient lighting conditions. The TRCs will be predicted more precisely. The second topic discussed three kinds of RGB to RGBW conversions and used Matlab to fullfill three conversion algorithms. The stimulation included red channel, green channel, blue channel and neutral channel based on different images. The difference images related to their corresponding conversion algorithms will be discussed subsequently. Finally, combined the flare values measuring in the first experiment with the gamut volumes. The cases were discussed whether normalized RGB stage and linear RGB stage could influence the color gamut volumes in RGBW OLED display. The results showed normalized RGB stage can reduce gamut volumes and the linear RGB stage can not.
APA, Harvard, Vancouver, ISO, and other styles
42

Xia, Lu active 21st century. "Recognizing human activity using RGBD data." Thesis, 2014. http://hdl.handle.net/2152/24981.

Full text
Abstract:
Traditional computer vision algorithms try to understand the world using visible light cameras. However, there are inherent limitations of this type of data source. First, visible light images are sensitive to illumination changes and background clutter. Second, the 3D structural information of the scene is lost when projecting the 3D world to 2D images, and recovering the 3D information from 2D images is a challenging problem. Range sensors have existed for over thirty years, capturing 3D characteristics of the scene. However, earlier range sensors were either too expensive, difficult to use in human environments, slow at acquiring data, or provided a poor estimation of distance. Recently, easy access to RGBD data at real-time frame rates has been leading to a revolution in perception and has inspired much new research using RGBD data. I propose algorithms to detect persons and understand their activities using RGBD data. I demonstrate that the solutions to many computer vision problems may be improved with the added depth channel. The 3D structural information may give rise to algorithms with real-time and view-invariant properties in a faster and easier fashion. When both data sources are available, the features extracted from the depth channel may be combined with traditional features computed from RGB channels to generate more robust systems with enhanced recognition abilities, which may be able to deal with more challenging scenarios. As a starting point, the first problem is to find the persons in various poses in the scene, including moving or static persons. Localizing humans from RGB images is limited by the lighting conditions and background clutter. Depth images give alternative ways to find the humans in the scene. In the past, detection of humans from range data was usually achieved by tracking, which does not work for indoor person detection. In this thesis, I propose a model-based approach to detect persons using the structural information embedded in the depth image. I propose a 2D head contour model and a 3D head surface model to look for the head-shoulder part of the person. Then, a segmentation scheme is proposed to segment the full human body from the background and extract the contour. I also give a tracking algorithm based on the detection result. I further research recognizing human actions and activities, proposing two features for recognizing human activities. The first feature is drawn from the skeletal joint locations estimated from a depth image. It is a compact representation of the human posture called histograms of 3D joint locations (HOJ3D). This representation is view-invariant and the whole algorithm runs in real time. This feature may benefit many applications that need a fast estimation of the posture and action of the human subject. The second feature is a spatio-temporal feature for depth video, called the Depth Cuboid Similarity Feature (DCSF). The interest points are extracted using an algorithm that effectively suppresses the noise and finds salient human motions. DCSF is extracted centered on each interest point, which forms the description of the video contents. This descriptor can be used to recognize activities with no dependence on skeleton information or pre-processing steps such as motion segmentation, tracking, or even image de-noising or hole-filling. It is more flexible and widely applicable to many scenarios. Finally, all the features herein developed are combined to solve a novel problem: first-person human activity recognition using RGBD data. Traditional activity recognition algorithms focus on recognizing activities from a third-person perspective; I propose to recognize activities from a first-person perspective with RGBD data. This task is very novel and extremely challenging due to the large amount of camera motion, caused either by self-exploration or by the response to interaction. I extracted 3D optical flow features as the motion descriptor, 3D skeletal joint features as posture descriptors, and spatio-temporal features as local appearance descriptors to describe the first-person videos. To address the ego-motion of the camera, I propose an attention mask to guide the recognition procedures and separate the features in the ego-motion region and the independent-motion region. The 3D features are very useful for summarizing the discriminative information of the activities. In addition, the combination of the 3D features with existing 2D features brings more robust recognition results and makes the algorithm capable of dealing with more challenging cases.
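The HOJ3D idea — binning 3D joint locations into a spherical histogram around the body centre to obtain a compact posture descriptor — can be sketched as follows; full view invariance additionally requires aligning the azimuth reference with the body orientation, which is omitted here, and the bin counts are illustrative.

    import numpy as np

    def hoj3d(joints, n_theta=12, n_phi=6):
        """joints: (J, 3) joint positions. Returns a flattened spherical
        histogram of joint directions around the body centroid."""
        v = joints - joints.mean(axis=0)
        r = np.linalg.norm(v, axis=1) + 1e-8
        theta = np.arctan2(v[:, 1], v[:, 0])          # azimuth in [-pi, pi]
        phi = np.arccos(np.clip(v[:, 2] / r, -1, 1))  # inclination in [0, pi]
        hist, _, _ = np.histogram2d(theta, phi, bins=[n_theta, n_phi],
                                    range=[[-np.pi, np.pi], [0, np.pi]])
        return hist.ravel() / hist.sum()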
APA, Harvard, Vancouver, ISO, and other styles
43

Chen, Yang-Sheng, and 陳煬升. "3D Foot Measurement Using RGBD Cameras." Thesis, 2014. http://ndltd.ncl.edu.tw/handle/12561454887663388318.

Full text
Abstract:
碩士<br>國立臺灣大學<br>資訊網路與多媒體研究所<br>102<br>Selecting a pair of comfortable shoes is a big issue in our life. However, there are few methods or system that can estimate our foot accurately to help us find the shoes. In this paper, we develop a fast foot dimension measurement system to provide useful information for selecting the shoes. Our foot measurement system uses a RGB-D Camera to obtain 3-D foot point cloud data at different view angles. We implement a fast and reliable automatic registration method to reconstruct a 3-D foot model. Based on the reconstructed 3-D foot model, we can compute the foot parameters, e.g. width, length, girth et.al. Experiences are conducted to evaluate the performance of our system. The results confirm that the performance of our system can provide a robust and accuracy foot dimension value.
APA, Harvard, Vancouver, ISO, and other styles
44

Engelbrecht, Bryce. "Object recognition beyond RGB." Thesis, 2020. https://hdl.handle.net/10539/31398.

Full text
Abstract:
A dissertation submitted to the Faculty of Science, in fulfilment of the requirements for the degree of Master of Science in the Wits Institute of Data Science (WIDS), School of Computer Science and Applied Mathematics, 2020. Object recognition and the subproblem of land cover classification have been a key focus of computer vision research. An increasing number of devices have begun supporting the capture of images with additional bands beyond the standard RGB bands, including depth and other spectra such as near infrared. There is an opportunity to study the use of RGB images with depth, and of multispectral images, to improve the accuracy of object recognition and land cover classification. We do this by taking existing state-of-the-art object recognition models and modifying them to work with RGB images with depth. For land cover classification we present a novel model, LandNet, which allows varying the number of backbone feature extractors and the image bands in each. We also study the impact of adding the additional depth information, bands and multiple feature extractors on the training and inference times of the models. We find that adding depth data shows no benefit for object recognition but has little effect on the training and inference times. Utilizing multispectral images allows improvements in the accuracy of land cover classification. Adding the additional bands in a single feature extractor has no effect on the training and inference times; using multiple feature extractors, however, does increase them. The results lead us to conclude that depth data has the potential to improve object recognition accuracy, but a larger dataset than SUN RGB-D is required to demonstrate improved performance when using RGB and depth images. We can conclude that using multispectral images for land cover classification has tangible benefits.
APA, Harvard, Vancouver, ISO, and other styles
45

Wang, Chun-Yi, and 王淳毅. "Color Interpolation of RGBW Color Filter Array." Thesis, 2013. http://ndltd.ncl.edu.tw/handle/77574596316362790967.

Full text
Abstract:
碩士<br>國立臺灣大學<br>資訊工程學研究所<br>101<br>In this thesis, we propose a novel interpolation on RGBW color filter array. Moreover, this method reduces the problem of color-alias and requires low computation. We introduce the background first, and review some traditional interpolation schemes. Then we discuss some problems in color interpolations such as edge-blurred and color-alias effects. Next, we will explain our proposed method in Chapter 3, and show the results in Chapter 4. We will compare different interpolations on Peak Signal-to-Noise Ratio (PSNR) and processing time. Last, conclusion and future works will be presented in Chapter 5.
APA, Harvard, Vancouver, ISO, and other styles
46

Gehrke, Ralf. "RGBI-Bilddaten mit RPAS und FOVEON Sensoren." Doctoral thesis, 2015. https://repositorium.ub.uni-osnabrueck.de/handle/urn:nbn:de:gbv:700-2015081013517.

Full text
Abstract:
In recent years, the use of Remotely Piloted Aerial Systems (RPAS) for geodata acquisition has become increasingly widespread. The main focus lies on the use of conventional cameras with three channels (red, green, blue (RGB)); with this combination, 3D data and orthophoto mosaics are produced. A further, smaller focus lies on the development of sensors with more than three channels for remote sensing questions. It is striking that manufacturers put a great deal of work into the radiometric quality and the spectral resolution of these sensors but, quite in contrast to conventional RGB cameras, neglect their geometric quality. The present work pursues a different approach: an existing system is extended by a fourth channel in the near infrared, under the constraints of a limited budget (low-cost) and the preservation of the high geometric imaging quality. Possible applications arise wherever the presence or condition of vegetation and the geometry are to be captured simultaneously with RPAS. Sigma cameras with Foveon® sensors are already known for their high imaging quality. By removing the infrared cut filter, such a camera, like almost all silicon-based sensors, can be modified to capture the Near Infrared (NIR). By combining an RGB camera and an NIR camera into one sensor head, together with self-developed data processing, four-channel image data can be produced that retains the high imaging quality of the Sigma camera while adding the extra information in the NIR. Further processing in modern photogrammetry software with a Structure from Motion (SFM) approach promises efficient and practical workflows. The developed sensor head is put to use in two application scenarios. In aerial archaeology, the time window for capturing archaeological ground monuments with RPAS can be extended considerably by this sensor head: the monument is more clearly recognizable both in the high-resolution surface model and in the NIR than in a conventional RGB image. In the filtering of a surface model into a terrain model, it could be shown that the use of the NIR usefully complements the conventional use of slope-based filters and leads to better results. By relying on used Sigma compact cameras and widely available software with SFM algorithms, the low-cost approach was fully satisfied. The radiometric quality was examined, and it was found that it does not reach the state of the art of specialized and expensive algorithms and sensors; for the applications shown, however, it can be rated as sufficient.
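Four-channel RGBI data of this kind supports standard vegetation indices; the NDVI below is the usual example (the thesis does not necessarily compute this exact index):

    import numpy as np

    def ndvi(nir, red):
        """Normalized Difference Vegetation Index from the NIR and red
        channels of a four-channel (RGBI) image."""
        nir, red = nir.astype(float), red.astype(float)
        return (nir - red) / (nir + red + 1e-8)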
APA, Harvard, Vancouver, ISO, and other styles
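Since the four-channel (RGB plus NIR) data described above target vegetation analysis, a standard derived product is the Normalized Difference Vegetation Index (NDVI). The sketch below is our illustration, not part of the thesis; it assumes the red and NIR channels are available as NumPy arrays:

```python
import numpy as np

def ndvi(red, nir, eps=1e-6):
    """NDVI = (NIR - R) / (NIR + R), computed per pixel; values lie in [-1, 1]."""
    red = red.astype(np.float64)
    nir = nir.astype(np.float64)
    return (nir - red) / (nir + red + eps)  # eps guards against division by zero
```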
47

Peng, Hsiao-Chia, and 彭小佳. "3D Face Reconstruction on RGB and RGB-D Images for Recognition Across Pose." Thesis, 2015. http://ndltd.ncl.edu.tw/handle/88142215912683274078.

Full text
Abstract:
Doctoral dissertation, National Taiwan University of Science and Technology, Department of Mechanical Engineering, academic year 103. Face recognition across pose is a challenging problem in computer vision. Two scenarios are considered in this thesis. One is the common setup with a single frontal facial image of each subject in the gallery set and images of other poses in the probe set. The other considers an RGB-D image of the frontal face for each subject in the gallery, while the probe set is the same as in the first case, containing only RGB images of other poses. The second scenario simulates the case where an RGB-D camera is available for user registration only, and recognition must be performed on regular RGB images without the depth channel. Two approaches are proposed for handling the first scenario, one holistic and the other component-based. The former extends a face reconstruction approach and improves it with different sets of landmarks for alignment and multiple reference models considered in the reconstruction phase. The latter focuses on the reconstruction of facial components obtained from pose-invariant landmarks, and on recognition with different components considered at different poses. Such component-based reconstruction for handling cross-pose recognition is rarely seen in the literature. Although the approach for handling the second scenario, i.e., RGB-D based recognition, is partially similar to the approach for the first, its novelty lies in the handling of depth readings corrupted by quantization noise, which are often encountered when the face is not close enough to the RGB-D camera at registration. An approach is proposed to resurface the corrupted depth map and substantially improve the recognition performance. All of the proposed approaches are evaluated on benchmark databases and proven comparable to state-of-the-art approaches.
APA, Harvard, Vancouver, ISO, and other styles
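The thesis above reports resurfacing depth maps corrupted by quantization noise. As a generic illustration of this kind of preprocessing, and explicitly not the author's actual method, one could combine a median filter (to suppress quantization steps) with an edge-preserving bilateral filter, for example with OpenCV; the filter sizes and sigma values below are arbitrary placeholders:

```python
import cv2
import numpy as np

def smooth_depth(depth_mm):
    """Illustrative denoising of a quantized single-channel depth map (in millimeters)."""
    depth = depth_mm.astype(np.float32)
    depth = cv2.medianBlur(depth, 5)  # suppress isolated quantization spikes
    # bilateralFilter(src, d, sigmaColor, sigmaSpace): edge-preserving smoothing.
    return cv2.bilateralFilter(depth, 9, 25.0, 9.0)
```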
48

Jorge, Filipe Castanheira. "ANALYSIS OF AN RGBD CAMERA FOR MANHOLE INSPECTION." Master's thesis, 2019. http://hdl.handle.net/10400.8/4546.

Full text
Abstract:
As the service life of ducts and manholes reaches its end, there is a growing need to preserve these structures. To prevent casualties and service interruptions, more frequent inspections are advised. To this day, most inspections are still performed manually. Inspectors need to be highly qualified, and the inspections take place in a hazardous environment, so automating the inspection process would lead to a healthier workplace. Given the recent developments in RGBD cameras, with reduced size, cost, and weight, this project aims to evaluate the application of one of the more recent models in a manhole inspection environment. Several analyses are performed to assess the RGBD sensor's performance for 3D model reconstruction, including a comparison with a ground-truth 3D model obtained using a laser point profile sensor mounted on an industrial robot.
APA, Harvard, Vancouver, ISO, and other styles
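A common way to carry out the ground-truth comparison mentioned above is a cloud-to-cloud distance between the reconstructed and reference point clouds. The following sketch is our own illustration (the thesis may use different metrics or tooling) and assumes both clouds are N x 3 NumPy arrays:

```python
import numpy as np
from scipy.spatial import cKDTree

def mean_cloud_distance(reconstructed, ground_truth):
    """Mean distance from each reconstructed point to its nearest ground-truth point."""
    tree = cKDTree(ground_truth)
    distances, _ = tree.query(reconstructed)  # nearest-neighbor distance per point
    return float(distances.mean())
```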
49

Tsai, Yo-Hunmg, and 蔡友煌. "Color Control of RGB LED." Thesis, 2010. http://ndltd.ncl.edu.tw/handle/58422936347529243592.

Full text
Abstract:
Master's thesis, Ming Chi University of Technology, Graduate Institute of Mechatronic Engineering, academic year 98. This thesis presents the color mixing of red, green, and blue light-emitting diodes (LEDs). In order to verify the adopted additive color mixing, an MCU-based RGB-LED lamp is built using an 8-bit microcontroller, the HOLTEK 46R24, and the experimental results are measured with a miniature fiber-optic spectrometer, the USB 4000, together with the accompanying Spectra Suite software produced by Ocean Optics Inc. Based on the temperature characteristics of the red, green, and blue LEDs used, color deviations arising from temperature variations are successfully compensated.
APA, Harvard, Vancouver, ISO, and other styles
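Additive color mixing of the kind verified above models the lamp output as a weighted sum of the three LED primaries. As a rough sketch, and not the circuit or firmware from the thesis, per-channel duty cycles for a target CIE XYZ color can be found by inverting a 3x3 matrix of the primaries' measured tristimulus values (the numbers below are placeholders, not measurements):

```python
import numpy as np

# Columns: XYZ tristimulus values of the R, G, and B LEDs at full duty cycle.
# Placeholder values; in practice they come from spectrometer measurements.
PRIMARIES = np.array([
    [0.64, 0.30, 0.15],
    [0.33, 0.60, 0.06],
    [0.03, 0.10, 0.79],
])

def duty_cycles(target_xyz):
    """Solve PRIMARIES @ d = target_xyz for the per-channel duty cycles d."""
    d = np.linalg.solve(PRIMARIES, np.asarray(target_xyz, dtype=np.float64))
    return np.clip(d, 0.0, 1.0)  # physical duty cycles are bounded to [0, 1]
```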
50

Wong, Yu Shiang, and 翁郁翔. "SMARTANNOTATOR: An Interactive Tool for Annotating Indoor RGBD Images." Thesis, 2015. http://ndltd.ncl.edu.tw/handle/y34x98.

Full text
Abstract:
Master's thesis, National Tsing Hua University, Department of Computer Science, academic year 103. RGBD images with high-quality annotations, both in the form of geometric (i.e., segmentation) and structural (i.e., how the segments mutually relate in 3D) information, provide valuable priors for a diverse range of applications in scene understanding and image manipulation. While it is now simple to acquire RGBD images, annotating them, automatically or manually, remains challenging. We present SmartAnnotator, an interactive system to facilitate annotating raw RGBD images. The system performs the tedious tasks of grouping pixels, creating potential abstracted cuboids, and inferring object interactions in 3D, and generates an ordered list of hypotheses. The user simply has to flip through the suggestions for segment labels and finalize a selection, and the system updates the remaining hypotheses. As annotations are finalized, the process becomes simpler, with fewer ambiguities to resolve. Moreover, as more scenes are annotated, the system makes better suggestions based on the structural and geometric priors learned from previous annotation sessions. We test the system on a large number of indoor scenes across different users and experimental settings, validate the results on existing benchmark datasets, and report significant improvements over low-level annotation alternatives.
APA, Harvard, Vancouver, ISO, and other styles
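The interactive loop described above, in which finalizing one label tightens the remaining hypotheses, can be caricatured by a toy re-ranking step. Everything below (names, data layout, scoring) is our own invention for illustration, not SmartAnnotator's actual algorithm:

```python
from collections import Counter

def rerank(hypotheses, confirmed):
    """Toy update: promote labels already confirmed for other segments in the scene."""
    prior = Counter(confirmed.values())
    return {
        segment: sorted(labels, key=lambda lab: -prior[lab])
        for segment, labels in hypotheses.items()
        if segment not in confirmed  # drop segments the user has finalized
    }

# Once "chair" is confirmed for seg1, it rises in seg2's suggestion list.
ranked = rerank({"seg2": ["table", "chair"]}, {"seg1": "chair"})
```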