Academic literature on the topic 'RGB-D object segmentation'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'RGB-D object segmentation.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Journal articles on the topic "RGB-D object segmentation"

1

Shen, Xiaoke, and Ioannis Stamos. "3D Object Detection and Instance Segmentation from 3D Range and 2D Color Images." Sensors 21, no. 4 (February 9, 2021): 1213. http://dx.doi.org/10.3390/s21041213.

Abstract:
Instance segmentation and object detection are significant problems in the fields of computer vision and robotics. We address those problems by proposing a novel object segmentation and detection system. First, we detect 2D objects based on RGB, depth only, or RGB-D images. A 3D convolutional-based system, named Frustum VoxNet, is proposed. This system generates frustums from 2D detection results, proposes 3D candidate voxelized images for each frustum, and uses a 3D convolutional neural network (CNN) based on these candidate voxelized images to perform the 3D instance segmentation and object detection. Results on the SUN RGB-D dataset show that our RGB-D-based system’s 3D inference is much faster than state-of-the-art methods, without a significant loss of accuracy. At the same time, we can provide segmentation and detection results using depth-only images, with accuracy comparable to RGB-D-based systems. This is important since our methods can also work well in low lighting conditions, or with sensors that do not acquire RGB images. Finally, the use of segmentation as part of our pipeline increases detection accuracy, while providing at the same time 3D instance segmentation.
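The frustum-and-voxelize step described in this abstract can be illustrated with a short sketch: back-project the depth pixels inside a 2D detection box into a 3D point cloud, then quantize it into a voxel grid for a 3D CNN. This is a minimal illustration under assumed camera intrinsics, box coordinates, and grid size, not the authors' Frustum VoxNet implementation.

```python
import numpy as np

def frustum_points(depth, box, fx, fy, cx, cy):
    """Back-project the depth pixels inside a 2D detection box into a 3D frustum point cloud.

    depth : (H, W) array of metric depth values (0 where invalid)
    box   : (u_min, v_min, u_max, v_max) pixel coordinates of the 2D detection
    """
    u_min, v_min, u_max, v_max = box
    vs, us = np.mgrid[v_min:v_max, u_min:u_max]
    z = depth[vs, us]
    valid = z > 0
    us, vs, z = us[valid], vs[valid], z[valid]
    x = (us - cx) * z / fx
    y = (vs - cy) * z / fy
    return np.stack([x, y, z], axis=1)          # (N, 3) points inside the frustum

def voxelize(points, grid=32):
    """Quantize a point cloud into a binary occupancy grid, the kind of input a 3D CNN expects."""
    mins, maxs = points.min(0), points.max(0)
    idx = ((points - mins) / (maxs - mins + 1e-9) * (grid - 1)).astype(int)
    vox = np.zeros((grid, grid, grid), dtype=np.float32)
    vox[idx[:, 0], idx[:, 1], idx[:, 2]] = 1.0
    return vox

# Hypothetical usage: a 640x480 depth map and one 2D detection box.
depth = np.random.uniform(0.5, 4.0, size=(480, 640)).astype(np.float32)
pts = frustum_points(depth, box=(200, 150, 320, 300), fx=525.0, fy=525.0, cx=319.5, cy=239.5)
voxels = voxelize(pts, grid=32)
print(voxels.shape, voxels.sum())
```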
2

Yang, J., and Z. Kang. "Indoor Semantic Segmentation from RGB-D Images by Integrating Fully Convolutional Network with Higher-Order Markov Random Field." ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences XLII-4 (September 19, 2018): 717–24. http://dx.doi.org/10.5194/isprs-archives-xlii-4-717-2018.

Abstract:
Indoor scenes have the characteristics of abundant semantic categories, illumination changes, occlusions and overlaps among objects, which poses great challenges for indoor semantic segmentation. Therefore, in this paper we develop a method based on a higher-order Markov random field model for indoor semantic segmentation from RGB-D images. Instead of directly using RGB-D images, we first train and apply a RefineNet model using only RGB information to generate high-level semantic information. Then, the spatial location relationship from the depth channel and the spectral information from the color channels are integrated as a prior for a marker-controlled watershed algorithm to obtain robust and accurate visually homogeneous regions. Finally, a higher-order Markov random field model encodes the short-range context among adjacent pixels and the long-range context within each visually homogeneous region to refine the semantic segmentation. To evaluate the effectiveness and robustness of the proposed method, experiments were conducted on the public SUN RGB-D dataset. Experimental results indicate that, compared with using RGB information alone, the proposed method remarkably improves the semantic segmentation results, especially at object boundaries.
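The marker-controlled watershed stage mentioned above can be sketched with OpenCV. The sketch below uses a generic Otsu-plus-distance-transform marker construction rather than the paper's depth-and-colour prior, so it only conveys the general mechanism.

```python
import cv2
import numpy as np

def homogeneous_regions(bgr):
    """Generic marker-controlled watershed: derive markers from sure-foreground blobs
    and let the watershed grow them into visually homogeneous regions."""
    gray = cv2.cvtColor(bgr, cv2.COLOR_BGR2GRAY)
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    # Sure foreground: pixels far from any region boundary.
    dist = cv2.distanceTransform(binary, cv2.DIST_L2, 5)
    _, sure_fg = cv2.threshold(dist, 0.5 * dist.max(), 255, cv2.THRESH_BINARY)
    sure_fg = sure_fg.astype(np.uint8)
    # Each connected blob becomes one marker; 0 is reserved for "unknown" pixels.
    _, markers = cv2.connectedComponents(sure_fg)
    markers = markers + 1
    markers[(binary > 0) & (sure_fg == 0)] = 0
    markers = cv2.watershed(bgr, markers)       # region boundaries are labelled -1
    return markers

# Hypothetical usage on a random image; in the paper the markers would also
# encode depth-based spatial priors.
img = np.random.randint(0, 256, size=(240, 320, 3), dtype=np.uint8)
regions = homogeneous_regions(img)
print(np.unique(regions)[:10])
```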
3

Rafique, Adnan Ahmed, Ahmad Jalal, and Kibum Kim. "Automated Sustainable Multi-Object Segmentation and Recognition via Modified Sampling Consensus and Kernel Sliding Perceptron." Symmetry 12, no. 11 (November 23, 2020): 1928. http://dx.doi.org/10.3390/sym12111928.

Abstract:
Object recognition in depth images is a challenging and persistent task in machine vision, robotics, and automation of sustainability. Object recognition tasks are a challenging part of various multimedia technologies for video surveillance, human–computer interaction, robotic navigation, drone targeting, tourist guidance, and medical diagnostics. However, the symmetry that exists in real-world objects plays a significant role in perception and recognition of objects in both humans and machines. With advances in depth sensor technology, numerous researchers have recently proposed RGB-D object recognition techniques. In this paper, we introduce a sustainable object recognition framework that is consistent despite any change in the environment, and can recognize and analyze RGB-D objects in complex indoor scenarios. Firstly, after acquiring a depth image, the point cloud and the depth maps are extracted to obtain the planes. Then, the plane fitting model and the proposed modified maximum likelihood estimation sampling consensus (MMLESAC) are applied as a segmentation process. Next, depth kernel descriptors (DKDES) over segmented objects are computed for single and multiple object scenarios separately. These DKDES are subsequently carried forward to isometric mapping (IsoMap) for feature space reduction. Finally, the reduced feature vector is forwarded to a kernel sliding perceptron (KSP) for the recognition of objects. Three datasets are used to evaluate four different experiments by employing a cross-validation scheme to validate the proposed model. The experimental results over RGB-D object, RGB-D scene, and NYUDv1 datasets demonstrate overall accuracies of 92.2%, 88.5%, and 90.5% respectively. These results outperform existing state-of-the-art methods and verify the suitability of the method.
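The plane-segmentation step rests on a sampling-consensus estimator. The sketch below shows a plain RANSAC plane fit, not the modified MMLESAC estimator; the inlier threshold and the synthetic point cloud are illustrative only.

```python
import numpy as np

def ransac_plane(points, n_iters=200, inlier_thresh=0.01, rng=np.random.default_rng(0)):
    """Fit a dominant plane to an (N, 3) point cloud by random sampling consensus.
    Returns (normal, d) for the plane n.x + d = 0 and the inlier mask."""
    best_inliers = np.zeros(len(points), dtype=bool)
    best_plane = None
    for _ in range(n_iters):
        p0, p1, p2 = points[rng.choice(len(points), 3, replace=False)]
        normal = np.cross(p1 - p0, p2 - p0)
        norm = np.linalg.norm(normal)
        if norm < 1e-9:
            continue                      # degenerate (collinear) sample
        normal = normal / norm
        d = -normal @ p0
        dist = np.abs(points @ normal + d)
        inliers = dist < inlier_thresh
        if inliers.sum() > best_inliers.sum():
            best_inliers, best_plane = inliers, (normal, d)
    return best_plane, best_inliers

# Hypothetical usage: a noisy table plane plus some off-plane clutter.
rng = np.random.default_rng(1)
table = np.c_[rng.uniform(-1, 1, (500, 2)), rng.normal(0, 0.005, 500)]
clutter = rng.uniform(-1, 1, (100, 3))
plane, mask = ransac_plane(np.vstack([table, clutter]))
print(plane[0], mask.sum())
```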
4

Novkovic, Tonci, Fadri Furrer, Marko Panjek, Margarita Grinvald, Roland Siegwart, and Juan Nieto. "CLUBS: An RGB-D dataset with cluttered box scenes containing household objects." International Journal of Robotics Research 38, no. 14 (September 23, 2019): 1538–48. http://dx.doi.org/10.1177/0278364919875221.

Abstract:
With the progress of machine learning, the demand for realistic data with high-quality annotations has been thriving. In order to generalize well, considerable amounts of data are required, especially realistic ground-truth data, for tasks such as object detection and scene segmentation. Such data can be difficult, time-consuming, and expensive to collect. This article presents a dataset of household objects and box scenes commonly found in warehouse environments. The dataset was obtained using a robotic setup with four different cameras. It contains reconstructed objects and scenes, as well as raw RGB and depth images, camera poses, pixel-wise labels of objects directly in the RGB images, and 3D bounding boxes with poses in the world frame. Furthermore, raw calibration data are provided, together with the intrinsic and extrinsic parameters for all the sensors. By providing object labels as pixel-wise masks, 3D, and 2D object bounding boxes, this dataset is useful for both object recognition and instance segmentation. The realistic scenes provided will serve for learning-based algorithms applied to scenarios where boxes of objects are often found, such as in the logistics sector. Both the dataset and the tools for data processing are published and available online.
5

Schwarz, Max, Anton Milan, Arul Selvam Periyasamy, and Sven Behnke. "RGB-D object detection and semantic segmentation for autonomous manipulation in clutter." International Journal of Robotics Research 37, no. 4-5 (June 20, 2017): 437–51. http://dx.doi.org/10.1177/0278364917713117.

Abstract:
Autonomous robotic manipulation in clutter is challenging. A large variety of objects must be perceived in complex scenes, where they are partially occluded and embedded among many distractors, often in restricted spaces. To tackle these challenges, we developed a deep-learning approach that combines object detection and semantic segmentation. The manipulation scenes are captured with RGB-D cameras, for which we developed a depth fusion method. Employing pretrained features makes learning from small annotated robotic datasets possible. We evaluate our approach on two challenging datasets: one captured for the Amazon Picking Challenge 2016, where our team NimbRo came in second in the Stowing and third in the Picking task; and one captured in disaster-response scenarios. The experiments show that object detection and semantic segmentation complement each other and can be combined to yield reliable object perception.
6

Thermos, Spyridon, Gerasimos Potamianos, and Petros Daras. "Joint Object Affordance Reasoning and Segmentation in RGB-D Videos." IEEE Access 9 (2021): 89699–713. http://dx.doi.org/10.1109/access.2021.3090471.

7

Kang, Xujie, Jing Li, Xiangtao Fan, Hongdeng Jian, and Chen Xu. "Object-Level Semantic Map Construction for Dynamic Scenes." Applied Sciences 11, no. 2 (January 11, 2021): 645. http://dx.doi.org/10.3390/app11020645.

Abstract:
Visual simultaneous localization and mapping (SLAM) is challenging in dynamic environments as moving objects can impair camera pose tracking and mapping. This paper introduces a method for robust dense object-level SLAM in dynamic environments that takes a live stream of RGB-D frame data as input, detects moving objects, and segments the scene into different objects while simultaneously tracking and reconstructing their 3D structures. The approach provides a new method of dynamic object detection that integrates prior knowledge from a pre-built object model database, object-oriented 3D tracking against the camera pose, and the association between the instance segmentation results of the current frame and the object database to find dynamic objects in the current frame. By leveraging the 3D static model for frame-to-model alignment, as well as dynamic object culling, camera motion estimation reduces the overall drift. Based on the estimated camera poses and instance segmentation results, an object-level semantic map representation is constructed for the world map. Experimental results on the TUM RGB-D dataset, comparing the proposed method to related state-of-the-art approaches, demonstrate that our method achieves similar performance in static scenes and improved accuracy and robustness in dynamic scenes.
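The "dynamic object culling" idea, removing measurements that fall on instances flagged as dynamic before frame-to-model alignment, can be sketched in a few lines; the mask layout and the hard-coded dynamic flags here are hypothetical, not the paper's pipeline.

```python
import numpy as np

def cull_dynamic(depth, instance_masks, dynamic_ids):
    """Zero out depth measurements that fall on instances flagged as dynamic,
    so they do not contribute to frame-to-model camera tracking."""
    static_depth = depth.copy()
    for obj_id, mask in instance_masks.items():
        if obj_id in dynamic_ids:
            static_depth[mask] = 0.0      # treated as invalid by the tracker
    return static_depth

# Hypothetical usage: two instance masks, one of them (a moving object) flagged dynamic.
depth = np.random.uniform(0.5, 5.0, (480, 640)).astype(np.float32)
masks = {1: np.zeros((480, 640), bool), 2: np.zeros((480, 640), bool)}
masks[2][100:300, 200:400] = True
static = cull_dynamic(depth, masks, dynamic_ids={2})
print((static == 0).sum())                # pixels excluded from pose estimation
```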
8

Xie, Qian, Oussama Remil, Yanwen Guo, Meng Wang, Mingqiang Wei, and Jun Wang. "Object Detection and Tracking Under Occlusion for Object-Level RGB-D Video Segmentation." IEEE Transactions on Multimedia 20, no. 3 (March 2018): 580–92. http://dx.doi.org/10.1109/tmm.2017.2751965.

9

Ge, Yanliang, Cong Zhang, Kang Wang, Ziqi Liu, and Hongbo Bi. "WGI-Net: A weighted group integration network for RGB-D salient object detection." Computational Visual Media 7, no. 1 (January 8, 2021): 115–25. http://dx.doi.org/10.1007/s41095-020-0200-x.

Abstract:
Salient object detection is used as a pre-process in many computer vision tasks (such as salient object segmentation, video salient object detection, etc.). When performing salient object detection, depth information can provide clues to the location of target objects, so effective fusion of RGB and depth feature information is important. In this paper, we propose a new feature information aggregation approach, weighted group integration (WGI), to effectively integrate RGB and depth feature information. We use a dual-branch structure to slice the input RGB image and depth map separately and then merge the results separately by concatenation. As grouped features may lose global information about the target object, we also make use of the idea of residual learning, taking the features captured by the original fusion method as supplementary information to ensure both accuracy and completeness of the fused information. Experiments on five datasets show that our model performs better than typical existing approaches for four evaluation metrics.
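The grouped RGB-depth fusion with a residual path can be conveyed by a toy PyTorch module; the channel counts, group count, and fusion layers below are invented for illustration and do not reproduce the WGI module itself.

```python
import torch
import torch.nn as nn

class GroupedFusion(nn.Module):
    """Toy group-wise fusion of RGB and depth feature maps with a residual path."""
    def __init__(self, channels=64, groups=4):
        super().__init__()
        self.groups = groups
        # One small conv per group, applied to the concatenated (RGB, depth) slice.
        self.group_convs = nn.ModuleList([
            nn.Conv2d(2 * channels // groups, channels // groups, kernel_size=3, padding=1)
            for _ in range(groups)
        ])
        self.residual = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, rgb_feat, depth_feat):
        rgb_slices = rgb_feat.chunk(self.groups, dim=1)
        depth_slices = depth_feat.chunk(self.groups, dim=1)
        fused = [conv(torch.cat([r, d], dim=1))
                 for conv, r, d in zip(self.group_convs, rgb_slices, depth_slices)]
        fused = torch.cat(fused, dim=1)
        # Residual path keeps globally fused features alongside the grouped ones.
        return fused + self.residual(torch.cat([rgb_feat, depth_feat], dim=1))

# Hypothetical usage with 64-channel feature maps.
rgb = torch.randn(1, 64, 56, 56)
dep = torch.randn(1, 64, 56, 56)
print(GroupedFusion()(rgb, dep).shape)     # torch.Size([1, 64, 56, 56])
```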

Dissertations / Theses on the topic "RGB-D object segmentation"

1

Lin, Xiao. "Semantic and generic object segmentation for scene analysis using RGB-D Data." Doctoral thesis, Universitat Politècnica de Catalunya, 2018. http://hdl.handle.net/10803/620762.

Abstract:
In this thesis, we study RGB-D based segmentation problems from different perspectives in terms of the input data. Apart from the basic photometric and geometric information contained in the RGB-D data, also semantic and temporal information are usually considered in an RGB-D based segmentation system. The first part of this thesis focuses on an RGB-D based semantic segmentation problem, where the predefined semantics and annotated training data are available. First, we review how RGB-D data has been exploited in the state of the art to help training classifiers in a semantic segmentation tasks. Inspired by these works, we follow a multi-task learning schema, where semantic segmentation and depth estimation are jointly tackled in a Convolutional Neural Network (CNN). Since semantic segmentation and depth estimation are two highly correlated tasks, approaching them jointly can be mutually beneficial. In this case, depth information along with the segmentation annotation in the training data helps better defining the target of the training process of the classifier, instead of feeding the system blindly with an extra input channel. We design a novel hybrid CNN architecture by investigating the common attributes as well as the distinction for depth estimation and semantic segmentation. The proposed architecture is tested and compared with state of the art approaches in different datasets. Although outstanding results are achieved in semantic segmentation, the limitations in these approaches are also obvious. Semantic segmentation strongly relies on predefined semantics and a large amount of annotated data, which may not be available in more general applications. On the other hand, classical image segmentation tackles the segmentation task in a more general way. But classical approaches hardly obtain object level segmentation due to the lack of higher level knowledge. Thus, in the second part of this thesis, we focus on an RGB-D based generic instance segmentation problem where temporal information is available from the RGB-D video while no semantic information is provided. We present a novel generic segmentation approach for 3D point cloud video (stream data) thoroughly exploiting the explicit geometry and temporal correspondences in RGB-D. The proposed approach is validated and compared with state of the art generic segmentation approaches in different datasets. Finally, in the third part of this thesis, we present a method which combines the advantages in both semantic segmentation and generic segmentation, where we discover object instances using the generic approach and model them by learning from the few discovered examples by applying the approach of semantic segmentation. To do so, we employ the one shot learning technique, which performs knowledge transfer from a generally trained model to a specific instance model. The learned instance models generate robust features in distinguishing different instances, which is fed to the generic segmentation approach to perform improved segmentation. The approach is validated with experiments conducted on a carefully selected dataset.
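The joint semantic-segmentation and depth-estimation schema discussed in the first part of the thesis can be illustrated with a minimal shared-encoder, two-head network; the layers and the loss weighting below are placeholders, not the proposed hybrid architecture.

```python
import torch
import torch.nn as nn

class MultiTaskNet(nn.Module):
    """Shared encoder with two decoders: semantic logits and per-pixel depth."""
    def __init__(self, n_classes=13):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
        )
        self.seg_head = nn.Conv2d(64, n_classes, 1)   # semantic segmentation logits
        self.depth_head = nn.Conv2d(64, 1, 1)         # depth regression

    def forward(self, rgb):
        feat = self.encoder(rgb)
        return self.seg_head(feat), self.depth_head(feat)

# Hypothetical joint loss on one dummy batch: segmentation and depth supervise the same encoder.
net = MultiTaskNet()
rgb = torch.randn(2, 3, 64, 64)
labels = torch.randint(0, 13, (2, 64, 64))
gt_depth = torch.rand(2, 1, 64, 64)
seg, depth = net(rgb)
loss = nn.functional.cross_entropy(seg, labels) + nn.functional.l1_loss(depth, gt_depth)
loss.backward()
print(float(loss))
```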
2

Finman, Ross Edward. "Real-time large object category recognition using robust RGB-D segmentation features." Thesis, Massachusetts Institute of Technology, 2012. http://hdl.handle.net/1721.1/79218.

Abstract:
Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, February 2013.
"February 2013." Cataloged from PDF version of thesis.
Includes bibliographical references (p. 77-80).
This thesis looks at the problem of large object category recognition for use in robotic systems. While many algorithms exist for object recognition, category recognition remains a challenge within robotics, particularly with the robustness and real-time constraints within robotics. Our system addresses category recognition by treating it as a segmentation problem and using the resulting segments to learn and detect large objects based on their 3D characteristics. The first part of this thesis examines how to efficiently do unsupervised segmentation of an RGB-D image in a way that is consistent across wide viewpoint and scale variance, and creating features from the resulting segments. The second part of this thesis explores how to do robust data association to keep temporally consistent segments between frames. Our higher-level module filters and matches relevant segments to a learned database of categories and outputs a pixel-accurate, labeled object mask. Our system has a run time that is nearly linear with the number of RGB-D samples and we evaluate it in a real-time robotic application.
3

Wagh, Ameya Yatindra. "A Deep 3D Object Pose Estimation Framework for Robots with RGB-D Sensors." Digital WPI, 2019. https://digitalcommons.wpi.edu/etd-theses/1287.

Abstract:
The task of object detection and pose estimation has widely been done using template matching techniques. However, these algorithms are sensitive to outliers and occlusions, and have high latency due to their iterative nature. Recent research in computer vision and deep learning has shown great improvements in the robustness of these algorithms. However, one of the major drawbacks of these algorithms is that they are specific to the objects. Moreover, the estimation of pose depends significantly on their RGB image features. As these algorithms are trained on meticulously labeled large datasets for object's ground truth pose, it is difficult to re-train these for real-world applications. To overcome this problem, we propose a two-stage pipeline of convolutional neural networks which uses RGB images to localize objects in 2D space and depth images to estimate a 6DoF pose. Thus the pose estimation network learns only the geometric features of the object and is not biased by its color features. We evaluate the performance of this framework on LINEMOD dataset, which is widely used to benchmark object pose estimation frameworks. We found the results to be comparable with the state of the art algorithms using RGB-D images. Secondly, to show the transferability of the proposed pipeline, we implement this on ATLAS robot for a pick and place experiment. As the distribution of images in LINEMOD dataset and the images captured by the MultiSense sensor on ATLAS are different, we generate a synthetic dataset out of very few real-world images captured from the MultiSense sensor. We use this dataset to train just the object detection networks used in the ATLAS Robot experiment.
4

Ambrus, Rares. "Unsupervised construction of 4D semantic maps in a long-term autonomy scenario." Doctoral thesis, KTH, Centrum för Autonoma System, CAS, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-215323.

Abstract:
Robots are operating for longer times and collecting much more data than just a few years ago. In this setting we are interested in exploring ways of modeling the environment, segmenting out areas of interest and keeping track of the segmentations over time, with the purpose of building 4D models (i.e. space and time) of the relevant parts of the environment. Our approach relies on repeatedly observing the environment and creating local maps at specific locations. The first question we address is how to choose where to build these local maps. Traditionally, an operator defines a set of waypoints on a pre-built map of the environment which the robot visits autonomously. Instead, we propose a method to automatically extract semantically meaningful regions from a point cloud representation of the environment. The resulting segmentation is purely geometric, and in the context of mobile robots operating in human environments, the semantic label associated with each segment (i.e. kitchen, office) can be of interest for a variety of applications. We therefore also look at how to obtain per-pixel semantic labels given the geometric segmentation, by fusing probabilistic distributions over scene and object types in a Conditional Random Field. For most robotic systems, the elements of interest in the environment are the ones which exhibit some dynamic properties (such as people, chairs, cups, etc.), and the ability to detect and segment such elements provides a very useful initial segmentation of the scene. We propose a method to iteratively build a static map from observations of the same scene acquired at different points in time. Dynamic elements are obtained by computing the difference between the static map and new observations. We address the problem of clustering together dynamic elements which correspond to the same physical object, observed at different points in time and in significantly different circumstances. To address some of the inherent limitations in the sensors used, we autonomously plan, navigate around and obtain additional views of the segmented dynamic elements. We look at methods of fusing the additional data and we show that both a combined point cloud model and a fused mesh representation can be used to more robustly recognize the dynamic object in future observations. In the case of the mesh representation, we also show how a Convolutional Neural Network can be trained for recognition by using mesh renderings. Finally, we present a number of methods to analyse the data acquired by the mobile robot autonomously and over extended time periods. First, we look at how the dynamic segmentations can be used to derive a probabilistic prior which can be used in the mapping process to further improve and reinforce the segmentation accuracy. We also investigate how to leverage spatial-temporal constraints in order to cluster dynamic elements observed at different points in time and under different circumstances. We show that by making a few simple assumptions we can increase the clustering accuracy even when the object appearance varies significantly between observations. The result of the clustering is a spatial-temporal footprint of the dynamic object, defining an area where the object is likely to be observed spatially as well as a set of time stamps corresponding to when the object was previously observed. Using this data, predictive models can be created and used to infer future times when the object is more likely to be observed. 
In an object search scenario, this model can be used to decrease the search time when looking for specific objects.
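The core of the dynamic-element extraction, differencing a new observation against the static map and keeping sizeable clusters, can be sketched on a 2D occupancy slice; the grids and the minimum cluster size are synthetic, not the thesis' representation.

```python
import numpy as np
from scipy import ndimage

def dynamic_elements(static_grid, new_grid, min_size=20):
    """Cells occupied in the new observation but free in the static map are
    candidate dynamic elements; small clusters are discarded as noise."""
    changed = new_grid & ~static_grid
    labels, n = ndimage.label(changed)
    sizes = ndimage.sum(changed, labels, index=np.arange(1, n + 1))
    keep = [i + 1 for i, s in enumerate(sizes) if s >= min_size]
    return np.isin(labels, keep), len(keep)

# Hypothetical usage: a 2D occupancy slice with one newly appeared object.
static_grid = np.zeros((100, 100), bool)
static_grid[40:60, 10:90] = True                 # e.g. a wall already in the static map
new_grid = static_grid.copy()
new_grid[70:80, 20:35] = True                    # an object that was not there before
mask, count = dynamic_elements(static_grid, new_grid)
print(count, mask.sum())
```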


5

Silva, João Gonçalo Pires Ferreira da. "Object Segmentation and Classification from RGB-D Data." Master's thesis, 2017. http://hdl.handle.net/10316/83024.

Abstract:
Integrated Master's dissertation in Mechanical Engineering presented to the Faculty of Sciences and Technology.
Object classification is a key factor in the development of autonomous robots. Object classification can be greatly improved with previous reliable segmentation and feature extraction. With this in mind, the main objective of this dissertation is to implement an object classification algorithm, capable of classifying objects from the Yale-CMU-Berkeley (YCB) object and model set, through the use of a novel unsupervised feature extraction method from red, green, blue and depth (RGB-D) data and feedforward artificial neural networks (FFANNs). In the method presented here, after the acquisition of data from an RGB-D camera, noise is removed and the objects in the scene are isolated. For each isolated object, k-means clustering is applied to extract a global main colour and three main colours. Three scores are computed based on the fitting of primitive shapes (cylinder, sphere or rectangular prism). Object dimensions and volume are estimated by calculating the volume of the best primitive shape previously fitted. Then with these features, FFANNs are trained and used to classify these objects. Experimental tests were carried out in 20 objects, from the YCB object and model set and results indicate that this algorithm has a recognition accuracy of 96%, with five objects in the workspace at the same time and in random poses. Also, a method of calculating the location of an object, based on the location of the geometric centre, of the best primitive shape previously fitted is developed.
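The colour-feature step (a global colour plus three dominant colours found by k-means over the isolated object's pixels) can be sketched as follows; the exact descriptor layout in the dissertation may differ.

```python
import numpy as np
from sklearn.cluster import KMeans

def colour_features(rgb_pixels, k=3):
    """rgb_pixels: (N, 3) array of an isolated object's pixels in [0, 255].
    Returns one global mean colour plus k dominant colours found by k-means."""
    global_colour = rgb_pixels.mean(axis=0)
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(rgb_pixels)
    # Order the cluster centres by how many pixels they explain.
    order = np.argsort(-np.bincount(km.labels_, minlength=k))
    main_colours = km.cluster_centers_[order]
    return np.concatenate([global_colour, main_colours.ravel()])  # 3 + 3k values

# Hypothetical object: mostly red with some white highlights.
rng = np.random.default_rng(0)
pixels = np.vstack([rng.normal([200, 30, 30], 10, (400, 3)),
                    rng.normal([240, 240, 240], 5, (100, 3))])
print(colour_features(pixels).round(1))
```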
6

Bicho, Dylan Jordão. "Detecção e Seguimento de Objectos em Grelhas de Ocupação para Aplicações em Realidade Aumentada." Master's thesis, 2018. http://hdl.handle.net/10316/86377.

Abstract:
Integrated Master's dissertation in Electrical and Computer Engineering presented to the Faculty of Sciences and Technology.
Over recent years, Virtual Reality and Augmented Reality systems have been developed with the mission of giving a user a completely immersive experience through artificial stimulation of the user’s senses, and have brought countless benefits in several areas such as health and education. However, these systems are still limited by different challenging factors: absence of a realistic representation of the real world, lack of customization and flexibility, financial viability, physical and psychological discomfort causing nauseating effects, among others. These systems also demand that the user moves in an empty or very limited space, given that they do not recreate the physical environment where the user is moving into the virtual representation in a one-to-one nature (both in user movements as well as objects that are present).In spite of these facts, the technological advances in the domains of microprocessors and graphical processing, as well as the upcoming low-cost and efficient 3D data capturing sensors such as the Microsoft Kinect v2 (useful for real scenario reconstructions), have made such systems more financially viable. In these systems, a good 3D representation of the environment is key to success, seeing that object and user tracking is one of the most important steps of this process. A pre-processing step of the extracted sensory data makes this process easier. Applying tracking techniques to the present objects, it is possible to know the location and velocity of any given object, as well as estimate its future states.In this thesis a modular system is presented for the accurate representation of a real-world scenario as 3D occupancy grids using sensory data from four Microsoft Kinect v2. This process can be divided into three essential modules: first, sensory data is captured from the cameras and image processing techniques are applied to filter out noise and information related to the background of the environment; next, 3D segmentation techniques are applied to the constructed point clouds; finally, Bayesian filters (both particle filters as well as Kalman filters) are applied to track all the objects in the scene, as well as the heads of all the users. In this way, an estimation of all relevant objects and users location and instantaneous velocity, as well as their next location, is obtained.The aforementioned process was subject to a series of particularly challenging tests, with results of these qualitative tests presented in this document. The results show that the proposed system is capable of correctly tracking any object present in the scene (being however limited by a possible interaction between dynamic objects). In addition, the user head tracking module showed to be robust and deployable in a real-time application.
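The Kalman-filter tracking mentioned in the abstract can be illustrated with a constant-velocity filter over an object's 3D centroid; the state layout and noise levels below are illustrative, not the thesis' parameters.

```python
import numpy as np

class KalmanCV:
    """Constant-velocity Kalman filter for a 3D object centroid.
    State x = [px, py, pz, vx, vy, vz]; only the position is measured."""
    def __init__(self, dt=1 / 30, q=1e-2, r=1e-2):
        self.F = np.eye(6)
        self.F[:3, 3:] = dt * np.eye(3)            # position integrates velocity
        self.H = np.hstack([np.eye(3), np.zeros((3, 3))])
        self.Q, self.R = q * np.eye(6), r * np.eye(3)
        self.x, self.P = np.zeros(6), np.eye(6)

    def predict(self):
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:3]                          # predicted next position

    def update(self, z):
        y = z - self.H @ self.x                    # innovation
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)   # Kalman gain
        self.x = self.x + K @ y
        self.P = (np.eye(6) - K @ self.H) @ self.P

# Hypothetical usage: track a centroid drifting along x with noisy measurements.
kf = KalmanCV()
for t in range(10):
    kf.predict()
    kf.update(np.array([0.1 * t, 0.0, 1.0]) + np.random.normal(0, 0.01, 3))
print(kf.x.round(3))
```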

Book chapters on the topic "RGB-D object segmentation"

1

Toscana, Giorgio, Stefano Rosa, and Basilio Bona. "Fast Graph-Based Object Segmentation for RGB-D Images." In Proceedings of SAI Intelligent Systems Conference (IntelliSys) 2016, 42–58. Cham: Springer International Publishing, 2017. http://dx.doi.org/10.1007/978-3-319-56991-8_5.

2

Wang, Chao, Sheng Liu, Jianhua Zhang, Yuan Feng, and Shengyong Chen. "RGB-D Based Object Segmentation in Severe Color Degraded Environment." In Communications in Computer and Information Science, 465–76. Singapore: Springer Singapore, 2017. http://dx.doi.org/10.1007/978-981-10-7305-2_40.

3

Janus, Piotr, Tomasz Kryjak, and Marek Gorgon. "Foreground Object Segmentation in RGB–D Data Implemented on GPU." In Advances in Intelligent Systems and Computing, 809–20. Cham: Springer International Publishing, 2020. http://dx.doi.org/10.1007/978-3-030-50936-1_68.

4

Schneider, Lukas, Manuel Jasch, Björn Fröhlich, Thomas Weber, Uwe Franke, Marc Pollefeys, and Matthias Rätsch. "Multimodal Neural Networks: RGB-D for Semantic Segmentation and Object Detection." In Image Analysis, 98–109. Cham: Springer International Publishing, 2017. http://dx.doi.org/10.1007/978-3-319-59126-1_9.

5

Gupta, Saurabh, Ross Girshick, Pablo Arbeláez, and Jitendra Malik. "Learning Rich Features from RGB-D Images for Object Detection and Segmentation." In Computer Vision – ECCV 2014, 345–60. Cham: Springer International Publishing, 2014. http://dx.doi.org/10.1007/978-3-319-10584-0_23.

6

Philipsen, Mark Philip, Anders Jørgensen, Sergio Escalera, and Thomas B. Moeslund. "RGB-D Segmentation of Poultry Entrails." In Articulated Motion and Deformable Objects, 168–74. Cham: Springer International Publishing, 2016. http://dx.doi.org/10.1007/978-3-319-41778-3_17.

7

Ombado Ouma, Yashon. "On the Use of Low-Cost RGB-D Sensors for Autonomous Pothole Detection with Spatial Fuzzy c-Means Segmentation." In Geographic Information Systems in Geospatial Intelligence. IntechOpen, 2020. http://dx.doi.org/10.5772/intechopen.88877.

Abstract:
The automated detection of pavement distress from remote sensing imagery is a promising but challenging task due to the complex structure of pavement surfaces, intensity non-uniformity, and the presence of artifacts and noise. Even though imaging and sensing systems such as high-resolution RGB cameras, stereovision imaging, LiDAR and terrestrial laser scanning can now be combined to collect pavement condition data, the data obtained by these sensors are expensive and require specially equipped vehicles and processing. This hinders the utilization of the potential efficiency and effectiveness of such sensor systems. This chapter presents the potential of the Kinect v2.0 RGB-D sensor as a low-cost approach for efficient and accurate pothole detection on asphalt pavements. By using spatial fuzzy c-means (SFCM) clustering, so as to incorporate the pothole neighborhood spatial information into the membership function for clustering, the RGB data are segmented into pothole and non-pothole objects. The results demonstrate the advantage of complementary processing of low-cost multisensor data, through channeling data streams and linking data processing according to the merits of the individual sensors, for autonomous cost-effective assessment of road-surface conditions using remote sensing technology.
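The spatial fuzzy c-means idea, standard FCM memberships re-weighted by their agreement with the surrounding neighbourhood, can be sketched on a single-channel image; this is a simplified formulation under assumed parameters, not the chapter's exact SFCM.

```python
import numpy as np
from scipy import ndimage

def spatial_fcm(img, c=2, m=2.0, iters=20, p=1, q=1):
    """Fuzzy c-means on a grayscale image with a spatial term: each pixel's
    memberships are blended with the mean memberships of its 3x3 neighbourhood."""
    x = img.astype(float).ravel()
    rng = np.random.default_rng(0)
    v = rng.uniform(x.min(), x.max(), c)           # initial cluster centres
    for _ in range(iters):
        d = np.abs(x[None, :] - v[:, None]) + 1e-9 # (c, N) distances to centres
        # Standard FCM membership: u_ij = 1 / sum_k (d_ij / d_kj)^(2/(m-1))
        u = 1.0 / np.sum((d[:, None, :] / d[None, :, :]) ** (2 / (m - 1)), axis=1)
        # Spatial function: mean membership of each cluster over a 3x3 window.
        h = np.stack([ndimage.uniform_filter(ui.reshape(img.shape), size=3).ravel()
                      for ui in u])
        u = (u ** p) * (h ** q)
        u /= u.sum(axis=0, keepdims=True)
        v = (u ** m) @ x / (u ** m).sum(axis=1)    # update cluster centres
    return u.argmax(axis=0).reshape(img.shape), v

# Hypothetical depth-derived grayscale patch: a dark pothole region in a brighter road.
img = np.full((60, 60), 180.0)
img[20:40, 15:45] = 60.0
img += np.random.normal(0, 5, img.shape)
labels, centres = spatial_fcm(img)
print(np.sort(centres).round(1), labels.sum())
```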

Conference papers on the topic "RGB-D object segmentation"

1

Wang, Fan, and Kris Hauser. "In-hand Object Scanning via RGB-D Video Segmentation." In 2019 International Conference on Robotics and Automation (ICRA). IEEE, 2019. http://dx.doi.org/10.1109/icra.2019.8794467.

2

Stuckler, Jorg, Nenad Biresev, and Sven Behnke. "Semantic mapping using object-class segmentation of RGB-D images." In 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2012). IEEE, 2012. http://dx.doi.org/10.1109/iros.2012.6385983.

3

Pavel, Mircea Serban, Hannes Schulz, and Sven Behnke. "Recurrent convolutional neural networks for object-class segmentation of RGB-D video." In 2015 International Joint Conference on Neural Networks (IJCNN). IEEE, 2015. http://dx.doi.org/10.1109/ijcnn.2015.7280820.

4

Finman, Ross, Thomas Whelan, Michael Kaess, and John J. Leonard. "Toward lifelong object segmentation from change detection in dense RGB-D maps." In 2013 European Conference on Mobile Robots (ECMR). IEEE, 2013. http://dx.doi.org/10.1109/ecmr.2013.6698839.

5

Chen, I.-Kuei, Szu-Lu Hsu, Chung-Yu Chi, and Liang-Gee Chen. "Automatic video segmentation and object tracking with real-time RGB-D data." In 2014 IEEE International Conference on Consumer Electronics (ICCE). IEEE, 2014. http://dx.doi.org/10.1109/icce.2014.6776097.

6

Wang, Chaonan, Yanbing Xue, Hua Zhang, Guangping Xu, and Zan Gao. "Object segmentation of indoor scenes using perceptual organization on RGB-D images." In 2016 8th International Conference on Wireless Communications & Signal Processing (WCSP). IEEE, 2016. http://dx.doi.org/10.1109/wcsp.2016.7752578.

7

Zhang, Mingshao, Zhou Zhang, El-Sayed Aziz, Sven K. Esche, and Constantin Chassapis. "Kinect-Based Universal Range Sensor for Laboratory Experiments." In ASME 2013 International Mechanical Engineering Congress and Exposition. American Society of Mechanical Engineers, 2013. http://dx.doi.org/10.1115/imece2013-62979.

Abstract:
The Microsoft Kinect is part of a wave of new sensing technologies. Its RGB-D camera is capable of providing high quality synchronized video of both color and depth data. Compared to traditional 3-D tracking techniques that use two separate RGB cameras’ images to calculate depth data, the Kinect is able to produce more robust and reliable results in object recognition and motion tracking. Also, due to its low cost, the Kinect provides more opportunities for use in many areas compared to traditional more expensive 3-D scanners. In order to use the Kinect as a range sensor, algorithms must be designed to first recognize objects of interest and then track their motions. Although a large number of algorithms for both 2-D and 3-D object detection have been published, reliable and efficient algorithms for 3-D object motion tracking are rare, especially using Kinect as a range sensor. In this paper, algorithms for object recognition and tracking that can make use of both RGB and depth data in different scenarios are introduced. Subsequently, efficient methods for scene segmentation including background and noise filtering are discussed. Taking advantage of those two kinds of methods, a prototype system that is capable of working efficiently and stably in various applications related to educational laboratories is presented.
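The scene-segmentation step with background and noise filtering can be sketched for a single depth frame: subtract a stored background depth model, median-filter out speckle, and keep large connected components. The thresholds and frame contents are arbitrary placeholders, not the paper's method.

```python
import numpy as np
from scipy import ndimage

def segment_foreground(depth, background, diff_thresh=0.05, min_pixels=500):
    """Label objects that are closer to the camera than the stored background model."""
    valid = (depth > 0) & (background > 0)
    fg = valid & (background - depth > diff_thresh)               # closer than background
    fg = ndimage.median_filter(fg.astype(np.uint8), size=5) > 0   # remove speckle noise
    labels, n = ndimage.label(fg)
    sizes = ndimage.sum(fg, labels, index=np.arange(1, n + 1))
    for i, s in enumerate(sizes, start=1):
        if s < min_pixels:
            labels[labels == i] = 0                               # drop tiny blobs
    return labels

# Hypothetical frames: a flat background 2 m away and one box at 1.5 m.
background = np.full((480, 640), 2.0, dtype=np.float32)
depth = background.copy()
depth[200:320, 250:400] = 1.5
print(np.unique(segment_foreground(depth, background)))
```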
8

Weber, Henrique, Claudio Rosito Jung, and Dan Gelb. "Hand and object segmentation from RGB-D images for interaction with planar surfaces." In 2015 IEEE International Conference on Image Processing (ICIP). IEEE, 2015. http://dx.doi.org/10.1109/icip.2015.7351350.

9

Yalic, Hamdi Yalin, and Ahmet Burak Can. "Automatic Object Segmentation on RGB-D Data using Surface Normals and Region Similarity." In International Conference on Computer Vision Theory and Applications. SCITEPRESS - Science and Technology Publications, 2018. http://dx.doi.org/10.5220/0006617303790386.

10

Wang, Youbing, and Shoudong Huang. "Towards dense moving object segmentation based robust dense RGB-D SLAM in dynamic scenarios." In 2014 13th International Conference on Control Automation Robotics & Vision (ICARCV). IEEE, 2014. http://dx.doi.org/10.1109/icarcv.2014.7064596.
