Follow this link to see other types of publications on the topic: RGB-D object segmentation.

Journal articles on the topic "RGB-D object segmentation"

Create an accurate reference in APA, MLA, Chicago, Harvard, and other styles

See the 43 best journal articles for your research on the topic "RGB-D object segmentation".

Next to every source in the list of references there is an "Add to bibliography" button. Click it, and we will automatically generate the bibliographic reference for the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the scientific publication in .pdf format and read the abstract of the work online if it is available in the metadata.

Browse journal articles from a wide variety of scientific fields and compile an accurate bibliography.

1

Shen, Xiaoke, and Ioannis Stamos. "3D Object Detection and Instance Segmentation from 3D Range and 2D Color Images". Sensors 21, no. 4 (February 9, 2021): 1213. http://dx.doi.org/10.3390/s21041213.

Abstract:
Instance segmentation and object detection are significant problems in the fields of computer vision and robotics. We address those problems by proposing a novel object segmentation and detection system. First, we detect 2D objects based on RGB, depth-only, or RGB-D images. A 3D convolutional-based system, named Frustum VoxNet, is proposed. This system generates frustums from 2D detection results, proposes 3D candidate voxelized images for each frustum, and uses a 3D convolutional neural network (CNN) based on these candidate voxelized images to perform the 3D instance segmentation and object detection. Results on the SUN RGB-D dataset show that our RGB-D-based system’s 3D inference is much faster than state-of-the-art methods, without a significant loss of accuracy. At the same time, we can provide segmentation and detection results using depth-only images, with accuracy comparable to RGB-D-based systems. This is important since our methods can also work well in low lighting conditions, or with sensors that do not acquire RGB images. Finally, the use of segmentation as part of our pipeline increases detection accuracy, while providing at the same time 3D instance segmentation.
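The frustum step described above (lifting a 2D detection into a 3D search region using depth) can be illustrated with a minimal NumPy sketch. This is not the authors' Frustum VoxNet code; the pinhole intrinsics (fx, fy, cx, cy), the box coordinates, and the synthetic depth map are assumed values used only to show how a 2D box selects the 3D points inside its viewing frustum.

```python
import numpy as np

def depth_to_points(depth, fx, fy, cx, cy):
    """Back-project a depth image (meters) into an Nx3 point cloud in camera coordinates."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    pts = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return pts[pts[:, 2] > 0]          # keep pixels with a valid depth reading only

def frustum_crop(depth, box, fx, fy, cx, cy):
    """Keep the 3D points whose projection falls inside a 2D detection box (u0, v0, u1, v1)."""
    u0, v0, u1, v1 = box
    pts = depth_to_points(depth, fx, fy, cx, cy)
    u = pts[:, 0] * fx / pts[:, 2] + cx
    v = pts[:, 1] * fy / pts[:, 2] + cy
    inside = (u >= u0) & (u <= u1) & (v >= v0) & (v <= v1)
    return pts[inside]

# toy example: a synthetic 480x640 depth map at 2 m and a hypothetical detection box
depth = np.full((480, 640), 2.0)
frustum_points = frustum_crop(depth, box=(200, 150, 400, 350),
                              fx=525.0, fy=525.0, cx=319.5, cy=239.5)
print(frustum_points.shape)
```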
2

Yang, J., and Z. Kang. "INDOOR SEMANTIC SEGMENTATION FROM RGB-D IMAGES BY INTEGRATING FULLY CONVOLUTIONAL NETWORK WITH HIGHER-ORDER MARKOV RANDOM FIELD". ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences XLII-4 (September 19, 2018): 717–24. http://dx.doi.org/10.5194/isprs-archives-xlii-4-717-2018.

Abstract:
Indoor scenes have the characteristics of abundant semantic categories, illumination changes, occlusions and overlaps among objects, which poses great challenges for indoor semantic segmentation. Therefore, in this paper we develop a method based on a higher-order Markov random field model for indoor semantic segmentation from RGB-D images. Instead of directly using RGB-D images, we first train and perform the RefineNet model only using RGB information for generating the high-level semantic information. Then, the spatial location relationship from the depth channel and the spectral information from the color channels are integrated as a prior for a marker-controlled watershed algorithm to obtain robust and accurate visual homogeneous regions. Finally, the higher-order Markov random field model encodes the short-range context among the adjacent pixels and the long-range context within each visual homogeneous region for refining the semantic segmentations. To evaluate the effectiveness and robustness of the proposed method, experiments were conducted on the public SUN RGB-D dataset. Experimental results indicate that compared with using RGB information alone, the proposed method remarkably improves the semantic segmentation results, especially at object boundaries.
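As a rough illustration of the marker-controlled watershed step mentioned in this abstract (not the authors' RefineNet plus higher-order MRF pipeline), the sketch below runs scikit-image's watershed on an elevation map that mixes intensity and depth gradients; the marker thresholds and the random stand-in data are assumptions.

```python
import numpy as np
from skimage.filters import sobel
from skimage.segmentation import watershed

def watershed_regions(gray, depth, low=0.2, high=0.8):
    """Marker-controlled watershed on a gradient map that mixes color and depth cues."""
    elevation = sobel(gray) + sobel(depth)      # strong where intensity or depth changes sharply
    markers = np.zeros_like(gray, dtype=np.int32)
    markers[gray < low] = 1                     # crude seeds: dark pixels start one basin,
    markers[gray > high] = 2                    # bright pixels start another
    return watershed(elevation, markers)

# toy example with random arrays standing in for a normalized gray image and depth map
rng = np.random.default_rng(0)
labels = watershed_regions(rng.random((120, 160)), rng.random((120, 160)))
print(np.unique(labels))
```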
3

Rafique, Adnan Ahmed, Ahmad Jalal, and Kibum Kim. "Automated Sustainable Multi-Object Segmentation and Recognition via Modified Sampling Consensus and Kernel Sliding Perceptron". Symmetry 12, no. 11 (November 23, 2020): 1928. http://dx.doi.org/10.3390/sym12111928.

Abstract:
Object recognition in depth images is a challenging and persistent task in machine vision, robotics, and automation of sustainability. Object recognition tasks are a challenging part of various multimedia technologies for video surveillance, human–computer interaction, robotic navigation, drone targeting, tourist guidance, and medical diagnostics. However, the symmetry that exists in real-world objects plays a significant role in perception and recognition of objects in both humans and machines. With advances in depth sensor technology, numerous researchers have recently proposed RGB-D object recognition techniques. In this paper, we introduce a sustainable object recognition framework that is consistent despite any change in the environment, and can recognize and analyze RGB-D objects in complex indoor scenarios. Firstly, after acquiring a depth image, the point cloud and the depth maps are extracted to obtain the planes. Then, the plane fitting model and the proposed modified maximum likelihood estimation sampling consensus (MMLESAC) are applied as a segmentation process. Then, depth kernel descriptors (DKDES) over segmented objects are computed for single and multiple object scenarios separately. These DKDES are subsequently carried forward to isometric mapping (IsoMap) for feature space reduction. Finally, the reduced feature vector is forwarded to a kernel sliding perceptron (KSP) for the recognition of objects. Three datasets are used to evaluate four different experiments by employing a cross-validation scheme to validate the proposed model. The experimental results over RGB-D object, RGB-D scene, and NYUDv1 datasets demonstrate overall accuracies of 92.2%, 88.5%, and 90.5% respectively. These results outperform existing state-of-the-art methods and verify the suitability of the method.
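The plane-segmentation step in this abstract relies on a sampling-consensus estimator. As a hedged illustration, the sketch below implements a plain RANSAC plane fit in NumPy, not the modified MMLESAC estimator the authors propose; the iteration count, inlier threshold, and synthetic table-plus-box cloud are assumptions.

```python
import numpy as np

def ransac_plane(points, n_iters=200, threshold=0.01, rng=None):
    """Fit a plane (unit normal n, offset d with n.p + d = 0) to an Nx3 cloud by random sampling consensus."""
    rng = rng or np.random.default_rng(0)
    best_inliers = np.zeros(len(points), dtype=bool)
    best_model = None
    for _ in range(n_iters):
        p0, p1, p2 = points[rng.choice(len(points), 3, replace=False)]
        normal = np.cross(p1 - p0, p2 - p0)
        norm = np.linalg.norm(normal)
        if norm < 1e-9:                      # degenerate (collinear) sample, try again
            continue
        normal /= norm
        d = -normal @ p0
        dist = np.abs(points @ normal + d)   # point-to-plane distances
        inliers = dist < threshold
        if inliers.sum() > best_inliers.sum():
            best_inliers, best_model = inliers, (normal, d)
    return best_model, best_inliers

# toy cloud: a noisy z = 0 table plane plus a block of outlier points above it
rng = np.random.default_rng(1)
plane = np.c_[rng.uniform(-1, 1, (500, 2)), rng.normal(0, 0.005, 500)]
box = rng.uniform(0.1, 0.4, (100, 3))
model, inliers = ransac_plane(np.vstack([plane, box]))
print("plane normal:", np.round(model[0], 2), "inliers:", inliers.sum())
```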
4

Novkovic, Tonci, Fadri Furrer, Marko Panjek, Margarita Grinvald, Roland Siegwart, and Juan Nieto. "CLUBS: An RGB-D dataset with cluttered box scenes containing household objects". International Journal of Robotics Research 38, no. 14 (September 23, 2019): 1538–48. http://dx.doi.org/10.1177/0278364919875221.

Abstract:
With the progress of machine learning, the demand for realistic data with high-quality annotations has been thriving. In order to generalize well, considerable amounts of data are required, especially realistic ground-truth data, for tasks such as object detection and scene segmentation. Such data can be difficult, time-consuming, and expensive to collect. This article presents a dataset of household objects and box scenes commonly found in warehouse environments. The dataset was obtained using a robotic setup with four different cameras. It contains reconstructed objects and scenes, as well as raw RGB and depth images, camera poses, pixel-wise labels of objects directly in the RGB images, and 3D bounding boxes with poses in the world frame. Furthermore, raw calibration data are provided, together with the intrinsic and extrinsic parameters for all the sensors. By providing object labels as pixel-wise masks, 3D, and 2D object bounding boxes, this dataset is useful for both object recognition and instance segmentation. The realistic scenes provided will serve for learning-based algorithms applied to scenarios where boxes of objects are often found, such as in the logistics sector. Both the dataset and the tools for data processing are published and available online.
5

Schwarz, Max, Anton Milan, Arul Selvam Periyasamy, and Sven Behnke. "RGB-D object detection and semantic segmentation for autonomous manipulation in clutter". International Journal of Robotics Research 37, no. 4-5 (June 20, 2017): 437–51. http://dx.doi.org/10.1177/0278364917713117.

Abstract:
Autonomous robotic manipulation in clutter is challenging. A large variety of objects must be perceived in complex scenes, where they are partially occluded and embedded among many distractors, often in restricted spaces. To tackle these challenges, we developed a deep-learning approach that combines object detection and semantic segmentation. The manipulation scenes are captured with RGB-D cameras, for which we developed a depth fusion method. Employing pretrained features makes learning from small annotated robotic datasets possible. We evaluate our approach on two challenging datasets: one captured for the Amazon Picking Challenge 2016, where our team NimbRo came in second in the Stowing and third in the Picking task; and one captured in disaster-response scenarios. The experiments show that object detection and semantic segmentation complement each other and can be combined to yield reliable object perception.
6

Thermos, Spyridon, Gerasimos Potamianos, and Petros Daras. "Joint Object Affordance Reasoning and Segmentation in RGB-D Videos". IEEE Access 9 (2021): 89699–713. http://dx.doi.org/10.1109/access.2021.3090471.

7

Kang, Xujie, Jing Li, Xiangtao Fan, Hongdeng Jian, and Chen Xu. "Object-Level Semantic Map Construction for Dynamic Scenes". Applied Sciences 11, no. 2 (January 11, 2021): 645. http://dx.doi.org/10.3390/app11020645.

Abstract:
Visual simultaneous localization and mapping (SLAM) is challenging in dynamic environments as moving objects can impair camera pose tracking and mapping. This paper introduces a method for robust dense object-level SLAM in dynamic environments that takes a live stream of RGB-D frame data as input, detects moving objects, and segments the scene into different objects while simultaneously tracking and reconstructing their 3D structures. This approach provides a new method of dynamic object detection, which integrates prior knowledge of the constructed object model database, object-oriented 3D tracking against the camera pose, and the association between the instance segmentation results on the current frame data and an object database to find dynamic objects in the current frame. By leveraging the 3D static model for frame-to-model alignment, as well as dynamic object culling, the camera motion estimation reduced the overall drift. According to the camera pose accuracy and instance segmentation results, an object-level semantic map representation was constructed for the world map. The experimental results obtained using the TUM RGB-D dataset, which compare the proposed method to related state-of-the-art approaches, demonstrate that our method achieves similar performance in static scenes and improved accuracy and robustness in dynamic scenes.
8

Xie, Qian, Oussama Remil, Yanwen Guo, Meng Wang, Mingqiang Wei, and Jun Wang. "Object Detection and Tracking Under Occlusion for Object-Level RGB-D Video Segmentation". IEEE Transactions on Multimedia 20, no. 3 (March 2018): 580–92. http://dx.doi.org/10.1109/tmm.2017.2751965.

9

Ge, Yanliang, Cong Zhang, Kang Wang, Ziqi Liu, and Hongbo Bi. "WGI-Net: A weighted group integration network for RGB-D salient object detection". Computational Visual Media 7, no. 1 (January 8, 2021): 115–25. http://dx.doi.org/10.1007/s41095-020-0200-x.

Abstract:
Salient object detection is used as a pre-process in many computer vision tasks (such as salient object segmentation, video salient object detection, etc.). When performing salient object detection, depth information can provide clues to the location of target objects, so effective fusion of RGB and depth feature information is important. In this paper, we propose a new feature information aggregation approach, weighted group integration (WGI), to effectively integrate RGB and depth feature information. We use a dual-branch structure to slice the input RGB image and depth map separately and then merge the results separately by concatenation. As grouped features may lose global information about the target object, we also make use of the idea of residual learning, taking the features captured by the original fusion method as supplementary information to ensure both accuracy and completeness of the fused information. Experiments on five datasets show that our model performs better than typical existing approaches for four evaluation metrics.
10

Richtsfeld, Andreas, Thomas Mörwald, Johann Prankl, Michael Zillich, and Markus Vincze. "Learning of perceptual grouping for object segmentation on RGB-D data". Journal of Visual Communication and Image Representation 25, no. 1 (January 2014): 64–73. http://dx.doi.org/10.1016/j.jvcir.2013.04.006.

11

Shao, Lin, Parth Shah, Vikranth Dwaracherla, and Jeannette Bohg. "Motion-Based Object Segmentation Based on Dense RGB-D Scene Flow". IEEE Robotics and Automation Letters 3, no. 4 (October 2018): 3797–804. http://dx.doi.org/10.1109/lra.2018.2856525.

12

Xu, Chi, Jiale Chen, Mengyang Yao, Jun Zhou, Lijun Zhang, and Yi Liu. "6DoF Pose Estimation of Transparent Object from a Single RGB-D Image". Sensors 20, no. 23 (November 27, 2020): 6790. http://dx.doi.org/10.3390/s20236790.

Abstract:
6DoF object pose estimation is a foundation for many important applications, such as robotic grasping, automatic driving, and so on. However, it is very challenging to estimate the 6DoF pose of transparent objects, which are commonly seen in our daily life, because the optical characteristics of transparent material lead to significant depth error which results in false estimation. To solve this problem, a two-stage approach is proposed to estimate the 6DoF pose of a transparent object from a single RGB-D image. In the first stage, the influence of the depth error is eliminated by transparent segmentation, surface normal recovering, and RANSAC plane estimation. In the second stage, an extended point-cloud representation is presented to accurately and efficiently estimate the object pose. As far as we know, it is the first deep learning based approach which focuses on 6DoF pose estimation of transparent objects from a single RGB-D image. Experimental results show that the proposed approach can effectively estimate the 6DoF pose of transparent objects, and it outperforms the state-of-the-art baselines by a large margin.
13

Gupta, Saurabh, Pablo Arbeláez, Ross Girshick, and Jitendra Malik. "Indoor Scene Understanding with RGB-D Images: Bottom-up Segmentation, Object Detection and Semantic Segmentation". International Journal of Computer Vision 112, no. 2 (November 21, 2014): 133–49. http://dx.doi.org/10.1007/s11263-014-0777-6.

14

Pavel, Mircea Serban, Hannes Schulz, and Sven Behnke. "Object class segmentation of RGB-D video using recurrent convolutional neural networks". Neural Networks 88 (April 2017): 105–13. http://dx.doi.org/10.1016/j.neunet.2017.01.003.

15

Ji, Yijun, Qing Xia, and Zhijiang Zhang. "Fusing Depth and Silhouette for Scanning Transparent Object with RGB-D Sensor". International Journal of Optics 2017 (2017): 1–11. http://dx.doi.org/10.1155/2017/9796127.

Abstract:
3D reconstruction based on structured light or laser scan has been widely used in industrial measurement, robot navigation, and virtual reality. However, most modern range sensors fail to scan transparent objects and some other special materials, whose surfaces cannot reflect back accurate depth because of the absorption and refraction of light. In this paper, we fuse the depth and silhouette information from an RGB-D sensor (Kinect v1) to recover the lost surface of transparent objects. Our system is divided into two parts. First, we utilize the zero and erroneous depth values caused by transparent materials in multiple views to search for the 3D region which contains the transparent object. Then, based on shape-from-silhouette technology, we recover the 3D model by visual hull within these noisy regions. Joint Grabcut segmentation is performed on multiple color images to extract the silhouette. The initial constraint for Grabcut is automatically determined. Experiments validate that our approach can improve the 3D model of transparent objects in real-world scenes. Our system is time-saving, robust, and without any interactive operation throughout the process.
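The shape-from-silhouette idea used in this paper can be sketched as simple voxel carving: keep only the candidate voxels whose projections fall inside every silhouette. The sketch below is a generic illustration, not the authors' system; the camera intrinsics, poses, and half-frame silhouettes are made-up values.

```python
import numpy as np

def carve_visual_hull(grid_pts, cameras, silhouettes):
    """Keep the candidate voxels whose projection lands inside every silhouette mask.
    cameras: list of (K, R, t) world-to-camera poses; silhouettes: boolean HxW masks."""
    keep = np.ones(len(grid_pts), dtype=bool)
    for (K, R, t), sil in zip(cameras, silhouettes):
        cam = grid_pts @ R.T + t                     # world -> camera coordinates
        uv = cam @ K.T                               # pinhole projection (homogeneous)
        u = np.round(uv[:, 0] / uv[:, 2]).astype(int)
        v = np.round(uv[:, 1] / uv[:, 2]).astype(int)
        h, w = sil.shape
        valid = (cam[:, 2] > 0) & (u >= 0) & (u < w) & (v >= 0) & (v < h)
        hit = np.zeros(len(grid_pts), dtype=bool)
        hit[valid] = sil[v[valid], u[valid]]
        keep &= hit                                  # carve away voxels outside this view's silhouette
    return grid_pts[keep]

# toy example: a coarse voxel grid carved by two virtual cameras whose silhouettes
# cover only the left half of the image, so roughly a quarter of the grid survives
xs = np.linspace(-0.5, 0.5, 20)
grid = np.stack(np.meshgrid(xs, xs, xs), axis=-1).reshape(-1, 3)
K = np.array([[300.0, 0.0, 160.0], [0.0, 300.0, 120.0], [0.0, 0.0, 1.0]])
cam1 = (K, np.eye(3), np.array([0.0, 0.0, 2.0]))                      # looking along +z
R2 = np.array([[0.0, 0.0, 1.0], [0.0, 1.0, 0.0], [-1.0, 0.0, 0.0]])   # rotated 90 degrees about y
cam2 = (K, R2, np.array([0.0, 0.0, 2.0]))
sil = np.zeros((240, 320), dtype=bool)
sil[:, :160] = True
hull = carve_visual_hull(grid, [cam1, cam2], [sil, sil])
print(len(hull), "of", len(grid), "voxels survive")
```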
16

Calli, Berk, Arjun Singh, James Bruce, Aaron Walsman, Kurt Konolige, Siddhartha Srinivasa, Pieter Abbeel, and Aaron M. Dollar. "Yale-CMU-Berkeley dataset for robotic manipulation research". International Journal of Robotics Research 36, no. 3 (March 2017): 261–68. http://dx.doi.org/10.1177/0278364917700714.

Abstract:
In this paper, we present an image and model dataset of the real-life objects from the Yale-CMU-Berkeley Object Set, which is specifically designed for benchmarking in manipulation research. For each object, the dataset presents 600 high-resolution RGB images, 600 RGB-D images and five sets of textured three-dimensional geometric models. Segmentation masks and calibration information for each image are also provided. These data are acquired using the BigBIRD Object Scanning Rig and Google Scanners. Together with the dataset, Python scripts and a Robot Operating System node are provided to download the data, generate point clouds and create Unified Robot Description Files. The dataset is also supported by our website, www.ycbbenchmarks.org, which serves as a portal for publishing and discussing test results along with proposing task protocols and benchmarks.
17

Wu, Yongxiang, Yili Fu, and Shuguo Wang. "Deep instance segmentation and 6D object pose estimation in cluttered scenes for robotic autonomous grasping". Industrial Robot: the international journal of robotics research and application 47, no. 4 (April 20, 2020): 593–606. http://dx.doi.org/10.1108/ir-12-2019-0259.

Abstract:
Purpose: This paper aims to design a deep neural network for object instance segmentation and six-dimensional (6D) pose estimation in cluttered scenes and apply the proposed method in real-world robotic autonomous grasping of household objects. Design/methodology/approach: A novel deep learning method is proposed for instance segmentation and 6D pose estimation in cluttered scenes. An iterative pose refinement network is integrated with the main network to obtain more robust final pose estimation results for robotic applications. To train the network, a technique is presented to generate abundant annotated synthetic data consisting of RGB-D images and object masks in a fast manner without any hand-labeling. For robotic grasping, the offline grasp planning based on the eigengrasp planner is performed and combined with the online object pose estimation. Findings: The experiments on the standard pose benchmarking data sets showed that the method achieves better pose estimation and time efficiency performance than state-of-the-art methods with depth-based ICP refinement. The proposed method is also evaluated on a seven-DOF Kinova Jaco robot with an Intel RealSense RGB-D camera; the grasping results illustrated that the method is accurate and robust enough for real-world robotic applications. Originality/value: A novel 6D pose estimation network based on the instance segmentation framework is proposed and a neural network-based iterative pose refinement module is integrated into the method. The proposed method exhibits satisfactory pose estimation and time efficiency for robotic grasping.
18

Hastürk, Özgür, and Aydan M. Erkmen. "DUDMap: 3D RGB-D mapping for dense, unstructured, and dynamic environment". International Journal of Advanced Robotic Systems 18, no. 3 (May 1, 2021): 172988142110161. http://dx.doi.org/10.1177/17298814211016178.

Abstract:
The simultaneous localization and mapping (SLAM) problem has been extensively studied by researchers in the field of robotics; however, conventional approaches to mapping assume a static environment. The static assumption is valid only in a small region, and it limits the application of visual SLAM in dynamic environments. The recently proposed state-of-the-art SLAM solutions for dynamic environments use different semantic segmentation methods such as Mask R-CNN and SegNet; however, these frameworks are based on a sparse mapping framework (ORBSLAM). In addition, the segmentation process increases the computational cost, which makes these SLAM algorithms unsuitable for real-time mapping. Therefore, there is no effective dense RGB-D SLAM method for real-world unstructured and dynamic environments. In this study, we propose a novel real-time dense SLAM method for dynamic environments, where 3D reconstruction error is manipulated for identification of static and dynamic classes having generalized Gaussian distribution. Our proposed approach requires neither explicit object tracking nor an object classifier, which makes it robust to any type of moving object and suitable for real-time mapping. Our method eliminates the repeated views and uses consistent data that enhance the performance of volumetric fusion. For completeness, we compare our proposed method on different types of highly dynamic datasets, which are publicly available, to demonstrate the versatility and robustness of our approach. Experiments show that its tracking performance is better than other dense and dynamic SLAM approaches.
19

Xu, Hui, Guodong Chen, Zhenhua Wang, Lining Sun, and Fan Su. "RGB-D-Based Pose Estimation of Workpieces with Semantic Segmentation and Point Cloud Registration". Sensors 19, no. 8 (April 19, 2019): 1873. http://dx.doi.org/10.3390/s19081873.

Abstract:
As an important part of a factory’s automated production line, industrial robots can perform a variety of tasks by integrating external sensors. Among these tasks, grasping scattered workpieces on the industrial assembly line has always been a prominent and difficult point in robot manipulation research. By using RGB-D (color and depth) information, we propose an efficient and practical solution that fuses the approaches of semantic segmentation and point cloud registration to perform object recognition and pose estimation. Different from objects in an indoor environment, the characteristics of the workpiece are relatively simple; thus, we create and label an RGB image dataset from a variety of industrial scenarios and train the modified FCN (Fully Convolutional Network) on a homemade dataset to infer the semantic segmentation results of the input images. Then, we determine the point cloud of the workpieces by incorporating the depth information to estimate the real-time pose of the workpieces. To evaluate the accuracy of the solution, we propose a novel pose error evaluation method based on the robot vision system. This method does not rely on expensive measuring equipment and can also obtain accurate evaluation results. In an industrial scenario, our solution has a rotation error less than two degrees and a translation error < 10 mm.
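The last stage described here, recovering a workpiece pose by registering the segmented depth points against a model cloud, can be illustrated with the closed-form rigid alignment that registration pipelines such as ICP solve repeatedly. The sketch below assumes known point correspondences and made-up intrinsics; it is a simplified stand-in, not the authors' FCN plus registration implementation.

```python
import numpy as np

def masked_cloud(depth, mask, fx, fy, cx, cy):
    """Back-project only the pixels selected by a segmentation mask into 3D camera coordinates
    (this is how the segmented workpiece pixels would become the source cloud in practice)."""
    v, u = np.nonzero(mask)
    z = depth[v, u]
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=1)

def rigid_align(src, dst):
    """Closed-form (Kabsch) least-squares rotation R and translation t with dst ~ R @ src + t,
    assuming src[i] corresponds to dst[i]."""
    sc, dc = src.mean(0), dst.mean(0)
    H = (src - sc).T @ (dst - dc)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:        # fix an improper rotation (reflection)
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = dc - R @ sc
    return R, t

# toy check: rotate a random model cloud by 30 degrees about z, translate it, then recover the pose
rng = np.random.default_rng(2)
model = rng.random((50, 3))
ang = np.deg2rad(30)
R_true = np.array([[np.cos(ang), -np.sin(ang), 0], [np.sin(ang), np.cos(ang), 0], [0, 0, 1]])
observed = model @ R_true.T + np.array([0.1, 0.0, 0.5])
R, t = rigid_align(model, observed)
print(np.allclose(R, R_true), np.round(t, 3))
```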
20

Li, Wei, Junhua Gu, Benwen Chen, and Jungong Han. "Incremental Instance-Oriented 3D Semantic Mapping via RGB-D Cameras for Unknown Indoor Scene". Discrete Dynamics in Nature and Society 2020 (April 23, 2020): 1–10. http://dx.doi.org/10.1155/2020/2528954.

Abstract:
Scene parsing plays a crucial role when accomplishing human-robot interaction tasks. As the “eye” of the robot, RGB-D camera is one of the most important components for collecting multiview images to construct instance-oriented 3D environment semantic maps, especially in unknown indoor scenes. Although there are plenty of studies developing accurate object-level mapping systems with different types of cameras, these methods either process the instance segmentation problem in completed mapping or suffer from a critical real-time issue due to heavy computation processing required. In this paper, we propose a novel method to incrementally build instance-oriented 3D semantic maps directly from images acquired by the RGB-D camera. To ensure an efficient reconstruction of 3D objects with semantic and instance IDs, the input RGB images are operated by a real-time deep-learned object detector. To obtain accurate point cloud cluster, we adopt the Gaussian mixture model as an optimizer after processing 2D to 3D projection. Next, we present a data association strategy to update class probabilities across the frames. Finally, a map integration strategy fuses information about their 3D shapes, locations, and instance IDs in a faster way. We evaluate our system on different indoor scenes including offices, bedrooms, and living rooms from the SceneNN dataset, and the results show that our method not only builds the instance-oriented semantic map efficiently but also enhances the accuracy of the individual instance in the scene.
21

Tian, Guanzhong, Liang Liu, JongHyok Ri, Yong Liu, and Yiran Sun. "ObjectFusion: An object detection and segmentation framework with RGB-D SLAM and convolutional neural networks". Neurocomputing 345 (June 2019): 3–14. http://dx.doi.org/10.1016/j.neucom.2019.01.088.

22

Iriondo, Ander, Elena Lazkano, and Ander Ansuategi. "Affordance-Based Grasping Point Detection Using Graph Convolutional Networks for Industrial Bin-Picking Applications". Sensors 21, no. 3 (January 26, 2021): 816. http://dx.doi.org/10.3390/s21030816.

Abstract:
Grasping point detection has traditionally been a core robotic and computer vision problem. In recent years, deep learning based methods have been widely used to predict grasping points, and have shown strong generalization capabilities under uncertainty. Particularly, approaches that aim at predicting object affordances without relying on the object identity, have obtained promising results in random bin-picking applications. However, most of them rely on RGB/RGB-D images, and it is not clear up to what extent 3D spatial information is used. Graph Convolutional Networks (GCNs) have been successfully used for object classification and scene segmentation in point clouds, and also to predict grasping points in simple laboratory experimentation. In the present proposal, we adapted the Deep Graph Convolutional Network model with the intuition that learning from n-dimensional point clouds would lead to a performance boost to predict object affordances. To the best of our knowledge, this is the first time that GCNs are applied to predict affordances for suction and gripper end effectors in an industrial bin-picking environment. Additionally, we designed a bin-picking oriented data preprocessing pipeline which contributes to ease the learning process and to create a flexible solution for any bin-picking application. To train our models, we created a highly accurate RGB-D/3D dataset which is openly available on demand. Finally, we benchmarked our method against a 2D Fully Convolutional Network based method, improving the top-1 precision score by 1.8% and 1.7% for suction and gripper respectively.
23

Zhuang, Chungang, Zhe Wang, Heng Zhao, and Han Ding. "Semantic part segmentation method based 3D object pose estimation with RGB-D images for bin-picking". Robotics and Computer-Integrated Manufacturing 68 (April 2021): 102086. http://dx.doi.org/10.1016/j.rcim.2020.102086.

24

Ruiz-Sarmiento, J. R., C. Galindo, and J. Gonzalez-Jimenez. "Robot@Home, a robotic dataset for semantic mapping of home environments". International Journal of Robotics Research 36, no. 2 (February 2017): 131–41. http://dx.doi.org/10.1177/0278364917695640.

Abstract:
This paper presents the Robot-at-Home dataset (Robot@Home), a collection of raw and processed sensory data from domestic settings aimed at serving as a benchmark for semantic mapping algorithms through the categorization of objects and/or rooms. The dataset contains 87,000+ time-stamped observations gathered by a mobile robot endowed with a rig of four RGB-D cameras and a 2D laser scanner. Raw observations have been processed to produce different outcomes also distributed with the dataset, including 3D reconstructions and 2D geometric maps of the inspected rooms, both annotated with the ground truth categories of the surveyed rooms and objects. The proposed dataset is particularly suited as a testbed for object and/or room categorization systems, but it can be also exploited for a variety of tasks, including robot localization, 3D map building, SLAM, and object segmentation. Robot@Home is publicly available for the research community at http://mapir.isa.uma.es/work/robot-at-home-dataset.
25

Chen and Lin. "Virtual Object Replacement Based on Real Environments: Potential Application in Augmented Reality Systems". Applied Sciences 9, no. 9 (April 29, 2019): 1797. http://dx.doi.org/10.3390/app9091797.

Abstract:
Augmented reality (AR) is an emerging technology that allows users to interact with simulated environments, including those emulating scenes in the real world. Most current AR technologies involve the placement of virtual objects within these scenes. However, difficulties in modeling real-world objects greatly limit the scope of the simulation, and thus the depth of the user experience. In this study, we developed a process by which to realize virtual environments that are based entirely on scenes in the real world. In modeling the real world, the proposed scheme divides scenes into discrete objects, which are then replaced with virtual objects. This enables users to interact in and with virtual environments without limitations. An RGB-D camera is used in conjunction with simultaneous localization and mapping (SLAM) to obtain the movement trajectory of the user and derive information related to the real environment. In modeling the environment, graph-based segmentation is used to segment point clouds and perform object segmentation to enable the subsequent replacement of objects with equivalent virtual entities. Superquadrics are used to derive shape parameters and location information from the segmentation results in order to ensure that the scale of the virtual objects matches the original objects in the real world. Only after the objects have been replaced with their virtual counterparts is the real environment converted into a virtual scene. Experiments involving the emulation of real-world locations demonstrated the feasibility of the proposed rendering scheme. A rock-climbing application scenario is finally presented to illustrate the potential use of the proposed system in AR applications.
26

Gujjar, Harish S. "A Comparative Study of VoxelNet and PointNet for 3D Object Detection in Car by Using KITTI Benchmark". International Journal of Information Communication Technologies and Human Development 10, no. 3 (July 2018): 28–38. http://dx.doi.org/10.4018/ijicthd.2018070103.

Abstract:
In today's world, 2D object recognition is a normal course of study in research. 3D object recognition is more in demand and important in the present scenario. 3D object recognition has gained importance in areas such as navigation of vehicles, robotic vision, HoME, virtual reality, etc. This work reveals two important methods, VoxelNet and PointNet, useful in 3D object recognition. In the case of PointNet, the recognition is good when used with segmentation of small-scale point clouds. Whereas, in the case of VoxelNet, scans are used directly on raw point clouds, which are operated on directly as patterns. The above conclusion is arrived at on KITTI car detection. KITTI performs detection by using a bird's eye view. In this KITTI evaluation, we compare two different modalities, LiDAR and RGB-D. We arrive at the conclusion that PointNet is useful and has high performance when we are using small scenarios, and VoxelNet is useful and has high performance when we are using large scenarios.
27

Wong, Ching-Chang, Li-Yu Yeh, Chih-Cheng Liu, Chi-Yi Tsai, and Hisasuki Aoyama. "Manipulation Planning for Object Re-Orientation Based on Semantic Segmentation Keypoint Detection". Sensors 21, no. 7 (March 24, 2021): 2280. http://dx.doi.org/10.3390/s21072280.

Abstract:
In this paper, a manipulation planning method for object re-orientation based on semantic segmentation keypoint detection is proposed for robot manipulator which is able to detect and re-orientate the randomly placed objects to a specified position and pose. There are two main parts: (1) 3D keypoint detection system; and (2) manipulation planning system for object re-orientation. In the 3D keypoint detection system, an RGB-D camera is used to obtain the information of the environment and can generate 3D keypoints of the target object as inputs to represent its corresponding position and pose. This process simplifies the 3D model representation so that the manipulation planning for object re-orientation can be executed in a category-level manner by adding various training data of the object in the training phase. In addition, 3D suction points in both the object’s current and expected poses are also generated as the inputs of the next operation stage. During the next stage, Mask Region-Convolutional Neural Network (Mask R-CNN) algorithm is used for preliminary object detection and object image. The highest confidence index image is selected as the input of the semantic segmentation system in order to classify each pixel in the picture for the corresponding pack unit of the object. In addition, after using a convolutional neural network for semantic segmentation, the Conditional Random Fields (CRFs) method is used to perform several iterations to obtain a more accurate result of object recognition. When the target object is segmented into the pack units of image process, the center position of each pack unit can be obtained. Then, a normal vector of each pack unit’s center points is generated by the depth image information and pose of the object, which can be obtained by connecting the center points of each pack unit. In the manipulation planning system for object re-orientation, the pose of the object and the normal vector of each pack unit are first converted into the working coordinate system of the robot manipulator. Then, according to the current and expected pose of the object, the spherical linear interpolation (Slerp) algorithm is used to generate a series of movements in the workspace for object re-orientation on the robot manipulator. In addition, the pose of the object is adjusted on the z-axis of the object’s geodetic coordinate system based on the image features on the surface of the object, so that the pose of the placed object can approach the desired pose. Finally, a robot manipulator and a vacuum suction cup made by the laboratory are used to verify that the proposed system can indeed complete the planned task of object re-orientation.
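The abstract mentions using spherical linear interpolation (Slerp) to generate a series of re-orientation waypoints. A minimal NumPy version of quaternion Slerp is sketched below; the start and goal quaternions and the five-step schedule are illustrative assumptions, not values from the paper.

```python
import numpy as np

def slerp(q0, q1, t):
    """Spherical linear interpolation between two unit quaternions (w, x, y, z) at fraction t in [0, 1]."""
    q0, q1 = q0 / np.linalg.norm(q0), q1 / np.linalg.norm(q1)
    dot = np.dot(q0, q1)
    if dot < 0.0:              # take the shorter arc on the 4D unit sphere
        q1, dot = -q1, -dot
    if dot > 0.9995:           # nearly parallel: fall back to normalized linear interpolation
        q = q0 + t * (q1 - q0)
        return q / np.linalg.norm(q)
    theta = np.arccos(np.clip(dot, -1.0, 1.0))
    return (np.sin((1 - t) * theta) * q0 + np.sin(t * theta) * q1) / np.sin(theta)

# interpolate between the identity orientation and a 90-degree rotation about z in five steps
q_start = np.array([1.0, 0.0, 0.0, 0.0])
q_goal = np.array([np.cos(np.pi / 4), 0.0, 0.0, np.sin(np.pi / 4)])
waypoints = [slerp(q_start, q_goal, t) for t in np.linspace(0.0, 1.0, 5)]
print(np.round(waypoints, 3))
```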
28

Liu, Weiping, Jia Sun, Wanyi Li, Ting Hu, and Peng Wang. "Deep Learning on Point Clouds and Its Application: A Survey". Sensors 19, no. 19 (September 26, 2019): 4188. http://dx.doi.org/10.3390/s19194188.

Abstract:
Point cloud is a widely used 3D data form, which can be produced by depth sensors, such as Light Detection and Ranging (LIDAR) and RGB-D cameras. Being unordered and irregular, many researchers focused on the feature engineering of the point cloud. Being able to learn complex hierarchical structures, deep learning has achieved great success with images from cameras. Recently, many researchers have adapted it into the applications of the point cloud. In this paper, the recent existing point cloud feature learning methods are classified as point-based and tree-based. The former directly takes the raw point cloud as the input for deep learning. The latter first employs a k-dimensional tree (Kd-tree) structure to represent the point cloud with a regular representation and then feeds these representations into deep learning models. Their advantages and disadvantages are analyzed. The applications related to point cloud feature learning, including 3D object classification, semantic segmentation, and 3D object detection, are introduced, and the datasets and evaluation metrics are also collected. Finally, the future research trend is predicted.
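For the tree-based representations surveyed here, a k-d tree is the usual way to give an unordered point cloud a queryable structure. The sketch below uses SciPy's cKDTree on a random cloud; the neighbour count and search radius are arbitrary example values, not parameters from the survey.

```python
import numpy as np
from scipy.spatial import cKDTree

# a random cloud standing in for LIDAR / RGB-D points
rng = np.random.default_rng(3)
cloud = rng.random((10_000, 3))

tree = cKDTree(cloud)                      # build the k-d tree once
dists, idx = tree.query(cloud[:5], k=8)    # 8 nearest neighbours of the first five points
print(idx.shape)                           # (5, 8)

# fixed-radius neighbourhood, e.g. for local feature aggregation
neighbours = tree.query_ball_point(cloud[0], r=0.05)
print(len(neighbours), "points within 5 cm of the first point")
```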
29

Zhang, Jiahao, Miao Li, Ying Feng, and Chenguang Yang. "Robotic grasp detection based on image processing and random forest". Multimedia Tools and Applications 79, no. 3-4 (November 21, 2019): 2427–46. http://dx.doi.org/10.1007/s11042-019-08302-9.

Abstract:
Real-time grasp detection plays a key role in manipulation, and it is also a complex task, especially for detecting how to grasp novel objects. This paper proposes a very quick and accurate approach to detect robotic grasps. The main idea is to perform grasping of novel objects in a typical RGB-D scene view. Our goal is not to find the best grasp for every object but to obtain the local optimal grasps in candidate grasp rectangles. There are three main contributions to our detection work. Firstly, an improved graph segmentation approach is used for object detection, and it can separate objects from the background directly and quickly. Secondly, we develop a morphological image processing method to generate the candidate grasp rectangle set, which avoids having to search grasp rectangles globally. Finally, we train a random forest model to predict grasps and achieve an accuracy of 94.26%. The model is mainly used to score every element in our candidate grasp set, and the one that gets the highest score will be converted to the final grasp configuration for robots. For real-world experiments, we set up our system on a tabletop scene with multiple objects and, when implementing robotic grasps, we control a Baxter robot with a different inverse kinematics strategy rather than the built-in one.
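The scoring step described in this abstract (train a random forest, then score every candidate grasp rectangle and keep the best one) can be sketched with scikit-learn. The feature layout and the synthetic labels below are assumptions; the authors' actual rectangle features and their 94.26% accuracy are not reproduced here.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(4)

# stand-in features for candidate grasp rectangles (e.g. width, height, angle, mean depth, ...)
X_train = rng.random((1000, 6))
y_train = (X_train[:, 0] + X_train[:, 3] > 1.0).astype(int)   # synthetic "graspable" label

forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)

# score a set of candidate rectangles from one scene and keep the highest-scoring one
candidates = rng.random((20, 6))
scores = forest.predict_proba(candidates)[:, 1]    # probability of the "graspable" class
best = candidates[np.argmax(scores)]
print("best candidate score:", round(float(scores.max()), 3))
```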
30

Tao, Chongben, Yufeng Jin, Feng Cao, Zufeng Zhang, Chunguang Li, and Hanwen Gao. "3D Semantic VSLAM of Indoor Environment Based on Mask Scoring RCNN". Discrete Dynamics in Nature and Society 2020 (October 20, 2020): 1–14. http://dx.doi.org/10.1155/2020/5916205.

Abstract:
Existing Visual SLAM (VSLAM) algorithms for constructing semantic maps of indoor environments suffer from low accuracy and low label classification accuracy when feature points are sparse. This paper proposes a 3D semantic VSLAM algorithm called BMASK-RCNN based on Mask Scoring RCNN. Firstly, feature points of images are extracted by the Binary Robust Invariant Scalable Keypoints (BRISK) algorithm. Secondly, map points of the reference key frame are projected to the current frame for feature matching and pose estimation, and an inverse depth filter is used to estimate the scene depth of the created key frame to obtain camera pose changes. In order to achieve object detection and semantic segmentation for both static objects and dynamic objects in indoor environments and then construct a dense 3D semantic map with the VSLAM algorithm, a Mask Scoring RCNN with a partially adjusted structure is used, and a TUM RGB-D SLAM dataset is employed for transfer learning. Semantic information of independent targets in scenes, including their categories, not only provides high localization accuracy but also realizes the probability update of semantic estimation by marking movable objects, thereby reducing the impact of moving objects on real-time mapping. Through simulation and actual experimental comparison with three other algorithms, the results show that the proposed algorithm has better robustness and that the semantic information used in 3D semantic mapping can be accurately obtained.
31

Sánchez, Carlos Medina, Matteo Zella, Jesús Capitán, and Pedro J. Marrón. "Semantic Mapping with Low-Density Point-Clouds for Service Robots in Indoor Environments". Applied Sciences 10, no. 20 (October 14, 2020): 7154. http://dx.doi.org/10.3390/app10207154.

Abstract:
The advancements in the robotic field have made it possible for service robots to increasingly become part of everyday indoor scenarios. Their ability to operate and reach defined goals depends on the perception and understanding of their surrounding environment. Detecting and positioning objects as well as people in an accurate semantic map are, therefore, essential tasks that a robot needs to carry out. In this work, we walk an alternative path to build semantic maps of indoor scenarios. Instead of relying on high-density sensory input, like the one provided by an RGB-D camera, and resource-intensive processing algorithms, like the ones based on deep learning, we investigate the use of low-density point-clouds provided by 3D LiDARs together with a set of practical segmentation methods for the detection of objects. By focusing on the physical structure of the objects of interest, it is possible to remove complex training phases and exploit sensors with lower resolution but wider Field of View (FoV). Our evaluation shows that our approach can achieve comparable (if not better) performance in object labeling and positioning with a significant decrease in processing time than established approaches based on deep learning methods. As a side-effect of using low-density point-clouds, we also better support people privacy as the lower resolution inherently prevents the use of techniques like face recognition.
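One practical way to segment objects in a low-density point cloud without any training, in the spirit of the approach above, is density-based clustering. The sketch below applies scikit-learn's DBSCAN to a synthetic sparse cloud; the eps and min_samples values are assumptions and not taken from the paper.

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(5)
# two compact blobs plus scattered background points, mimicking objects in a sparse LiDAR sweep
obj_a = rng.normal([1.0, 0.0, 0.5], 0.05, (80, 3))
obj_b = rng.normal([2.0, 1.0, 0.4], 0.05, (60, 3))
noise = rng.uniform(0, 3, (40, 3))
cloud = np.vstack([obj_a, obj_b, noise])

labels = DBSCAN(eps=0.15, min_samples=8).fit_predict(cloud)
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)   # label -1 marks unclustered noise
print("clusters found:", n_clusters)
```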
32

Liu, Haowei, Matthai Philipose, and Ming-Ting Sun. "Automatic objects segmentation with RGB-D cameras". Journal of Visual Communication and Image Representation 25, no. 4 (May 2014): 709–18. http://dx.doi.org/10.1016/j.jvcir.2013.03.012.

33

Sebastian, C., B. Boom, T. van Lankveld, E. Bondarev, and P. H. N. De With. "BOOTSTRAPPED CNNS FOR BUILDING SEGMENTATION ON RGB-D AERIAL IMAGERY". ISPRS Annals of Photogrammetry, Remote Sensing and Spatial Information Sciences IV-4 (September 19, 2018): 187–92. http://dx.doi.org/10.5194/isprs-annals-iv-4-187-2018.

Abstract:
Detection of buildings and other objects from aerial images has various applications in urban planning and map making. Automated building detection from aerial imagery is a challenging task, as it is prone to varying lighting conditions, shadows and occlusions. Convolutional Neural Networks (CNNs) are robust against some of these variations, although they fail to distinguish easy and difficult examples. We train a detection algorithm from RGB-D images to obtain a segmented mask by using the CNN architecture DenseNet. First, we improve the performance of the model by applying a statistical re-sampling technique called Bootstrapping and demonstrate that more informative examples are retained. Second, the proposed method outperforms the non-bootstrapped version by utilizing only one-sixth of the original training data and it obtains a precision-recall break-even of 95.10% on our aerial imagery dataset.
34

Cheng, Junhao, Zhi Wang, Hongyan Zhou, Li Li, and Jian Yao. "DM-SLAM: A Feature-Based SLAM System for Rigid Dynamic Scenes". ISPRS International Journal of Geo-Information 9, no. 4 (March 27, 2020): 202. http://dx.doi.org/10.3390/ijgi9040202.

Abstract:
Most Simultaneous Localization and Mapping (SLAM) methods assume that environments are static. Such a strong assumption limits the application of most visual SLAM systems. The dynamic objects will cause many wrong data associations during the SLAM process. To address this problem, a novel visual SLAM method that follows the pipeline of feature-based methods called DM-SLAM is proposed in this paper. DM-SLAM combines an instance segmentation network with optical flow information to improve the location accuracy in dynamic environments, which supports monocular, stereo, and RGB-D sensors. It consists of four modules: semantic segmentation, ego-motion estimation, dynamic point detection and a feature-based SLAM framework. The semantic segmentation module obtains pixel-wise segmentation results of potentially dynamic objects, and the ego-motion estimation module calculates the initial pose. In the third module, two different strategies are presented to detect dynamic feature points for RGB-D/stereo and monocular cases. In the first case, the feature points with depth information are reprojected to the current frame. The reprojection offset vectors are used to distinguish the dynamic points. In the other case, we utilize the epipolar constraint to accomplish this task. Furthermore, the static feature points left are fed into the fourth module. The experimental results on the public TUM and KITTI datasets demonstrate that DM-SLAM outperforms the standard visual SLAM baselines in terms of accuracy in highly dynamic environments.
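For the RGB-D/stereo case, the abstract describes reprojecting feature points with depth into the current frame and thresholding the reprojection offsets to flag dynamic points. The NumPy sketch below illustrates that test under assumed intrinsics, an identity camera motion, and a hypothetical pixel threshold; it is not the DM-SLAM implementation.

```python
import numpy as np

def reproject(pts_uv, depth, K, R, t):
    """Lift pixel keypoints from the previous frame to 3D using their depth, move them with the
    estimated camera motion (R, t), and project them into the current frame."""
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    x = (pts_uv[:, 0] - cx) * depth / fx
    y = (pts_uv[:, 1] - cy) * depth / fy
    P = np.stack([x, y, depth], axis=1) @ R.T + t
    return np.stack([P[:, 0] * fx / P[:, 2] + cx, P[:, 1] * fy / P[:, 2] + cy], axis=1)

def dynamic_mask(prev_uv, depth, tracked_uv, K, R, t, max_offset=3.0):
    """Flag a keypoint as dynamic when its tracked position deviates from the
    static-scene reprojection by more than max_offset pixels."""
    predicted = reproject(prev_uv, depth, K, R, t)
    offsets = np.linalg.norm(tracked_uv - predicted, axis=1)
    return offsets > max_offset

# toy example: identity camera motion, second keypoint moved by a (simulated) dynamic object
K = np.array([[525.0, 0.0, 319.5], [0.0, 525.0, 239.5], [0.0, 0.0, 1.0]])
prev_uv = np.array([[100.0, 120.0], [300.0, 200.0]])
depth = np.array([1.5, 2.0])
tracked_uv = np.array([[100.2, 119.8], [310.0, 206.0]])     # second point drifted
print(dynamic_mask(prev_uv, depth, tracked_uv, K, np.eye(3), np.zeros(3)))
```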
35

Runceanu, L. S., and N. Haala. "INDOOR MESH CLASSIFICATION FOR BIM". ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences XLII-4 (September 19, 2018): 535–39. http://dx.doi.org/10.5194/isprs-archives-xlii-4-535-2018.

Abstract:
This work addresses the automatic reconstruction of objects useful for BIM, like walls, floors and ceilings, from meshed and texture-mapped 3D point clouds of indoor scenes. For this reason, we focus on the semantic segmentation of 3D indoor meshes as the initial step for the automatic generation of BIM models. Our investigations are based on the benchmark dataset ScanNet, which aims at the interpretation of 3D indoor scenes. For this purpose it provides 3D meshed representations as collected from low cost range cameras. In our opinion such RGB-D data has a great potential for the automated reconstruction of BIM objects.
36

Martinez, Manuel, Kailun Yang, Angela Constantinescu, and Rainer Stiefelhagen. "Helping the Blind to Get through COVID-19: Social Distancing Assistant Using Real-Time Semantic Segmentation on RGB-D Video". Sensors 20, no. 18 (September 12, 2020): 5202. http://dx.doi.org/10.3390/s20185202.

Abstract:
The current COVID-19 pandemic is having a major impact on our daily lives. Social distancing is one of the measures that has been implemented with the aim of slowing the spread of the disease, but it is difficult for blind people to comply with this. In this paper, we present a system that helps blind people to maintain physical distance to other persons using a combination of RGB and depth cameras. We use a real-time semantic segmentation algorithm on the RGB camera to detect where persons are and use the depth camera to assess the distance to them; then, we provide audio feedback through bone-conducting headphones if a person is closer than 1.5 m. Our system warns the user only if persons are nearby but does not react to non-person objects such as walls, trees or doors; thus, it is not intrusive, and it is possible to use it in combination with other assistive devices. We have tested our prototype system on one blind and four blindfolded persons, and found that the system is precise, easy to use, and amounts to low cognitive load.
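The core check in this system, combining a person segmentation mask with aligned depth to decide whether someone is closer than 1.5 m, can be sketched in a few lines of NumPy. The mask, depth values, and the 10th-percentile distance estimate below are illustrative assumptions rather than the authors' exact rule.

```python
import numpy as np

def person_too_close(person_mask, depth_m, limit=1.5):
    """Return the nearest person distance and whether it violates the distancing limit.
    person_mask: boolean array from semantic segmentation; depth_m: aligned depth in meters."""
    valid = person_mask & (depth_m > 0)           # ignore pixels with no depth reading
    if not valid.any():
        return None, False
    distance = np.percentile(depth_m[valid], 10)  # robust "closest person" estimate
    return distance, distance < limit

# toy frame: a person region at about 1.2 m in an otherwise empty 3 m scene
depth = np.full((480, 640), 3.0)
mask = np.zeros((480, 640), dtype=bool)
mask[200:400, 250:350] = True
depth[mask] = 1.2
dist, warn = person_too_close(mask, depth)
print(f"nearest person at {dist:.2f} m -> warn: {warn}")
```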
37

Cai, Yuanzhi, Hong Huang, Kaiyang Wang, Cheng Zhang, Lei Fan, and Fangyu Guo. "Selecting Optimal Combination of Data Channels for Semantic Segmentation in City Information Modelling (CIM)". Remote Sensing 13, no. 7 (April 2, 2021): 1367. http://dx.doi.org/10.3390/rs13071367.

Abstract:
Over the last decade, a 3D reconstruction technique has been developed to present the latest as-is information for various objects and build the city information models. Meanwhile, deep learning based approaches are employed to add semantic information to the models. Studies have proved that the accuracy of the model could be improved by combining multiple data channels (e.g., XYZ, Intensity, D, and RGB). Nevertheless, the redundant data channels in large-scale datasets may cause high computation cost and time during data processing. Few researchers have addressed the question of which combination of channels is optimal in terms of overall accuracy (OA) and mean intersection over union (mIoU). Therefore, a framework is proposed to explore an efficient data fusion approach for semantic segmentation by selecting an optimal combination of data channels. In the framework, a total of 13 channel combinations are investigated to pre-process data and the encoder-to-decoder structure is utilized for network permutations. A case study is carried out to investigate the efficiency of the proposed approach by adopting a city-level benchmark dataset and applying nine networks. It is found that the combination of IRGB channels provide the best OA performance, while IRGBD channels provide the best mIoU performance.
38

Ge, Yanliang, Cong Zhang, Kang Wang, Ziqi Liu, and Hongbo Bi. "WGI-Net: A weighted group integration network for RGB-D salient object detection". Computational Visual Media, January 8, 2021. http://dx.doi.org/10.1007/s41095-020-0200-x.

Abstract:
Salient object detection is used as a pre-process in many computer vision tasks (such as salient object segmentation, video salient object detection, etc.). When performing salient object detection, depth information can provide clues to the location of target objects, so effective fusion of RGB and depth feature information is important. In this paper, we propose a new feature information aggregation approach, weighted group integration (WGI), to effectively integrate RGB and depth feature information. We use a dual-branch structure to slice the input RGB image and depth map separately and then merge the results separately by concatenation. As grouped features may lose global information about the target object, we also make use of the idea of residual learning, taking the features captured by the original fusion method as supplementary information to ensure both accuracy and completeness of the fused information. Experiments on five datasets show that our model performs better than typical existing approaches for four evaluation metrics.
39

Thinh, Nguyen Hong, Tran Hoang Tung, and Le Vu Ha. "Depth-aware salient object segmentation". VNU Journal of Science: Computer Science and Communication Engineering 36, no. 2 (October 7, 2020). http://dx.doi.org/10.25073/2588-1086/vnucsce.217.

Abstract:
Object segmentation is an important task which is widely employed in many computer vision applications such as object detection, tracking, recognition, and retrieval. It can be seen as a two-phase process: object detection and segmentation. Object segmentation becomes more challenging in case there is no prior knowledge about the object in the scene. In such conditions, visual attention analysis via saliency mapping may offer a means to predict the object location by using visual contrast, local or global, to identify regions that draw strong attention in the image. However, in such situations as cluttered backgrounds, highly varied object surfaces, or shadows, regular and salient object segmentation approaches based on a single image feature such as color or brightness have shown to be insufficient for the task. This work proposes a new salient object segmentation method which uses a depth map obtained from the input image for enhancing the accuracy of saliency mapping. A deep learning-based method is employed for depth map estimation. Our experiments showed that the proposed method outperforms other state-of-the-art object segmentation algorithms in terms of recall and precision.
Keywords: Saliency map, Depth map, deep learning, object segmentation.
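To make the fusion idea above concrete, here is a minimal sketch, not the authors' implementation, of combining a colour-contrast saliency map with a depth-based one and thresholding the result into a binary object mask. It assumes OpenCV and NumPy; the `depth` input is expected to come from a learned monocular depth estimator (not shown), and all function names are illustrative.

```python
import numpy as np
import cv2


def color_contrast_saliency(bgr):
    """Simple global-contrast saliency: distance of each pixel's Lab colour
    from the mean image colour."""
    lab = cv2.cvtColor(bgr, cv2.COLOR_BGR2LAB).astype(np.float32)
    mean = lab.reshape(-1, 3).mean(axis=0)
    sal = np.linalg.norm(lab - mean, axis=2)
    return cv2.normalize(sal, None, 0.0, 1.0, cv2.NORM_MINMAX)


def depth_contrast_saliency(depth):
    """Crude depth cue: pixels whose depth differs strongly from the image's
    mean depth score higher."""
    d = cv2.normalize(depth.astype(np.float32), None, 0.0, 1.0, cv2.NORM_MINMAX)
    return cv2.normalize(np.abs(d - d.mean()), None, 0.0, 1.0, cv2.NORM_MINMAX)


def fused_salient_mask(bgr, depth, alpha=0.5):
    """Blend colour and depth saliency, then threshold (Otsu) to a binary mask."""
    fused = alpha * color_contrast_saliency(bgr) + (1 - alpha) * depth_contrast_saliency(depth)
    fused8 = (fused * 255).astype(np.uint8)
    _, mask = cv2.threshold(fused8, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return mask

# In the paper's setting, `depth` would be predicted by a deep monocular
# depth-estimation network from the same RGB input, not read from a sensor.
```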
41

"Segmentation of Moving Objects using Numerous Background Subtraction Methods for Surveillance Applications". International Journal of Innovative Technology and Exploring Engineering 9, n.º 3 (10 de janeiro de 2020): 2553–63. http://dx.doi.org/10.35940/ijitee.c8811.019320.

Abstract:
Background subtraction is a key technique for detecting moving objects in video in the computer vision field. It subtracts a reference frame from every new frame of the video scene. A wide variety of background subtraction techniques is available in the literature for real-life applications such as crowd analysis, human activity tracking, traffic analysis, and many more. Moreover, there are not enough benchmark datasets available that cover all the challenges subtraction techniques face for object detection; these challenges include dynamic backgrounds, illumination changes, shadow appearance, occlusion, and object speed. From this perspective, we provide an exhaustive literature survey of background subtraction techniques for video surveillance applications that address these challenges in real situations. Additionally, we survey eight benchmark video datasets, namely Wallflower, BMC, PET, IBM, CAVIAR, CD.Net, SABS, and RGB-D, along with their available ground truth. This study evaluates the performance of five background subtraction methods using parameters such as specificity, sensitivity, FNR, PWC, and F-Score, in order to identify an accurate and efficient method for detecting moving objects in less computational time.
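As an illustration of the kind of method being compared, the sketch below applies one classic background subtractor (OpenCV's MOG2 Gaussian-mixture implementation) to a frame and scores the resulting mask with an F-Score against ground truth. This is a generic example under those assumptions, not the survey's evaluation code, and the helper names are ours.

```python
import numpy as np
import cv2

# Gaussian-mixture background model; history/varThreshold are typical defaults.
subtractor = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16,
                                                detectShadows=True)


def foreground_mask(frame):
    """Return a binary foreground mask (0/255) for one video frame."""
    mask = subtractor.apply(frame)
    # MOG2 marks shadows with value 127; keep only confident foreground.
    _, mask = cv2.threshold(mask, 127, 255, cv2.THRESH_BINARY)
    return mask


def f_score(pred, gt):
    """F-Score of a predicted mask against a ground-truth mask (both 0/255)."""
    p, g = pred > 0, gt > 0
    tp = np.logical_and(p, g).sum()
    fp = np.logical_and(p, ~g).sum()
    fn = np.logical_and(~p, g).sum()
    precision = tp / max(tp + fp, 1)
    recall = tp / max(tp + fn, 1)
    return 2 * precision * recall / max(precision + recall, 1e-9)
```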
42

Höller, Benjamin, Annette Mossel and Hannes Kaufmann. "Automatic object annotation in streamed and remotely explored large 3D reconstructions". Computational Visual Media, 7 January 2021. http://dx.doi.org/10.1007/s41095-020-0194-4.

Abstract:
We introduce a novel framework for 3D scene reconstruction with simultaneous object annotation, using a pre-trained 2D convolutional neural network (CNN), incremental data streaming, and remote exploration with a virtual reality setup. It enables versatile integration of any 2D box detection or segmentation network. We integrate new approaches to (i) asynchronously perform dense 3D reconstruction and object annotation at interactive frame rates, (ii) efficiently optimize CNN results in terms of object prediction and spatial accuracy, and (iii) generate computationally efficient colliders in large triangulated 3D reconstructions at run-time for 3D scene interaction. Our method is novel in combining CNNs with long and varying inference times with live 3D reconstruction from RGB-D camera input. We further propose a lightweight data structure to store the 3D reconstruction data and object annotations, enabling fast incremental data transmission for real-time exploration with a remote client, which has not been presented before. Our framework achieves update rates of 22 fps (SSD MobileNet) and 19 fps (Mask R-CNN) for indoor environments up to 800 m³. We evaluated the accuracy of 3D object detection. Our work provides a versatile foundation for semantic scene understanding of large streamed 3D reconstructions while remaining independent of the CNN's processing time. Source code is available for non-commercial use.
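A rough sketch of the decoupling described above, assuming a Python-style producer/consumer setup: the live reconstruction loop hands keyframes to a worker thread that runs the slow, variable-latency detector, so the reconstruction frame rate stays independent of CNN inference time. The `detect` and `on_result` callables and the function names are hypothetical placeholders, not the paper's API.

```python
import queue
import threading

frame_queue = queue.Queue(maxsize=1)  # keep only the latest keyframe for the CNN


def annotation_worker(detect, on_result):
    """Runs the slow, variable-latency 2D detector off the reconstruction loop."""
    while True:
        item = frame_queue.get()
        if item is None:
            break                        # shutdown signal
        rgb, camera_pose = item
        boxes = detect(rgb)              # e.g. an SSD MobileNet or Mask R-CNN call
        on_result(boxes, camera_pose)    # project/merge detections into the 3D model


def submit_keyframe(rgb, camera_pose):
    """Called from the live reconstruction loop; never blocks it."""
    try:
        frame_queue.put_nowait((rgb, camera_pose))
    except queue.Full:
        pass                             # drop this keyframe; a newer one will follow


# worker = threading.Thread(target=annotation_worker, args=(detect, on_result), daemon=True)
# worker.start()
```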
43

Będkowski, J., and J. Naruniec. "On-line range images registration with GPGPU". Opto-Electronics Review 21, no. 1 (1 January 2013). http://dx.doi.org/10.2478/s11772-013-0074-x.

Abstract:
This paper concerns the implementation of algorithms for two important aspects of modern 3D data processing: data registration and segmentation. The solution proposed for the first topic is based on 3D space decomposition, while the latter builds on image processing and local neighbourhood search. Data processing is implemented using NVIDIA Compute Unified Device Architecture (CUDA) parallel computation. The result of the segmentation is a coloured map in which different colours correspond to different objects, such as walls, floor, and stairs. The research is related to the problem of collecting 3D data with an RGB-D camera mounted on a rotating head, for use in mobile robot applications. The data registration algorithm is designed for on-line processing. The iterative closest point (ICP) approach is chosen as the registration method. Computations are based on a parallel fast nearest neighbour search. This procedure decomposes 3D space into cubic buckets, and therefore the matching time is deterministic. The first segmentation technique uses accelerometers integrated with the RGB-D sensor to obtain rotation compensation, together with an image processing method for defining prerequisites of the known categories. The second technique uses the adapted nearest neighbour search procedure to obtain normal vectors for each range point.
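The deterministic matching step rests on a uniform cubic-bucket decomposition of 3D space. The sketch below shows the idea on the CPU with NumPy and a dictionary of buckets; the paper implements it in CUDA for parallel speed, and the class and parameter names here are illustrative only.

```python
import numpy as np
from collections import defaultdict


class BucketGrid:
    """Cubic-bucket decomposition for fast nearest-neighbour queries, as used
    inside an ICP correspondence step (CPU sketch of the GPU data structure)."""

    def __init__(self, points, bucket_size=0.05):
        self.size = bucket_size
        self.points = np.asarray(points, dtype=np.float64)
        self.buckets = defaultdict(list)
        for i, p in enumerate(self.points):
            self.buckets[tuple((p // bucket_size).astype(int))].append(i)

    def nearest(self, q):
        """Search only the query's bucket and its 26 neighbours, so the cost per
        query is bounded. Returns (-1, inf) if those buckets are empty; a full
        implementation would widen the search or cap the correspondence distance."""
        q = np.asarray(q, dtype=np.float64)
        cx, cy, cz = (q // self.size).astype(int)
        best_i, best_d = -1, np.inf
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for dz in (-1, 0, 1):
                    for i in self.buckets.get((cx + dx, cy + dy, cz + dz), []):
                        d = np.sum((self.points[i] - q) ** 2)
                        if d < best_d:
                            best_i, best_d = i, d
        return best_i, np.sqrt(best_d)


# Typical use inside one ICP iteration:
# grid = BucketGrid(model_points)
# idx, dist = grid.nearest(scene_point)
```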
