Journal articles on the topic 'RGB-Depth Image'

Consult the top 50 journal articles for your research on the topic 'RGB-Depth Image.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever these are available in the metadata.

Browse journal articles in a wide variety of disciplines and organise your bibliography correctly.

1

Li, Hengyu, Hang Liu, Ning Cao, et al. "Real-time RGB-D image stitching using multiple Kinects for improved field of view." International Journal of Advanced Robotic Systems 14, no. 2 (2017): 172988141769556. http://dx.doi.org/10.1177/1729881417695560.

Abstract:
This article concerns the problems of a defective depth map and limited field of view of Kinect-style RGB-D sensors. An anisotropic diffusion based hole-filling method is proposed to recover invalid depth data in the depth map. The field of view of the Kinect-style RGB-D sensor is extended by stitching depth and color images from several RGB-D sensors. By aligning the depth map with the color image, the registration data calculated by registering color images can be used to stitch depth and color images into a depth and color panoramic image concurrently in real time. Experiments show that the
2

Wu, Yan, Jiqian Li, and Jing Bai. "Multiple Classifiers-Based Feature Fusion for RGB-D Object Recognition." International Journal of Pattern Recognition and Artificial Intelligence 31, no. 05 (2017): 1750014. http://dx.doi.org/10.1142/s0218001417500148.

Abstract:
RGB-D-based object recognition has been enthusiastically investigated in the past few years. RGB and depth images provide useful and complementary information. Fusing RGB and depth features can significantly increase the accuracy of object recognition. However, previous works just simply take the depth image as the fourth channel of the RGB image and concatenate the RGB and depth features, ignoring the different power of RGB and depth information for different objects. In this paper, a new method which contains three different classifiers is proposed to fuse features extracted from RGB image a
3

Cao, Hao, Xin Zhao, Ang Li, and Meng Yang. "Depth Image Rectification Based on an Effective RGB–Depth Boundary Inconsistency Model." Electronics 13, no. 16 (2024): 3330. http://dx.doi.org/10.3390/electronics13163330.

Abstract:
Depth image has been widely involved in various tasks of 3D systems with the advancement of depth acquisition sensors in recent years. Depth images suffer from serious distortions near object boundaries due to the limitations of depth sensors or estimation methods. In this paper, a simple method is proposed to rectify the erroneous object boundaries of depth images with the guidance of reference RGB images. First, an RGB–Depth boundary inconsistency model is developed to measure whether collocated pixels in depth and RGB images belong to the same object. The model extracts the structures of RG
4

Oyama, Tadahiro, and Daisuke Matsuzaki. "Depth Image Generation from Monocular RGB Image." Proceedings of the JSME Annual Conference on Robotics and Mechatronics (Robomec) 2019 (2019): 2P2-H09. http://dx.doi.org/10.1299/jsmermd.2019.2p2-h09.

5

Zhang, Longyu, Hao Xia, and Yanyou Qiao. "Texture Synthesis Repair of RealSense D435i Depth Images with Object-Oriented RGB Image Segmentation." Sensors 20, no. 23 (2020): 6725. http://dx.doi.org/10.3390/s20236725.

Abstract:
A depth camera is a kind of sensor that can directly collect distance information between an object and the camera. The RealSense D435i is a low-cost depth camera that is currently in widespread use. When collecting data, an RGB image and a depth image are acquired simultaneously. The quality of the RGB image is good, whereas the depth image typically has many holes. In a lot of applications using depth images, these holes can lead to serious problems. In this study, a repair method of depth images was proposed. The depth image is repaired using the texture synthesis algorithm with the RGB ima
6

Kwak, Jeonghoon, and Yunsick Sung. "Automatic 3D Landmark Extraction System Based on an Encoder–Decoder Using Fusion of Vision and LiDAR." Remote Sensing 12, no. 7 (2020): 1142. http://dx.doi.org/10.3390/rs12071142.

Abstract:
To provide a realistic environment for remote sensing applications, point clouds are used to realize a three-dimensional (3D) digital world for the user. Motion recognition of objects, e.g., humans, is required to provide realistic experiences in the 3D digital world. To recognize a user’s motions, 3D landmarks are provided by analyzing a 3D point cloud collected through a light detection and ranging (LiDAR) system or a red green blue (RGB) image collected visually. However, manual supervision is required to extract 3D landmarks as to whether they originate from the RGB image or the 3D point c
7

Tang, Shengjun, Qing Zhu, Wu Chen, et al. "ENHANCED RGB-D MAPPING METHOD FOR DETAILED 3D MODELING OF LARGE INDOOR ENVIRONMENTS." ISPRS Annals of Photogrammetry, Remote Sensing and Spatial Information Sciences III-1 (June 2, 2016): 151–58. http://dx.doi.org/10.5194/isprsannals-iii-1-151-2016.

Abstract:
RGB-D sensors are novel sensing systems that capture RGB images along with pixel-wise depth information. Although they are widely used in various applications, RGB-D sensors have significant drawbacks with respect to 3D dense mapping of indoor environments. First, they only allow a measurement range with a limited distance (e.g., within 3 m) and a limited field of view. Second, the error of the depth measurement increases with increasing distance to the sensor. In this paper, we propose an enhanced RGB-D mapping method for detailed 3D modeling of large indoor environments by combini
9

Xu, Dan, Ba Li, Guanyun Xi, Shusheng Wang, Lei Xu, and Juncheng Ma. "A Shooting Distance Adaptive Crop Yield Estimation Method Based on Multi-Modal Fusion." Agronomy 15, no. 5 (2025): 1036. https://doi.org/10.3390/agronomy15051036.

Abstract:
To address the low estimation accuracy of deep learning-based crop yield image recognition methods under untrained shooting distances, this study proposes a shooting distance adaptive crop yield estimation method by fusing RGB and depth image information through multi-modal data fusion. Taking strawberry fruit fresh weight as an example, RGB and depth image data of 348 strawberries were collected at nine heights ranging from 70 to 115 cm. First, based on RGB images and shooting height information, a single-modal crop yield estimation model was developed by training a convolutional neural netwo
10

Lee, Ki-Seung. "Improving the Performance of Automatic Lip-Reading Using Image Conversion Techniques." Electronics 13, no. 6 (2024): 1032. http://dx.doi.org/10.3390/electronics13061032.

Abstract:
Variation in lighting conditions is a major cause of performance degradation in pattern recognition when using optical imaging. In this study, infrared (IR) and depth images were considered as possible robust alternatives against variations in illumination, particularly for improving the performance of automatic lip-reading. The variations due to lighting conditions were quantitatively analyzed for optical, IR, and depth images. Then, deep neural network (DNN)-based lip-reading rules were built for each image modality. Speech recognition techniques based on IR or depth imaging required an addi
11

Kao, Yueying, Weiming Li, Qiang Wang, Zhouchen Lin, Wooshik Kim, and Sunghoon Hong. "Synthetic Depth Transfer for Monocular 3D Object Pose Estimation in the Wild." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 07 (2020): 11221–28. http://dx.doi.org/10.1609/aaai.v34i07.6781.

Abstract:
Monocular object pose estimation is an important yet challenging computer vision problem. Depth features can provide useful information for pose estimation. However, existing methods rely on real depth images to extract depth features, leading to its difficulty on various applications. In this paper, we aim at extracting RGB and depth features from a single RGB image with the help of synthetic RGB-depth image pairs for object pose estimation. Specifically, a deep convolutional neural network is proposed with an RGB-to-Depth Embedding module and a Synthetic-Real Adaptation module. The embedding
12

Ding, Ing-Jr, and Nai-Wei Zheng. "CNN Deep Learning with Wavelet Image Fusion of CCD RGB-IR and Depth-Grayscale Sensor Data for Hand Gesture Intention Recognition." Sensors 22, no. 3 (2022): 803. http://dx.doi.org/10.3390/s22030803.

Abstract:
Pixel-based images captured by a charge-coupled device (CCD) with infrared (IR) LEDs around the image sensor are the well-known CCD Red–Green–Blue IR (the so-called CCD RGB-IR) data. The CCD RGB-IR data are generally acquired for video surveillance applications. Currently, CCD RGB-IR information has been further used to perform human gesture recognition on surveillance. Gesture recognition, including hand gesture intention recognition, is attracting great attention in the field of deep neural network (DNN) calculations. For further enhancing conventional CCD RGB-IR gesture recognition by DNN,
13

Kostusiak, Aleksander, and Piotr Skrzypczyński. "Enhancing Visual Odometry with Estimated Scene Depth: Leveraging RGB-D Data with Deep Learning." Electronics 13, no. 14 (2024): 2755. http://dx.doi.org/10.3390/electronics13142755.

Abstract:
Advances in visual odometry (VO) systems have benefited from the widespread use of affordable RGB-D cameras, improving indoor localization and mapping accuracy. However, older sensors like the Kinect v1 face challenges due to depth inaccuracies and incomplete data. This study compares indoor VO systems that use RGB-D images, exploring methods to enhance depth information. We examine conventional image inpainting techniques and a deep learning approach, utilizing newer depth data from devices like the Kinect v2. Our research highlights the importance of refining data from lower-quality sensors,
14

Zhao, Bohu, Lebao Li, and Haipeng Pan. "Non-Local Means Hole Repair Algorithm Based on Adaptive Block." Applied Sciences 14, no. 1 (2023): 159. http://dx.doi.org/10.3390/app14010159.

Abstract:
RGB-D cameras provide depth and color information and are widely used in 3D reconstruction and computer vision. In the majority of existing RGB-D cameras, a considerable portion of depth values is often lost due to severe occlusion or limited camera coverage, thereby adversely impacting the precise localization and three-dimensional reconstruction of objects. In this paper, to address the issue of poor-quality depth images captured by RGB-D cameras, a depth image hole repair algorithm based on non-local means is proposed first, leveraging the structural similarities between grayscale and de
15

Peng, M., W. Wan, Y. Xing, et al. "INTEGRATING DEPTH AND IMAGE SEQUENCES FOR PLANETARY ROVER MAPPING USING RGB-D SENSOR." ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences XLII-3 (April 30, 2018): 1369–74. http://dx.doi.org/10.5194/isprs-archives-xlii-3-1369-2018.

Abstract:
An RGB-D camera allows the capture of depth and color information at high data rates, and this makes it possible and beneficial to integrate depth and image sequences for planetary rover mapping. The proposed mapping method consists of three steps. First, the strict projection relationship among 3D space, depth data and visual texture data is established based on the imaging principle of the RGB-D camera; then, an extended bundle adjustment (BA) based SLAM method with integrated 2D and 3D measurements is applied to the image network for high-precision pose estimation. Next, as the interior and exterior
16

Zhang, Xiaomin, Yanning Zhang, Jinfeng Geng, Jinming Pan, Xinyao Huang, and Xiuqin Rao. "Feather Damage Monitoring System Using RGB-Depth-Thermal Model for Chickens." Animals 13, no. 1 (2022): 126. http://dx.doi.org/10.3390/ani13010126.

Abstract:
Feather damage is a continuous health and welfare challenge among laying hens. Infrared thermography is a tool that can evaluate the changes in the surface temperature, derived from an inflammatory process that would make it possible to objectively determine the depth of the damage to the dermis. Therefore, the objective of this article was to develop an approach to feather damage assessment based on visible light and infrared thermography. Fusing information obtained from these two bands can highlight their strengths, which is more evident in the assessment of feather damage. A novel pipeline
17

Chen, Songnan, Mengxia Tang, Ruifang Dong, and Jiangming Kan. "Encoder–Decoder Structure Fusing Depth Information for Outdoor Semantic Segmentation." Applied Sciences 13, no. 17 (2023): 9924. http://dx.doi.org/10.3390/app13179924.

Abstract:
The semantic segmentation of outdoor images is the cornerstone of scene understanding and plays a crucial role in the autonomous navigation of robots. Although RGB-D images can provide additional depth information for improving the performance of semantic segmentation tasks, current state-of-the-art methods directly use ground truth depth maps for depth information fusion, which relies on highly developed and expensive depth sensors. Aiming to solve such a problem, we proposed a self-calibrated RGB-D image semantic segmentation neural network model based on an improved residual network without
18

Büker, Linda Christin, Finnja Zuber, Andreas Hein, and Sebastian Fudickar. "HRDepthNet: Depth Image-Based Marker-Less Tracking of Body Joints." Sensors 21, no. 4 (2021): 1356. http://dx.doi.org/10.3390/s21041356.

Abstract:
With approaches for the detection of joint positions in color images such as HRNet and OpenPose being available, consideration of corresponding approaches for depth images is limited even though depth images have several advantages over color images like robustness to light variation or color- and texture invariance. Correspondingly, we introduce High-Resolution Depth Net (HRDepthNet), a machine-learning-driven approach to detect human joints (body, head, and upper and lower extremities) in purely depth images. HRDepthNet retrains the original HRNet for depth images. Therefore, a dataset is cr
19

Uddin, Md Kamal, Amran Bhuiyan, and Mahmudul Hasan. "Fusion in Dissimilarity Space Between RGB-D and Skeleton for Person Re-Identification." International Journal of Innovative Technology and Exploring Engineering 10, no. 12 (2021): 69–75. http://dx.doi.org/10.35940/ijitee.l9566.10101221.

Abstract:
Person re-identification (Re-id) is one of the important tools of video surveillance systems, which aims to recognize an individual across the multiple disjoint sensors of a camera network. Despite the recent advances on RGB camera-based person re-identification methods under normal lighting conditions, Re-id researchers fail to take advantages of modern RGB-D sensor-based additional information (e.g. depth and skeleton information). When traditional RGB-based cameras fail to capture the video under poor illumination conditions, RGB-D sensor-based additional information can be advantageous to
21

Liu, Weiyu, and Nan Di. "RSCS6D: Keypoint Extraction-Based 6D Pose Estimation." Applied Sciences 15, no. 12 (2025): 6729. https://doi.org/10.3390/app15126729.

Abstract:
In this work, we propose an improved network, RSCS6D, for 6D pose estimation from RGB-D images by extracting keypoint-based point clouds. Our key insight is that keypoint cloud can reduce data redundancy in 3D point clouds and accelerate the convergence of convolutional neural networks. First, we employ a semantic segmentation network on the RGB image to obtain mask images containing positional information and per-pixel labels. Next, we introduce a novel keypoint cloud extraction algorithm that combines RGB and depth images to detect 2D keypoints and convert them into 3D keypoints. Specificall
22

Yan, Zhiqiang, Hongyuan Wang, Qianhao Ning, and Yinxi Lu. "Robust Image Matching Based on Image Feature and Depth Information Fusion." Machines 10, no. 6 (2022): 456. http://dx.doi.org/10.3390/machines10060456.

Abstract:
In this paper, we propose a robust image feature extraction and fusion method to effectively fuse image feature and depth information and improve the registration accuracy of RGB-D images. The proposed method directly splices the image feature point descriptors with the corresponding point cloud feature descriptors to obtain the fusion descriptor of the feature points. The fusion feature descriptor is constructed based on the SIFT, SURF, and ORB feature descriptors and the PFH and FPFH point cloud feature descriptors. Furthermore, the registration performance based on fusion features is tested
23

Jiao, Yuzhong, Kayton Wai Keung Cheung, Mark Ping Chan Mok, and Yiu Kei Li. "Spatial Distance-based Interpolation Algorithm for Computer Generated 2D+Z Images." Electronic Imaging 2020, no. 2 (2020): 140–1. http://dx.doi.org/10.2352/issn.2470-1173.2020.2.sda-140.

Abstract:
Computer generated 2D plus Depth (2D+Z) images are common input data for 3D display with depth image-based rendering (DIBR) technique. Due to their simplicity, linear interpolation methods are usually used to convert low-resolution images into high-resolution images for not only depth maps but also 2D RGB images. However linear methods suffer from zigzag artifacts in both depth map and RGB images, which severely affects the 3D visual experience. In this paper, spatial distance-based interpolation algorithm for computer generated 2D+Z images is proposed. The method interpolates RGB images with
24

Wang, Z., T. Li, L. Pan, and Z. Kang. "SCENE SEMANTIC SEGMENTATION FROM INDOOR RGB-D IMAGES USING ENCODE-DECODER FULLY CONVOLUTIONAL NETWORKS." ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences XLII-2/W7 (September 12, 2017): 397–404. http://dx.doi.org/10.5194/isprs-archives-xlii-2-w7-397-2017.

Abstract:
With increasing attention for the indoor environment and the development of low-cost RGB-D sensors, indoor RGB-D images are easily acquired. However, scene semantic segmentation is still an open area, which restricts indoor applications. The depth information can help to distinguish the regions which are difficult to be segmented out from the RGB images with similar color or texture in the indoor scenes. How to utilize the depth information is the key problem of semantic segmentation for RGB-D images. In this paper, we propose an Encode-Decoder Fully Convolutional Networks for RGB-D image clas
25

Zheng, Huiming, and Wei Gao. "End-to-End RGB-D Image Compression via Exploiting Channel-Modality Redundancy." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 7 (2024): 7562–70. http://dx.doi.org/10.1609/aaai.v38i7.28588.

Abstract:
As a kind of 3D data, RGB-D images have been extensively used in object tracking, 3D reconstruction, remote sensing mapping, and other tasks. In the realm of computer vision, the significance of RGB-D images is progressively growing. However, the existing learning-based image compression methods usually process RGB images and depth images separately, which cannot entirely exploit the redundant information between the modalities, limiting the further improvement of the Rate-Distortion performance. With the goal of overcoming the defect, in this paper, we propose a learning-based dual-branch RGB
26

Jiang, Ming-xin, Chao Deng, Ming-min Zhang, Jing-song Shan, and Haiyan Zhang. "Multimodal Deep Feature Fusion (MMDFF) for RGB-D Tracking." Complexity 2018 (November 28, 2018): 1–8. http://dx.doi.org/10.1155/2018/5676095.

Abstract:
Visual tracking is still a challenging task due to occlusion, appearance changes, complex motion, etc. We propose a novel RGB-D tracker based on multimodal deep feature fusion (MMDFF) in this paper. MMDFF model consists of four deep Convolutional Neural Networks (CNNs): Motion-specific CNN, RGB-specific CNN, Depth-specific CNN, and RGB-Depth correlated CNN. The depth image is encoded into three channels which are sent into depth-specific CNN to extract deep depth features. The optical flow image is calculated for every frame and then is fed to motion-specific CNN to learn deep motion features
27

Kozlova, Y. K., and V. V. Myasnikov. "Head model reconstruction and animation method using color image with depth information." Computer Optics 48, no. 1 (2024): 118–22. http://dx.doi.org/10.18287/2412-6179-co-1334.

Abstract:
The article presents a method for reconstructing and animating a digital model of a human head from a single RGBD image, a color RGB image with depth information. An approach is proposed for optimizing the parametric FLAME model using a point cloud of a face corresponding to a single RGBD image. The results of experimental studies have shown that the proposed optimization approach makes it possible to obtain a head model with more prominent features of the original face compared to optimization approaches using RGB images or the same approaches generalized to RGBD images.
28

Lv, Ying, and Wujie Zhou. "Hierarchical Multimodal Adaptive Fusion (HMAF) Network for Prediction of RGB-D Saliency." Computational Intelligence and Neuroscience 2020 (November 20, 2020): 1–9. http://dx.doi.org/10.1155/2020/8841681.

Abstract:
Visual saliency prediction for RGB-D images is more challenging than that for their RGB counterparts. Additionally, very few investigations have been undertaken concerning RGB-D-saliency prediction. The proposed study presents a method based on a hierarchical multimodal adaptive fusion (HMAF) network to facilitate end-to-end prediction of RGB-D saliency. In the proposed method, hierarchical (multilevel) multimodal features are first extracted from an RGB image and depth map using a VGG-16-based two-stream network. Subsequently, the most significant hierarchical features of the said RGB image a
29

Sun, Wenbo, Zhi Gao, Jinqiang Cui, Bharath Ramesh, Bin Zhang, and Ziyao Li. "Semantic Segmentation Leveraging Simultaneous Depth Estimation." Sensors 21, no. 3 (2021): 690. http://dx.doi.org/10.3390/s21030690.

Abstract:
Semantic segmentation is one of the most widely studied problems in computer vision communities, which makes a great contribution to a variety of applications. A lot of learning-based approaches, such as Convolutional Neural Network (CNN), have made a vast contribution to this problem. While rich context information of the input images can be learned from multi-scale receptive fields by convolutions with deep layers, traditional CNNs have great difficulty in learning the geometrical relationship and distribution of objects in the RGB image due to the lack of depth information, which may lead t
30

Kanda, Takuya, Kazuya Miyakawa, Jeonghwang Hayashi, et al. "Locating Mechanical Switches Using RGB-D Sensor Mounted on a Disaster Response Robot." Electronic Imaging 2020, no. 6 (2020): 16–1. http://dx.doi.org/10.2352/issn.2470-1173.2020.6.iriacv-016.

Abstract:
To achieve one of the tasks required for disaster response robots, this paper proposes a method for locating 3D structured switches’ points to be pressed by the robot in disaster sites using RGB-D images acquired by a Kinect sensor attached to our disaster response robot. Our method consists of the following five steps: 1) Obtain RGB and depth images using an RGB-D sensor. 2) Detect the bounding box of the switch area from the RGB image using YOLOv3. 3) Generate 3D point cloud data of the target switch by combining the bounding box and the depth image. 4) Detect the center position of the switch button f
31

Cai, Ziyun, Yang Long, and Ling Shao. "Adaptive RGB Image Recognition by Visual-Depth Embedding." IEEE Transactions on Image Processing 27, no. 5 (2018): 2471–83. http://dx.doi.org/10.1109/tip.2018.2806839.

32

Feng, Guanyuan, Lin Ma, and Xuezhi Tan. "Visual Map Construction Using RGB-D Sensors for Image-Based Localization in Indoor Environments." Journal of Sensors 2017 (2017): 1–18. http://dx.doi.org/10.1155/2017/8037607.

Abstract:
RGB-D sensors capture RGB images and depth images simultaneously, which makes it possible to acquire the depth information at pixel level. This paper focuses on the use of RGB-D sensors to construct a visual map which is an extended dense 3D map containing essential elements for image-based localization, such as poses of the database camera, visual features, and 3D structures of the building. Taking advantage of matched visual features and corresponding depth values, a novel local optimization algorithm is proposed to achieve point cloud registration and database camera pose estimation. Next,
33

Salazar, Isail, Said Pertuz, and Fabio Martínez. "Multi-modal RGB-D Image Segmentation from Appearance and Geometric Depth Maps." TecnoLógicas 23, no. 48 (2020): 143–61. http://dx.doi.org/10.22430/22565337.1538.

Abstract:
Classical image segmentation algorithms exploit the detection of similarities and discontinuities of different visual cues to define and differentiate multiple regions of interest in images. However, due to the high variability and uncertainty of image data, producing accurate results is difficult. In other words, segmentation based just on color is often insufficient for a large percentage of real-life scenes. This work presents a novel multi-modal segmentation strategy that integrates depth and appearance cues from RGB-D images by building a hierarchical region-based representation, i.e., a
34

Li, Shipeng, Di Li, Chunhua Zhang, Jiafu Wan, and Mingyou Xie. "RGB-D Image Processing Algorithm for Target Recognition and Pose Estimation of Visual Servo System." Sensors 20, no. 2 (2020): 430. http://dx.doi.org/10.3390/s20020430.

Abstract:
This paper studies the control performance of visual servoing system under the planar camera and RGB-D cameras, the contribution of this paper is through rapid identification of target RGB-D images and precise measurement of depth direction to strengthen the performance indicators of visual servoing system such as real time and accuracy, etc. Firstly, color images acquired by the RGB-D camera are segmented based on optimized normalized cuts. Next, the gray scale is restored according to the histogram feature of the target image. Then, the obtained 2D graphics depth information and the enhanced
35

Sari, Yuita Arum. "A Novel RGB-Depth Imaging Technique for Food Volume Estimation." Jurnal Teknologi Dan Sistem Informasi Bisnis 7, no. 1 (2025): 99–106. https://doi.org/10.47233/jteksis.v7i1.1764.

Abstract:
Evaluating nutrient intake among patients in a hospital is crucial, as it can accelerate their recovery process. Estimating calorie intake can be achieved by monitoring the quantity of food consumed by patients both before and after their meals. This approach involves various methods, including the use of digital scales, the Comstock level, and digital imaging techniques. Nonetheless, these techniques have their own limitations, particularly the risk of subjective assessments. To reduce errors arising from human factors, an objective evaluation is proposed. This paper introduces a new techniqu
36

Kong, Yuqiu, He Wang, Lingwei Kong, Yang Liu, Cuili Yao, and Baocai Yin. "Absolute and Relative Depth-Induced Network for RGB-D Salient Object Detection." Sensors 23, no. 7 (2023): 3611. http://dx.doi.org/10.3390/s23073611.

Abstract:
Detecting salient objects in complicated scenarios is a challenging problem. Except for semantic features from the RGB image, spatial information from the depth image also provides sufficient cues about the object. Therefore, it is crucial to rationally integrate RGB and depth features for the RGB-D salient object detection task. Most existing RGB-D saliency detectors modulate RGB semantic features with absolute depth values. However, they ignore the appearance contrast and structure knowledge indicated by relative depth values between pixels. In this work, we propose a depth-induced network
37

Hristova, H., M. Abegg, C. Fischer, and N. Rehush. "MONOCULAR DEPTH ESTIMATION IN FOREST ENVIRONMENTS." International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences XLIII-B2-2022 (May 30, 2022): 1017–23. http://dx.doi.org/10.5194/isprs-archives-xliii-b2-2022-1017-2022.

Full text
Abstract:
Depth estimation from a single image is a challenging task, especially inside the highly structured forest environment. In this paper, we propose a supervised deep learning model for monocular depth estimation based on forest imagery. We train our model on a new data set of forest RGB-D images that we collected using a terrestrial laser scanner. Alongside the input RGB image, our model uses a sparse depth channel as input to recover the dense depth information. The prediction accuracy of our model is significantly higher than that of state-of-the-art methods when applied in the conte
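The sparse-depth input described here can be sketched as stacking an extra channel onto the RGB image, with zeros marking pixels that have no laser sample. A minimal illustration; the function name and the zeros-mean-unknown convention are assumptions, not the authors' API:

```python
import numpy as np

def make_rgbd_input(rgb, sparse_points):
    """Stack RGB with a sparse depth channel.

    rgb: (H, W, 3) float array; sparse_points: list of (row, col, depth).
    Pixels without a depth sample stay 0, which a network can learn to
    treat as "unknown".
    """
    h, w, _ = rgb.shape
    sparse = np.zeros((h, w, 1), dtype=rgb.dtype)
    for r, c, d in sparse_points:
        sparse[r, c, 0] = d
    return np.concatenate([rgb, sparse], axis=-1)  # (H, W, 4)

rgb = np.ones((4, 4, 3), dtype=np.float32)
x = make_rgbd_input(rgb, [(0, 0, 2.5), (3, 3, 7.0)])
```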
38

Liu, Botao, Kai Chen, Sheng-Lung Peng, and Ming Zhao. "Depth Map Super-Resolution Based on Semi-Couple Deformable Convolution Networks." Mathematics 11, no. 21 (2023): 4556. http://dx.doi.org/10.3390/math11214556.

Full text
Abstract:
Depth images obtained from lightweight, real-time depth estimation models and consumer-oriented sensors typically have low-resolution issues. Traditional interpolation methods for depth image up-sampling result in a significant information loss, especially in edges with discontinuous depth variations (depth discontinuities). To address this issue, this paper proposes a semi-coupled deformable convolution network (SCD-Net) based on the idea of guided depth map super-resolution (GDSR). The method employs a semi-coupled feature extraction scheme to learn unique and similar features between RGB im
39

Du, Qinsheng, Yingxu Bian, Jianyu Wu, Shiyan Zhang, and Jian Zhao. "Cross-Modal Adaptive Interaction Network for RGB-D Saliency Detection." Applied Sciences 14, no. 17 (2024): 7440. http://dx.doi.org/10.3390/app14177440.

Full text
Abstract:
The salient object detection (SOD) task aims to automatically detect the most prominent areas observed by the human eye in an image. Since RGB images and depth images contain different information, how to effectively integrate cross-modal features in the RGB-D SOD task remains a major challenge. Therefore, this paper proposes a cross-modal adaptive interaction network (CMANet) for the RGB-D salient object detection task, which consists of a cross-modal feature integration module (CMF) and an adaptive feature fusion module (AFFM). These modules are designed to integrate and enhance multi-scale
40

Chinnala Balakrishna and Shepuri Srinivasulu. "Astronomical bodies detection with stacking of CoAtNets by fusion of RGB and depth Images." International Journal of Science and Research Archive 12, no. 2 (2024): 423–27. http://dx.doi.org/10.30574/ijsra.2024.12.2.1234.

Full text
Abstract:
Space situational awareness (SSA) systems require detection of space objects that vary in size, shape, and type. Space images are difficult to process because of factors such as illumination and noise, which make the recognition task complex. Image fusion is an important area in image processing for a variety of applications, including RGB-D sensor fusion, remote sensing, medical diagnostics, and infrared and visible image fusion. In recent times, various image fusion algorithms have been developed, and they showed a superior performance to explore more information that is not
41

Zhou, Yang, Danqing Chen, Jun Wu, Mingyi Huang, and Yubin Weng. "Calibration of RGB-D Camera Using Depth Correction Model." Journal of Physics: Conference Series 2203, no. 1 (2022): 012032. http://dx.doi.org/10.1088/1742-6596/2203/1/012032.

Full text
Abstract:
This paper proposes a calibration method for an RGB-D camera, especially its depth camera. First, a checkerboard calibration board under an auxiliary infrared light source is used to collect calibration images. Then, the internal and external parameters of the depth camera are calculated by Zhang's calibration method, which improves the accuracy of the internal parameters. Next, a depth correction model is proposed to directly calibrate the distortion of the depth image, which is more intuitive and faster than the disparity distortion correction model. This method is simple, high-precision, and
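The simplest form of a depth correction model is a per-camera linear mapping from measured to true distance, fitted from board poses at known distances. The paper's model may well be richer (per-pixel or polynomial); this least-squares sketch with made-up measurements only illustrates the idea:

```python
import numpy as np

# Board placed at known distances (metres): fit true = a * measured + b.
measured = np.array([0.52, 1.05, 1.58, 2.11])   # from the depth sensor
true = np.array([0.50, 1.00, 1.50, 2.00])       # ground truth
a, b = np.polyfit(measured, true, deg=1)

def correct_depth(d):
    """Apply the fitted linear depth correction."""
    return a * d + b

corrected = correct_depth(measured)
rms = float(np.sqrt(np.mean((corrected - true) ** 2)))
```

After fitting, `correct_depth` can be applied to every pixel of incoming depth frames; a residual RMS near zero on the calibration poses indicates the linear model is adequate for this sensor.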
42

Vashpanov, Yuriy, Jung-Young Son, Gwanghee Heo, Tatyana Podousova, and Yong Suk Kim. "Determination of Geometric Parameters of Cracks in Concrete by Image Processing." Advances in Civil Engineering 2019 (October 30, 2019): 1–14. http://dx.doi.org/10.1155/2019/2398124.

Full text
Abstract:
The 8-bit RGB image of a cracked concrete surface, obtained by close-distance photographing with a high-resolution camera and an optical microscope, is used to estimate the geometrical parameters of the crack. The parameters such as the crack’s width, depth, and morphology can be determined from the pixel intensity distribution of the image. For the estimation, the image is transformed into 16-bit gray scale to enhance the geometrical parameters of the crack, and then a mathematical relationship relating the intensity distribution with the depth and width is derived based on the enh
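The 8-bit-to-16-bit conversion and an intensity-based width measurement can be sketched as follows. This assumes crack pixels are darker than the surrounding concrete; the threshold, the luma coefficients, and the function names are illustrative choices, not the paper's derived relationship:

```python
import numpy as np

def rgb8_to_gray16(rgb):
    """Luma conversion of an 8-bit RGB image, rescaled to the 16-bit range."""
    gray8 = 0.299 * rgb[..., 0] + 0.587 * rgb[..., 1] + 0.114 * rgb[..., 2]
    return (gray8 * 257.0).astype(np.uint16)  # 255 * 257 = 65535

def crack_width_px(gray16_row, threshold):
    """Longest run of below-threshold (dark) pixels in one row = width in px."""
    dark = gray16_row < threshold
    width = run = 0
    for d in dark:
        run = run + 1 if d else 0
        width = max(width, run)
    return width

row = np.full(20, 60000, dtype=np.uint16)
row[8:11] = 5000                  # a 3-pixel-wide dark crack
w = crack_width_px(row, threshold=30000)
```

A pixel width becomes a physical width once multiplied by the spatial resolution of the close-distance setup (mm per pixel).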
43

Chi, Chen Tung, Shih Chien Yang, and Yin Tien Wang. "Calibration of RGB-D Sensors for Robot SLAM." Applied Mechanics and Materials 479-480 (December 2013): 677–81. http://dx.doi.org/10.4028/www.scientific.net/amm.479-480.677.

Full text
Abstract:
This paper presents a calibration procedure for a Kinect RGB-D sensor and its application to robot simultaneous localization and mapping (SLAM). The calibration procedure consists of two stages: in the first stage, the RGB image is aligned with the depth image by using bilinear interpolation. The distorted RGB image is further corrected in the second stage. The calibrated RGB-D sensor is used as the sensing device for robot navigation in an unknown environment. In SLAM tasks, the speeded-up robust features (SURF) are detected from the RGB image and used as landmarks in the environment map. The
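Bilinear interpolation, the alignment operation named in the first stage, samples an image at fractional coordinates by blending the four surrounding pixels. A self-contained sketch (the function name is mine):

```python
import numpy as np

def bilinear_sample(img, y, x):
    """Sample img (H, W) at fractional (y, x) via bilinear interpolation."""
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1 = min(y0 + 1, img.shape[0] - 1)
    x1 = min(x0 + 1, img.shape[1] - 1)
    fy, fx = y - y0, x - x0
    top = img[y0, x0] * (1 - fx) + img[y0, x1] * fx
    bot = img[y1, x0] * (1 - fx) + img[y1, x1] * fx
    return top * (1 - fy) + bot * fy

img = np.array([[0.0, 10.0],
                [20.0, 30.0]])
v = bilinear_sample(img, 0.5, 0.5)   # centre of the 2x2 patch
```

To align RGB with depth, each depth pixel is projected into the colour image plane and the colour value is read off at the resulting fractional coordinate with exactly this kind of sampling.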
44

Xu, Xinhua, Hong Liu, Jianbing Wu, and Jinfu Liu. "PDDM: Pseudo Depth Diffusion Model for RGB-PD Semantic Segmentation Based in Complex Indoor Scenes." Proceedings of the AAAI Conference on Artificial Intelligence 39, no. 9 (2025): 8969–77. https://doi.org/10.1609/aaai.v39i9.32970.

Full text
Abstract:
The integration of RGB and depth modalities significantly enhances the accuracy of segmenting complex indoor scenes, with depth data from RGB-D cameras playing a crucial role in this improvement. However, collecting an RGB-D dataset is more expensive than an RGB dataset due to the need for specialized depth sensors. Aligning depth and RGB images also poses challenges due to sensor positioning and issues like missing data and noise. In contrast, Pseudo Depth (PD) from high-precision depth estimation algorithms can eliminate the dependence on RGB-D sensors and alignment processes, as well as pro
45

Zeng, Hui, Bin Yang, Xiuqing Wang, Jiwei Liu, and Dongmei Fu. "RGB-D Object Recognition Using Multi-Modal Deep Neural Network and DS Evidence Theory." Sensors 19, no. 3 (2019): 529. http://dx.doi.org/10.3390/s19030529.

Full text
Abstract:
With the development of low-cost RGB-D (Red Green Blue-Depth) sensors, RGB-D object recognition has attracted more and more researchers’ attention in recent years. The deep learning technique has become popular in the field of image analysis and has achieved competitive results. To make full use of the effective identification information in the RGB and depth images, we propose a multi-modal deep neural network and a DS (Dempster Shafer) evidence theory based RGB-D object recognition method. First, the RGB and depth images are preprocessed and two convolutional neural networks are trained, res
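The DS (Dempster–Shafer) fusion step can be illustrated with Dempster's rule of combination applied to two classifiers' outputs. This sketch restricts the mass functions to singleton classes, a simplification of full DS theory (which also allows compound hypotheses); it is not the paper's exact formulation:

```python
import numpy as np

def dempster_combine(m1, m2):
    """Dempster's rule for two mass functions over the same singleton classes.

    m1, m2: arrays of masses over k mutually exclusive classes. Mass placed
    on disagreeing class pairs is conflict and is renormalised away.
    """
    joint = np.outer(m1, m2)
    agreement = np.diag(joint).copy()
    conflict = joint.sum() - agreement.sum()
    if conflict >= 1.0:
        raise ValueError("total conflict; Dempster's rule undefined")
    return agreement / (1.0 - conflict)

# Two branches (e.g. an RGB CNN and a depth CNN) both lean toward class 0;
# combining sharpens the shared belief.
rgb_mass = np.array([0.7, 0.2, 0.1])
depth_mass = np.array([0.6, 0.3, 0.1])
fused = dempster_combine(rgb_mass, depth_mass)
```

Because both branches agree on class 0, the fused belief in it exceeds either branch's individual belief, which is the behaviour evidence-theory fusion is chosen for.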
46

Lin, Jiaying, Yuen-Hei Yeung, Shuquan Ye, and Rynson W. H. Lau. "Leveraging RGB-D Data with Cross-Modal Context Mining for Glass Surface Detection." Proceedings of the AAAI Conference on Artificial Intelligence 39, no. 5 (2025): 5254–61. https://doi.org/10.1609/aaai.v39i5.32558.

Full text
Abstract:
Glass surfaces are becoming increasingly ubiquitous as modern buildings tend to use a lot of glass panels. This, however, poses substantial challenges to the operations of autonomous systems such as robots, self-driving cars, and drones, as the glass panels can become transparent obstacles to navigation. Existing works attempt to exploit various cues, including glass boundary context or reflections, as a prior. However, they are all based on input RGB images. We observe that the transmission of 3D depth sensor light through glass surfaces often produces blank regions in the depth maps, which c
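The blank-region cue the authors observe, i.e., depth sensors returning no reading through glass, can be sketched as a simple zero-depth mask test. The function name and the area-fraction threshold are illustrative assumptions, not the paper's detector:

```python
import numpy as np

def blank_depth_mask(depth, min_region_frac=0.01):
    """Return the zero-depth mask and whether it covers enough of the frame
    to serve as a glass-surface cue (IR light passes through glass, so the
    sensor reports no depth there)."""
    mask = depth == 0
    return mask, bool(mask.mean() >= min_region_frac)

depth = np.random.default_rng(0).uniform(0.5, 4.0, size=(8, 8))
depth[2:6, 2:6] = 0.0              # simulated glass panel: no returns
mask, has_glass_cue = blank_depth_mask(depth)
```

A real detector would combine this mask with RGB context, since missing depth also occurs at range limits and specular surfaces, which is exactly why the paper mines cross-modal context rather than thresholding alone.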
47

Zhang, Zhijie, Yan Liu, Junjie Chen, Li Niu, and Liqing Zhang. "Depth Privileged Object Detection in Indoor Scenes via Deformation Hallucination." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 4 (2021): 3456–64. http://dx.doi.org/10.1609/aaai.v35i4.16459.

Full text
Abstract:
RGB-D object detection has achieved significant advance, because depth provides complementary geometric information to RGB images. Considering depth images are unavailable in some scenarios, we focus on depth privileged object detection in indoor scenes, where the depth images are only available in the training phase. Under this setting, one prevalent research line is modality hallucination, in which depth image and depth feature are the common choices for hallucinating. In contrast, we choose to hallucinate depth deformation, which is explicit geometric information and efficient to hallucinat
48

Gopalapillai, Radhakrishnan, Deepa Gupta, Mohammed Zakariah, and Yousef Ajami Alotaibi. "Convolution-Based Encoding of Depth Images for Transfer Learning in RGB-D Scene Classification." Sensors 21, no. 23 (2021): 7950. http://dx.doi.org/10.3390/s21237950.

Full text
Abstract:
Classification of indoor environments is a challenging problem. The availability of low-cost depth sensors has opened up a new research area of using depth information in addition to color image (RGB) data for scene understanding. Transfer learning of deep convolutional networks with pairs of RGB and depth (RGB-D) images has to deal with integrating these two modalities. Single-channel depth images are often converted to three-channel images by extracting horizontal disparity, height above ground, and the angle of the pixel’s local surface normal (HHA) to apply transfer learning using networks
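The HHA-style three-channel encoding mentioned here can be sketched crudely from a depth map alone. Real HHA projects points into 3D with camera intrinsics and gravity estimation; the stand-ins below (inverse depth, a naive height proxy, and a gradient-based normal angle) are simplifications, and all names are mine:

```python
import numpy as np

def depth_to_three_channels(depth, camera_height=1.5):
    """Simplified HHA-like encoding of a single-channel depth map.

    Channels: horizontal disparity (1/depth); height above ground (here a
    crude camera_height - depth placeholder); and the angle of the local
    surface normal from fronto-parallel, approximated from depth gradients.
    """
    disparity = 1.0 / np.clip(depth, 1e-6, None)
    height = camera_height - depth           # stand-in for true 3D height
    gy, gx = np.gradient(depth)
    angle = np.arctan(np.hypot(gx, gy))      # 0 on flat, fronto-parallel areas
    return np.stack([disparity, height, angle], axis=-1)

depth = np.ones((4, 4))
hha = depth_to_three_channels(depth)
```

The point of the encoding is that the resulting three-channel image can be fed to an RGB-pretrained network unchanged, which is what makes transfer learning from colour models possible.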
49

Liu, Rui, Shuwei He, Yifan Hu, and Haizhou Li. "Multi-modal and Multi-scale Spatial Environment Understanding for Immersive Visual Text-to-Speech." Proceedings of the AAAI Conference on Artificial Intelligence 39, no. 23 (2025): 24632–40. https://doi.org/10.1609/aaai.v39i23.34643.

Full text
Abstract:
Visual Text-to-Speech (VTTS) aims to take the environmental image as the prompt to synthesize the reverberant speech for the spoken content. The challenge of this task lies in understanding the spatial environment from the image. Many attempts have been made to extract global spatial visual information from the RGB space of a spatial image. However, local and depth image information are crucial for understanding the spatial environment, which previous works have ignored. To address the issues, we propose a novel multi-modal and multi-scale spatial environment understanding scheme to achieve i
50

Chen, Songnan, Mengxia Tang, and Jiangming Kan. "Predicting Depth from Single RGB Images with Pyramidal Three-Streamed Networks." Sensors 19, no. 3 (2019): 667. http://dx.doi.org/10.3390/s19030667.

Full text
Abstract:
Predicting depth from a monocular image is an ill-posed and inherently ambiguous issue in computer vision. In this paper, we propose a pyramidal three-streamed network (PTSN) that recovers the depth information using a single given RGB image. PTSN uses pyramidal structure images, which can extract multiresolution features to improve the robustness of the network as the network input. The full connection layer is changed into fully convolutional layers with a new upconvolution structure, which reduces the network parameters and computational complexity. We propose a new loss function including
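The pyramidal-structure input can be sketched as a multi-resolution pyramid built by repeated 2x2 average pooling. The paper may use a different downsampling filter; average pooling just keeps the sketch short, and the function name is mine:

```python
import numpy as np

def build_pyramid(img, levels=3):
    """Multi-resolution pyramid by 2x2 average pooling.

    Each level halves both spatial dimensions (odd trailing rows/columns
    are cropped), giving the coarse-to-fine inputs a pyramidal network
    consumes.
    """
    pyramid = [img]
    for _ in range(levels - 1):
        h, w = pyramid[-1].shape[:2]
        cur = pyramid[-1][: h - h % 2, : w - w % 2]
        cur = cur.reshape(h // 2, 2, w // 2, 2, *cur.shape[2:]).mean(axis=(1, 3))
        pyramid.append(cur)
    return pyramid

img = np.arange(64, dtype=np.float64).reshape(8, 8)
pyr = build_pyramid(img, levels=3)
```

Coarse levels expose global scene layout while the full-resolution level keeps edges, which is the multiresolution robustness the abstract refers to.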