Academic literature on the topic 'RGB-D video'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles

Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'RGB-D video.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate a bibliographic reference for the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Journal articles on the topic "RGB-D video"

1

Uddin, Md Kamal, Amran Bhuiyan, and Mahmudul Hasan. "Fusion in Dissimilarity Space Between RGB-D and Skeleton for Person Re-Identification." International Journal of Innovative Technology and Exploring Engineering (IJITEE) 10, no. 12 (2021): 69–75. https://doi.org/10.35940/ijitee.L9566.10101221.

Full text
Abstract:
Person re-identification (Re-id) is one of the important tools of video surveillance systems, which aims to recognize an individual across the multiple disjoint sensors of a camera network. Despite the recent advances in RGB camera-based person re-identification methods under normal lighting conditions, Re-id researchers fail to take advantage of the additional information provided by modern RGB-D sensors (e.g. depth and skeleton information). When traditional RGB-based cameras fail to capture video under poor illumination conditions, this additional RGB-D sensor-based information can be advantageous in tackling these constraints. This work takes depth images and skeleton joint points as additional information along with RGB appearance cues and proposes a person re-identification method. We combine 4-channel RGB-D image features with skeleton information using a score-level fusion strategy in dissimilarity space to increase re-identification accuracy. Moreover, our proposed method overcomes the illumination problem because we use illumination-invariant depth images and skeleton information. We carried out rigorous experiments on two publicly available RGBD-ID re-identification datasets and showed that the combined features of 4-channel RGB-D images and skeleton information boost the rank-1 recognition accuracy.
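The score-level fusion step described above can be illustrated with a minimal sketch. It assumes min-max normalized dissimilarity scores and a fixed fusion weight alpha, which are illustrative choices rather than the authors' exact settings.

```python
# Minimal sketch of score-level fusion in dissimilarity space (illustrative, not the authors' code).
import numpy as np

def min_max_normalize(scores):
    """Rescale a vector of dissimilarity scores to [0, 1]."""
    scores = np.asarray(scores, dtype=float)
    rng = scores.max() - scores.min()
    return (scores - scores.min()) / rng if rng > 0 else np.zeros_like(scores)

def fuse_dissimilarities(rgbd_scores, skeleton_scores, alpha=0.5):
    """Weighted sum of two normalized dissimilarity vectors (one probe vs. all gallery identities)."""
    return alpha * min_max_normalize(rgbd_scores) + (1 - alpha) * min_max_normalize(skeleton_scores)

# Rank-1 match = gallery identity with the smallest fused dissimilarity.
fused = fuse_dissimilarities([0.8, 0.3, 0.9], [0.6, 0.2, 0.7])
best_match = int(np.argmin(fused))
```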
APA, Harvard, Vancouver, ISO, and other styles
2

Uddin, Md Kamal, Amran Bhuiyan, and Mahmudul Hasan. "Fusion in Dissimilarity Space Between RGB-D and Skeleton for Person Re-Identification." International Journal of Innovative Technology and Exploring Engineering 10, no. 12 (2021): 69–75. http://dx.doi.org/10.35940/ijitee.l9566.10101221.

Full text
Abstract:
Person re-identification (Re-id) is one of the important tools of video surveillance systems, which aims to recognize an individual across the multiple disjoint sensors of a camera network. Despite the recent advances in RGB camera-based person re-identification methods under normal lighting conditions, Re-id researchers fail to take advantage of the additional information provided by modern RGB-D sensors (e.g. depth and skeleton information). When traditional RGB-based cameras fail to capture video under poor illumination conditions, this additional RGB-D sensor-based information can be advantageous in tackling these constraints. This work takes depth images and skeleton joint points as additional information along with RGB appearance cues and proposes a person re-identification method. We combine 4-channel RGB-D image features with skeleton information using a score-level fusion strategy in dissimilarity space to increase re-identification accuracy. Moreover, our proposed method overcomes the illumination problem because we use illumination-invariant depth images and skeleton information. We carried out rigorous experiments on two publicly available RGBD-ID re-identification datasets and showed that the combined features of 4-channel RGB-D images and skeleton information boost the rank-1 recognition accuracy.
APA, Harvard, Vancouver, ISO, and other styles
3

Yue, Ya Jie, Xiao Jing Zhang, and Chen Ming Sha. "The Design of Wireless Video Monitoring System Based on FPGA." Advanced Materials Research 981 (July 2014): 612–15. http://dx.doi.org/10.4028/www.scientific.net/amr.981.612.

Full text
Abstract:
The wireless video monitoring system contains a video acquisition device, a video transmission device, a video storage device and a VGA display device. In this paper, we use the video acquisition device to collect video signals in real time. The analog video signal is transmitted using wireless technology. The video signal is converted to a digital signal by a dedicated A/D chip. At the same time, the YCrCb signals are converted into RGB signals by the format-converting module. Then, the digital RGB signals are converted to analog RGB signals through the D/A, and they are finally displayed on the VGA monitor in real time. The design mainly uses wireless transmission technology to transmit analog video signals and uses the ADV7181 to decode them. The FPGA control system processes the decoded digital signals, which are transmitted to the D/A, and the data are finally displayed in real time.
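For reference, the YCrCb-to-RGB format-conversion stage can be modeled in software with the standard ITU-R BT.601 full-range equations. This is only an illustrative Python model, not the FPGA logic described in the paper.

```python
# Software model of full-range BT.601 YCbCr -> RGB conversion (illustrative only).
import numpy as np

def ycbcr_to_rgb(y, cb, cr):
    """Convert full-range BT.601 YCbCr samples (0-255) to RGB (0-255)."""
    y, cb, cr = float(y), float(cb) - 128.0, float(cr) - 128.0
    r = y + 1.402 * cr
    g = y - 0.344136 * cb - 0.714136 * cr
    b = y + 1.772 * cb
    return tuple(int(np.clip(v, 0, 255)) for v in (r, g, b))

# Example: mid-grey with no chroma stays grey.
print(ycbcr_to_rgb(128, 128, 128))   # (128, 128, 128)
```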
APA, Harvard, Vancouver, ISO, and other styles
4

Sharma, Richa, Manoj Sharma, Ankit Shukla, and Santanu Chaudhury. "Conditional Deep 3D-Convolutional Generative Adversarial Nets for RGB-D Generation." Mathematical Problems in Engineering 2021 (November 11, 2021): 1–8. http://dx.doi.org/10.1155/2021/8358314.

Full text
Abstract:
Generation of synthetic data is a challenging task. There are only a few significant works on RGB video generation and no pertinent works on RGB-D data generation. In the present work, we focus our attention on synthesizing RGB-D data which can further be used as a dataset for various applications like object tracking, gesture recognition, and action recognition. This paper proposes a novel architecture that uses conditional deep 3D-convolutional generative adversarial networks to synthesize RGB-D data by exploiting a 3D spatio-temporal convolutional framework. The proposed architecture can be used to generate virtually unlimited data. In this work, we have presented the architecture to generate RGB-D data conditioned on class labels. In the architecture, two parallel paths are used, one to generate RGB data and the second to synthesize the depth map. The output from the two parallel paths is combined to generate RGB-D data. The proposed model is used for video generation at 30 fps (frames per second). Each frame is an RGB-D frame with a spatial resolution of 512 × 512.
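The two-path generator idea can be sketched roughly as follows. This is a simplified, hypothetical PyTorch sketch of a class-conditional generator with parallel 3D transposed-convolution paths for RGB and depth; the layer counts, output size and the paper's 30 fps / 512 × 512 resolution are not reproduced here.

```python
# Illustrative class-conditional two-path 3D generator producing a 4-channel RGB-D clip.
import torch
import torch.nn as nn

class ConditionalRGBDGenerator(nn.Module):
    def __init__(self, z_dim=100, n_classes=10, base=64):
        super().__init__()
        self.embed = nn.Embedding(n_classes, z_dim)  # class-label conditioning

        def path(out_channels):
            # (N, 2*z_dim, 1, 1, 1) -> (N, out_channels, 32, 32, 32)
            return nn.Sequential(
                nn.ConvTranspose3d(2 * z_dim, base * 4, kernel_size=4, stride=1, padding=0),
                nn.BatchNorm3d(base * 4), nn.ReLU(inplace=True),
                nn.ConvTranspose3d(base * 4, base * 2, kernel_size=4, stride=2, padding=1),
                nn.BatchNorm3d(base * 2), nn.ReLU(inplace=True),
                nn.ConvTranspose3d(base * 2, base, kernel_size=4, stride=2, padding=1),
                nn.BatchNorm3d(base), nn.ReLU(inplace=True),
                nn.ConvTranspose3d(base, out_channels, kernel_size=4, stride=2, padding=1),
                nn.Tanh(),
            )

        self.rgb_path = path(3)    # synthesizes the RGB volume
        self.depth_path = path(1)  # synthesizes the depth volume

    def forward(self, z, labels):
        cond = torch.cat([z, self.embed(labels)], dim=1)   # (N, 2*z_dim)
        cond = cond.view(cond.size(0), -1, 1, 1, 1)
        rgb = self.rgb_path(cond)                           # (N, 3, T, H, W)
        depth = self.depth_path(cond)                       # (N, 1, T, H, W)
        return torch.cat([rgb, depth], dim=1)               # 4-channel RGB-D clip

gen = ConditionalRGBDGenerator()
clip = gen(torch.randn(2, 100), torch.tensor([0, 3]))       # torch.Size([2, 4, 32, 32, 32])
```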
APA, Harvard, Vancouver, ISO, and other styles
5

Martínez Carrillo, Fabio, Fabián Castillo, and Lola Bautista. "3D+T dense motion trajectories as kinematics primitives to recognize gestures on depth video sequences." Revista Politécnica 15, no. 29 (2019): 82–94. http://dx.doi.org/10.33571/rpolitec.v15n29a7.

Full text
Abstract:
RGB-D sensors have allowed attacking many classical problems in computer vision such as segmentation, scene representation and human interaction, among many others. Regarding motion characterization, typical RGB-D strategies are limited to analyzing global shape changes and capturing scene flow fields to describe local motions in depth sequences. Nevertheless, such strategies only recover motion information between a couple of frames, limiting the analysis of coherent large displacements over time. This work presents a novel strategy to compute 3D+t dense and long motion trajectories as fundamental kinematic primitives to represent video sequences. Each motion trajectory models kinematic word primitives that together can describe complex gestures developed along videos. Such kinematic words were processed in a bag-of-kinematic-words framework to obtain an occurrence video descriptor. The novel video descriptor based on 3D+t motion trajectories achieved an average accuracy of 80% on a dataset of 5 gestures and 100 videos.
APA, Harvard, Vancouver, ISO, and other styles
6

Aubry, Sophie, Sohaib Laraba, Joëlle Tilmanne, and Thierry Dutoit. "Action recognition based on 2D skeletons extracted from RGB videos." MATEC Web of Conferences 277 (2019): 02034. http://dx.doi.org/10.1051/matecconf/201927702034.

Full text
Abstract:
In this paper a methodology to recognize actions based on RGB videos is proposed which takes advantage of the recent breakthroughs made in deep learning. Following the development of Convolutional Neural Networks (CNNs), research was conducted on the transformation of skeletal motion data into 2D images. In this work, a solution is proposed requiring only the use of RGB videos instead of RGB-D videos. This work is based on multiple works studying the conversion of RGB-D data into 2D images. From a video stream (RGB images), a two-dimensional skeleton of 18 joints for each detected body is extracted with a DNN-based human pose estimator called OpenPose. The skeleton data are encoded into the Red, Green and Blue channels of images. Different ways of encoding motion data into images were studied. We successfully use state-of-the-art deep neural networks designed for image classification to recognize actions. Based on a study of the related works, we chose to use the image classification models SqueezeNet, AlexNet, DenseNet, ResNet, Inception and VGG, and retrained them to perform action recognition. For all the tests the NTU RGB+D database is used. The highest accuracy is obtained with ResNet: 83.317% cross-subject and 88.780% cross-view, which outperforms most state-of-the-art results.
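The encoding step can be illustrated with a minimal sketch: the (x, y, confidence) values of the 18 OpenPose joints across T frames are normalized and written into the R, G and B channels of an image that a standard image CNN can classify. The exact mapping studied in the paper may differ; this is only one plausible variant.

```python
# Illustrative encoding of a skeleton sequence into an RGB image for CNN-based action recognition.
import numpy as np

def skeleton_to_image(seq):
    """seq: array of shape (T, 18, 3) holding (x, y, confidence) per joint per frame.
    Returns an (18, T, 3) uint8 image: rows = joints, columns = frames, channels = x/y/confidence."""
    seq = np.asarray(seq, dtype=float)
    img = np.zeros((seq.shape[1], seq.shape[0], 3), dtype=np.uint8)
    for c in range(3):
        channel = seq[:, :, c].T                 # (joints, frames)
        lo, hi = channel.min(), channel.max()
        if hi > lo:
            channel = (channel - lo) / (hi - lo)  # normalize to [0, 1]
        img[:, :, c] = (channel * 255).astype(np.uint8)
    return img

fake_seq = np.random.rand(64, 18, 3)   # 64 frames, 18 joints
image = skeleton_to_image(fake_seq)    # this image can be fed to SqueezeNet/ResNet/etc.
```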
APA, Harvard, Vancouver, ISO, and other styles
7

Bertholet, P., A. E. Ichim, and M. Zwicker. "Temporally Consistent Motion Segmentation From RGB-D Video." Computer Graphics Forum 37, no. 6 (2018): 118–34. http://dx.doi.org/10.1111/cgf.13316.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Cho, Junsu, Seungwon Kim, Chi-Min Oh, and Jeong-Min Park. "Auxiliary Task Graph Convolution Network: A Skeleton-Based Action Recognition for Practical Use." Applied Sciences 15, no. 1 (2024): 198. https://doi.org/10.3390/app15010198.

Full text
Abstract:
Graph convolution networks (GCNs) have been extensively researched for action recognition by estimating human skeletons from video clips. However, their image sampling methods are not practical because they require video-length information for sampling images. In this study, we propose an Auxiliary Task Graph Convolution Network (AT-GCN) with low and high-frame pathways while supporting a new sampling method. AT-GCN learns actions at a defined frame rate in the defined range with three losses: fuse, slow, and fast losses. AT-GCN handles the slow and fast losses in two auxiliary tasks, while the mainstream handles the fuse loss. AT-GCN outperforms the original State-of-the-Art model on the NTU RGB+D, NTU RGB+D 120, and NW-UCLA datasets while maintaining the same inference time. AT-GCN shows the best performance on the NTU RGB+D dataset at 90.3% on the subject benchmark and 95.2% on the view benchmark, on the NTU RGB+D 120 dataset at 86.5% on the subject benchmark and 87.6% on the set benchmark, and at 93.5% on the NW-UCLA dataset, as top-1 accuracy.
APA, Harvard, Vancouver, ISO, and other styles
9

Zhu, Xiaoguang, Ye Zhu, Haoyu Wang, Honglin Wen, Yan Yan, and Peilin Liu. "Skeleton Sequence and RGB Frame Based Multi-Modality Feature Fusion Network for Action Recognition." ACM Transactions on Multimedia Computing, Communications, and Applications 18, no. 3 (2022): 1–24. http://dx.doi.org/10.1145/3491228.

Full text
Abstract:
Action recognition has been a heated topic in computer vision for its wide application in vision systems. Previous approaches achieve improvement by fusing the modalities of the skeleton sequence and RGB video. However, such methods pose a dilemma between the accuracy and efficiency for the high complexity of the RGB video network. To solve the problem, we propose a multi-modality feature fusion network to combine the modalities of the skeleton sequence and RGB frame instead of the RGB video, as the key information contained by the combination of the skeleton sequence and RGB frame is close to that of the skeleton sequence and RGB video. In this way, complementary information is retained while the complexity is reduced by a large margin. To better explore the correspondence of the two modalities, a two-stage fusion framework is introduced in the network. In the early fusion stage, we introduce a skeleton attention module that projects the skeleton sequence on the single RGB frame to help the RGB frame focus on the limb movement regions. In the late fusion stage, we propose a cross-attention module to fuse the skeleton feature and the RGB feature by exploiting the correlation. Experiments on two benchmarks, NTU RGB+D and SYSU, show that the proposed model achieves competitive performance compared with the state-of-the-art methods while reducing the complexity of the network.
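The early-fusion projection of the skeleton onto the RGB frame can be illustrated with a simple sketch: a Gaussian heatmap built from joint coordinates re-weights the frame so that limb-movement regions dominate. The paper's skeleton attention module is learned; this fixed-Gaussian version only illustrates the projection step.

```python
# Illustrative joint-heatmap attention mask over an RGB frame (not the paper's learned module).
import numpy as np

def joint_attention_mask(joints, height, width, sigma=12.0):
    """joints: iterable of (x, y) pixel coordinates. Returns an (H, W) mask in [0, 1]."""
    ys, xs = np.mgrid[0:height, 0:width]
    mask = np.zeros((height, width), dtype=float)
    for (jx, jy) in joints:
        mask = np.maximum(mask, np.exp(-((xs - jx) ** 2 + (ys - jy) ** 2) / (2 * sigma ** 2)))
    return mask

frame = np.random.randint(0, 256, (240, 320, 3), dtype=np.uint8)
mask = joint_attention_mask([(160, 120), (100, 200)], 240, 320)
attended = (frame * mask[..., None]).astype(np.uint8)   # limb regions kept, background suppressed
```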
APA, Harvard, Vancouver, ISO, and other styles
10

Wang, Xiaoqin, Yasar Ahmet Sekercioglu, Tom Drummond, Enrico Natalizio, Isabelle Fantoni, and Vincent Fremont. "Fast Depth Video Compression for Mobile RGB-D Sensors." IEEE Transactions on Circuits and Systems for Video Technology 26, no. 4 (2016): 673–86. http://dx.doi.org/10.1109/tcsvt.2015.2416571.

Full text
APA, Harvard, Vancouver, ISO, and other styles
More sources

Dissertations / Theses on the topic "RGB-D video"

1

Shen, Ju. "Computational Multimedia for Video Self Modeling." UKnowledge, 2014. http://uknowledge.uky.edu/cs_etds/26.

Full text
Abstract:
Video self modeling (VSM) is a behavioral intervention technique in which a learner models a target behavior by watching a video of oneself. This is the idea behind the psychological theory of self-efficacy - you can learn or model to perform certain tasks because you see yourself doing it, which provides the most ideal form of behavior modeling. The effectiveness of VSM has been demonstrated for many different types of disabilities and behavioral problems ranging from stuttering, inappropriate social behaviors, autism and selective mutism to sports training. However, there is an inherent difficulty associated with the production of VSM material. Prolonged and persistent video recording is required to capture the rare, if not non-existent, snippets that can be strung together to form novel video sequences of the target skill. To solve this problem, in this dissertation, we use computational multimedia techniques to facilitate the creation of synthetic visual content for self-modeling that can be used by a learner and his/her therapist with a minimum amount of training data. There are three major technical contributions in my research. First, I developed an Adaptive Video Re-sampling algorithm to synthesize realistic lip-synchronized video with minimal motion jitter. Second, to denoise and complete the depth map captured by structured-light sensing systems, I introduced a layer-based probabilistic model to account for various types of uncertainties in the depth measurement. Third, I developed a simple and robust bundle-adjustment based framework for calibrating a network of multiple wide-baseline RGB and depth cameras.
APA, Harvard, Vancouver, ISO, and other styles
2

Lai, Po Kong. "Immersive Dynamic Scenes for Virtual Reality from a Single RGB-D Camera." Thesis, Université d'Ottawa / University of Ottawa, 2019. http://hdl.handle.net/10393/39663.

Full text
Abstract:
In this thesis we explore the concepts and components which can be used as individual building blocks for producing immersive virtual reality (VR) content from a single RGB-D sensor. We identify the properties of immersive VR videos and propose a system composed of a foreground/background separator, a dynamic scene re-constructor and a shape completer. We initially explore the foreground/background separator component in the context of video summarization. More specifically, we examined how to extract trajectories of moving objects from video sequences captured with a static camera. We then present a new approach for video summarization via minimization of the spatial-temporal projections of the extracted object trajectories. New evaluation criteria are also presented for video summarization. These concepts of foreground/background separation can then be applied towards VR scene creation by extracting relevant objects of interest. We present an approach for the dynamic scene re-constructor component using a single moving RGB-D sensor. By tracking the foreground objects and removing them from the input RGB-D frames we can feed the background-only data into existing RGB-D SLAM systems. The result is a static 3D background model onto which the foreground frames are then superimposed to produce a coherent scene with dynamic moving foreground objects. We also present a specific method for extracting moving foreground objects from a moving RGB-D camera along with an evaluation dataset with benchmarks. Lastly, the shape completer component takes a single-view depth map of an object as input and "fills in" the occluded portions to produce a complete 3D shape. We present an approach that utilizes a new data-minimal representation, the additive depth map, which allows traditional 2D convolutional neural networks to accomplish the task. The additive depth map represents the amount of depth required to transform the input into the "back depth map" which would exist if there were a sensor exactly opposite the input. We train and benchmark our approach using existing synthetic datasets and also show that it can perform shape completion on real-world data without fine-tuning. Our experiments show that our data-minimal representation can achieve comparable results to existing state-of-the-art 3D networks while also being able to produce higher-resolution outputs.
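The additive depth map representation can be illustrated with a tiny sketch: the completed "back depth map" is the observed front depth plus a predicted per-pixel additive depth. The values below are illustrative; in the thesis the additive map is predicted by a 2D CNN rather than set by hand.

```python
# Illustrative additive-depth-map composition (the prediction step itself is omitted).
import numpy as np

def complete_shape(front_depth, additive_depth):
    """front_depth, additive_depth: (H, W) arrays in meters. Returns the back depth map.
    Front and back depth together bound the occluded volume of the object."""
    return front_depth + additive_depth

front = np.full((4, 4), 1.20)           # observed depth of a slab at 1.2 m
additive = np.full((4, 4), 0.15)        # predicted thickness of 15 cm
back = complete_shape(front, additive)  # back surface at 1.35 m
```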
APA, Harvard, Vancouver, ISO, and other styles
3

Boaretto, Marco Antonio Reichert. "Machine learning techniques applied in human recognition using RGB-D videos." Repositório Institucional da UFPR, 2017. http://hdl.handle.net/1884/52576.

Full text
Abstract:
Advisor: Prof. Dr. Leandro dos Santos Coelho. Master's thesis, Universidade Federal do Paraná, Setor de Tecnologia, Programa de Pós-Graduação em Engenharia Elétrica. Defense: Curitiba, 22/11/2017. Includes references: f. 84-95.
Given the particularities and issues of dealing with two-dimensional (2D) images, such as illumination and object occlusion, a better option is to work with three-dimensional (3D) images, or Red, Green and Blue - Depth (RGB-D) images as they are usually called. RGB-D images are invariant to illumination since most of their acquisition devices use infra-red or time-of-flight laser sensors. The Microsoft® Kinect, developed in partnership with PrimeSense, is an amazing tool for low-resolution RGB-D image acquisition, with applications that range from gaming to medical imagery. Since the Kinect has an accessible cost, it has been widely used in research in many areas that rely on computer vision and image classification. Several datasets have already been developed with the Kinect for RGB-D image classification, for example Berkeley's Multimodal Human Activity Database (MHAD) from the Teleimmersion Laboratory of the University of California and the Center for Imaging Science of Johns Hopkins University, which contains images of 10 subjects performing 11 activities: jumping in place (jump), jumping jacks (jack), bending-hands up all the way down (bend), punching (punch), waving two hands (wave2), waving right hand (wave1), clapping hands (clap), throwing a ball (throw), sit down and stand up (sit+stand), sit down (sit), stand up (stand). The main goal of this dissertation is to compare two machine learning approaches, (i) a proposed ensemble learning technique with Support Vector Machines (SVM), K-Nearest Neighbors (kNN), Extreme Gradient Boosting (XGBoost) and Artificial Neural Networks (ANN) combined with three dimensionality reduction techniques, Principal Component Analysis (PCA), Factor Analysis (FA) and Nonnegative Matrix Factorization (NMF), and (ii) a Deep Learning (DL) approach using a proposed convolutional neural network (CNN) architecture known as BOANet, using the MHAD as dataset. The contribution of the project is a human activity recognition (HAR) system that uses the Kinect for RGB-D image acquisition and machine learning algorithms to build the classifier model. The proposed approaches are compared against reference values from recent works on the MHAD in the literature. Both approaches performed remarkably well, obtaining better results than most of the reference values: approach (i) achieved 99.93% classification accuracy and approach (ii) achieved 99.05%. Keywords: RGB-D, Kinect, Machine Learning, Deep Learning, Human Activity Recognition.
APA, Harvard, Vancouver, ISO, and other styles
4

Devanne, Maxime. "3D human behavior understanding by shape analysis of human motion and pose." Thesis, Lille 1, 2015. http://www.theses.fr/2015LIL10138/document.

Full text
Abstract:
The emergence of RGB-D sensors providing the 3D structure of both the scene and the human body offers new opportunities for studying human motion and understanding human behaviors. However, the design and development of models for behavior recognition that are both accurate and efficient is a challenging task due to the variability of the human pose, the complexity of human motion and possible interactions with the environment. In this thesis, we first focus on the action recognition problem by representing human action as the trajectory of 3D coordinates of human body joints over time, thus capturing simultaneously the body shape and the dynamics of the motion. The action recognition problem is then formulated as the problem of computing the similarity between shapes of trajectories in a Riemannian framework. Experiments carried out on four representative benchmarks demonstrate the potential of the proposed solution in terms of accuracy/latency for low-latency action recognition. Second, we extend the study to more complex behaviors by analyzing the evolution of the human pose shape to decompose the motion stream into short motion units. Each motion unit is then characterized by the motion trajectory and depth appearance around hand joints, so as to describe the human motion and interaction with objects. Finally, the sequence of temporal segments is modeled through a Dynamic Naive Bayesian Classifier. Experiments on four representative datasets evaluate the potential of the proposed approach in different contexts, including recognition and online detection of behaviors.
APA, Harvard, Vancouver, ISO, and other styles
5

Pham, Huy-Hieu. "Architectures d'apprentissage profond pour la reconnaissance d'actions humaines dans des séquences vidéo RGB-D monoculaires : application à la surveillance dans les transports publics." Thesis, Toulouse 3, 2019. http://www.theses.fr/2019TOU30145.

Full text
Abstract:
This thesis deals with automatic recognition of human actions from monocular RGB-D video sequences. Our main goal is to recognize which human actions occur in unknown videos. This problem is a challenging task due to a number of obstacles caused by the variability of the acquisition conditions, including the lighting, the position, the orientation and the field of view of the camera, as well as the variability of actions which can be performed differently, notably in terms of speed. To tackle these problems, we first review and evaluate the most prominent state-of-the-art techniques to identify the current state of human action recognition in videos. We then propose a new approach for skeleton-based action recognition using Deep Neural Networks (DNNs). Two key questions have been addressed. First, how to efficiently represent the spatio-temporal patterns of skeletal data to fully exploit the capacity of Deep Convolutional Neural Networks (D-CNNs) to learn high-level representations. Second, how to design a powerful D-CNN architecture that is able to learn discriminative features from the proposed representation for the classification task. As a result, we introduce two new 3D motion representations called SPMF (Skeleton Posture-Motion Feature) and Enhanced-SPMF that encode skeleton poses and their motions into color images. For learning and classification tasks, we design and train different D-CNN architectures based on the Residual Network (ResNet), Inception-ResNet-v2, Densely Connected Convolutional Network (DenseNet) and Efficient Neural Architecture Search (ENAS) to extract robust features from color-coded images and classify them. Experimental results on various public and challenging human action recognition datasets (MSR Action3D, Kinect Activity Recognition Dataset, SBU Kinect Interaction, and NTU-RGB+D) show that the proposed approach outperforms the current state of the art. We also conducted research on the problem of 3D human pose estimation from monocular RGB video sequences and exploited the estimated 3D poses for the recognition task. Specifically, a deep learning-based model called OpenPose is deployed to detect 2D human poses. A DNN is then proposed and trained to learn a 2D-to-3D mapping in order to map the detected 2D keypoints into 3D poses. Our experiments on the Human3.6M dataset verified the effectiveness of the proposed method. These results open a new research direction for human action recognition from 3D skeletal data without relying on depth cameras such as the Kinect. In addition, we collect and introduce in this thesis the CEMEST database, a new RGB-D dataset depicting passengers' behaviors in public transport. It consists of 203 untrimmed real-world surveillance videos, collected in a metro station, of realistic "normal" and "abnormal" events. We achieve promising results on CEMEST with the support of data augmentation and transfer learning techniques. This enables the construction of real-world applications based on deep learning for enhancing public transportation management services.
APA, Harvard, Vancouver, ISO, and other styles
6

Chiron, Guillaume. "Système complet d’acquisition vidéo, de suivi de trajectoires et de modélisation comportementale pour des environnements 3D naturellement encombrés : application à la surveillance apicole." Thesis, La Rochelle, 2014. http://www.theses.fr/2014LAROS030/document.

Full text
Abstract:
This manuscript provides the basis for a complete video surveillance chain for naturally cluttered environments. We identify and solve the wide spectrum of methodological and technological barriers inherent to: 1) the acquisition of video sequences in natural conditions, 2) the image processing problems, 3) the multi-target tracking ambiguities, 4) the discovery and the modeling of recurring behavioral patterns, and 5) the data fusion. The application context of our work is the monitoring of honeybees, and in particular the study of the trajectories of bees in flight in front of their hive. In fact, this thesis is part of a feasibility and prototyping study carried out within the two interdisciplinary projects EPERAS and RISQAPI (projects undertaken in collaboration with the INRA institute and the French National Museum of Natural History). It is, for us computer scientists and for the biologists who accompanied us, a completely new area of investigation for which the domain knowledge, usually essential for such applications, is still in its infancy.
Unlike existing approaches for monitoring insects, we propose to tackle the problem in three-dimensional space through the use of a high-frequency stereo camera. In this context, we detail our new target detection method, which we call HIDS segmentation. Concerning the computation of trajectories, we explored several tracking approaches, relying on more or less a priori knowledge, which are able to deal with the extreme conditions of the application (e.g. many targets, small in size, following chaotic movements). Once the trajectories are collected, we organize them according to a hierarchical data structure and apply a Bayesian nonparametric approach for discovering emergent behaviors within the colony of insects. The exploratory analysis of the trajectories generated by the crowded scene is performed through an unsupervised classification method applied simultaneously over different semantic levels, where the number of clusters for each level is not defined a priori but rather estimated from the data only. This approach has been validated thanks to a ground truth generated by a Multi-Agent System, and then tested on real data.
APA, Harvard, Vancouver, ISO, and other styles
7

Chakib, Reda. "Acquisition et rendu 3D réaliste à partir de périphériques "grand public"." Thesis, Limoges, 2018. http://www.theses.fr/2018LIMO0101/document.

Full text
Abstract:
Digital imaging, from image synthesis to computer vision, is experiencing a strong evolution, due among other factors to the democratization and commercial success of 3D cameras. In the same context, consumer 3D printing, which is experiencing a rapid rise, contributes to the strong demand for this type of camera for 3D scanning. The objective of this thesis is to acquire and master know-how in the field of capture/acquisition of 3D models, in particular regarding realistic rendering. The realization of a 3D scanner from an RGB-D camera is part of this goal. During the acquisition phase, especially for a portable device, we face two main problems: the problem related to the reference frame of each capture, and the final rendering of the reconstructed object.
APA, Harvard, Vancouver, ISO, and other styles
8

Huang, Chin-Yung (黃智勇). "A Video Retargeting Technique for RGB-D Camera." Thesis, 2011. http://ndltd.ncl.edu.tw/handle/55497954873032390119.

Full text
Abstract:
Master's thesis, National Chung Cheng University, Graduate Institute of Electrical Engineering, academic year 99 (2010/2011). In this thesis, we propose a content-aware image resizing algorithm, aided by an RGB-D camera, based on the detection of salient objects. The content-aware image resizing algorithm requires energy terms to help separate the main content from the background. Here we use scene depth information, gradient information, visual saliency and salient-object detection to create an energy map that reflects the visual focus of the image. Finally, experimental results show that, for both single images and image sequences, the method successfully preserves the integrity of the main objects.
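The idea of fusing the listed energy terms into a single map for content-aware resizing (e.g. seam carving) can be sketched as follows; the weights and inputs are illustrative, not the thesis' actual formulation.

```python
# Illustrative energy map combining image gradient, scene depth and visual saliency.
import numpy as np

def energy_map(gray, depth, saliency, w_grad=1.0, w_depth=1.0, w_sal=1.0):
    """gray, depth, saliency: (H, W) float arrays scaled to [0, 1]. Near objects (small depth)
    and salient, high-gradient pixels receive high energy and are preserved when resizing."""
    gy, gx = np.gradient(gray)
    gradient = np.abs(gx) + np.abs(gy)
    nearness = 1.0 - depth        # assumption: closer pixels are more important
    return w_grad * gradient + w_depth * nearness + w_sal * saliency

h, w = 120, 160
e = energy_map(np.random.rand(h, w), np.random.rand(h, w), np.random.rand(h, w))
```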
APA, Harvard, Vancouver, ISO, and other styles
9

Liao, Yen Ping (廖彥斌). "Video Object Reshuffling using Single Structured RGB-D Camera." Thesis, 2016. http://ndltd.ncl.edu.tw/handle/33416584041789579711.

Full text
Abstract:
Master's thesis, National Tsing Hua University, Department of Computer Science, academic year 104 (2015/2016). In this thesis, we introduce a system that transfers an object from RGB-D video A into another scene in RGB-D video B. By processing object videos and scene videos in specified sequences, a 3D database for the videos is built. The user can edit the novel scene with this database. When editing is finished, our system renders a photo-realistic novel video by image warping, with shadows and occlusions.
APA, Harvard, Vancouver, ISO, and other styles
10

Huang, Yung-Lin (黃泳霖). "3D Modeling using RGB-D Data for Video Synthesis." Thesis, 2017. http://ndltd.ncl.edu.tw/handle/f2wmye.

Full text
Abstract:
Doctoral dissertation, National Taiwan University, Graduate Institute of Networking and Multimedia, academic year 105 (2016/2017). Recently, three-dimensional (3D) image and video systems have attained a high level of maturity. There are many off-the-shelf 3D acquisition and display devices. In current 3D systems, RGB plus depth (RGB-D) videos are the most widely used format. This dissertation focuses on techniques that use RGB-D data for video synthesis applications. First, we introduce the systems and point out their challenges. Then, we divide the proposed algorithms and techniques into three parts: depth processing, 3D modeling, and video synthesis applications. Finally, we give a conclusion and a discussion of future research. The first part presents two proposed techniques for defective depth images. The missing and uncertain depth values near object boundaries are corrected using edge-aware depth completion. The depth quantization errors introduced by limited depth image precision are reduced using an optimization framework. The processed data give better visual quality when visualizing the point-cloud 3D scene. The second part presents two proposed 3D modeling techniques for point-cloud data. Planar and curved surfaces are detected using supervoxel-based agglomerative surface growing. A point-cloud background model is extracted from a multi-view RGB-D video. Geometric reasoning over unorganized data provides the possibility of understanding the data and synthesizing additional information. The third part presents two implemented video synthesis applications. Multi-view augmented videos are synthesized using surface-based background modeling. Virtual-view videos are synthesized using an efficient 3D filter. The synthesized videos show the different viewing experiences offered by RGB-D display systems.
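Visualizing an RGB-D frame as a point cloud, as in the scene modeling described above, relies on standard pinhole back-projection. The sketch below uses placeholder Kinect-like intrinsics and is not code from the dissertation.

```python
# Illustrative back-projection of a depth map into a camera-space point cloud.
import numpy as np

def depth_to_points(depth, fx, fy, cx, cy):
    """depth: (H, W) array in meters (0 = missing). Returns (N, 3) points in camera space."""
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w]
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return points[points[:, 2] > 0]   # drop invalid (zero-depth) pixels

depth = np.random.uniform(0.5, 3.0, (480, 640))
cloud = depth_to_points(depth, fx=525.0, fy=525.0, cx=319.5, cy=239.5)  # placeholder intrinsics
```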
APA, Harvard, Vancouver, ISO, and other styles
More sources

Book chapters on the topic "RGB-D video"

1

Trabelsi, Rim, Issam Jabri, Farid Melgani, Fethi Smach, Nicola Conci, and Ammar Bouallegue. "Complex-Valued Representation for RGB-D Object Recognition." In Image and Video Technology. Springer International Publishing, 2018. http://dx.doi.org/10.1007/978-3-319-75786-5_2.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Lin, Xiao, Josep R. Casas, and Montse Pardás. "Time Consistent Estimation of End-Effectors from RGB-D Data." In Image and Video Technology. Springer International Publishing, 2016. http://dx.doi.org/10.1007/978-3-319-29451-3_42.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Lin, Huei-Yung, Chin-Chen Chang, and Jhih-Yong Huang. "A Video Retargeting Technique for RGB-D Camera." In Communications in Computer and Information Science. Springer Berlin Heidelberg, 2014. http://dx.doi.org/10.1007/978-3-662-45944-7_8.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Mota-Gutierrez, Sergio A., Jean-Bernard Hayet, Salvador Ruiz-Correa, and Rogelio Hasimoto-Beltran. "Efficient Reconstruction of Complex 3-D Scenes from Incomplete RGB-D Data." In Image and Video Technology – PSIVT 2013 Workshops. Springer Berlin Heidelberg, 2014. http://dx.doi.org/10.1007/978-3-642-53926-8_7.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

González, Domingo Iván Rodríguez, and Jean-Bernard Hayet. "Fast Human Detection in RGB-D Images with Progressive SVM-Classification." In Image and Video Technology. Springer Berlin Heidelberg, 2014. http://dx.doi.org/10.1007/978-3-642-53842-1_29.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

Shah, Syed Afaq Ali. "Spatial Hierarchical Analysis Deep Neural Network for RGB-D Object Recognition." In Image and Video Technology. Springer International Publishing, 2020. http://dx.doi.org/10.1007/978-3-030-39770-8_15.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Staranowicz, Aaron, Garrett R. Brown, Fabio Morbidi, and Gian Luca Mariottini. "Easy-to-Use and Accurate Calibration of RGB-D Cameras from Spheres." In Image and Video Technology. Springer Berlin Heidelberg, 2014. http://dx.doi.org/10.1007/978-3-642-53842-1_23.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Liciotti, Daniele, Annalisa Cenci, Emanuele Frontoni, Adriano Mancini, and Primo Zingaretti. "An Intelligent RGB-D Video System for Bus Passenger Counting." In Intelligent Autonomous Systems 14. Springer International Publishing, 2017. http://dx.doi.org/10.1007/978-3-319-48036-7_34.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Liciotti, Daniele, Marco Contigiani, Emanuele Frontoni, Adriano Mancini, Primo Zingaretti, and Valerio Placidi. "Shopper Analytics: A Customer Activity Recognition System Using a Distributed RGB-D Camera Network." In Video Analytics for Audience Measurement. Springer International Publishing, 2014. http://dx.doi.org/10.1007/978-3-319-12811-5_11.

Full text
APA, Harvard, Vancouver, ISO, and other styles
10

Zhi, Tiancheng, Christoph Lassner, Tony Tung, Carsten Stoll, Srinivasa G. Narasimhan, and Minh Vo. "TexMesh: Reconstructing Detailed Human Texture and Geometry from RGB-D Video." In Computer Vision – ECCV 2020. Springer International Publishing, 2020. http://dx.doi.org/10.1007/978-3-030-58607-2_29.

Full text
APA, Harvard, Vancouver, ISO, and other styles

Conference papers on the topic "RGB-D video"

1

Xu, Mingchen, Peter Herbert, Yu-Kun Lai, Ze Ji, and Jing Wu. "RGB-D Video Mirror Detection." In 2025 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV). IEEE, 2025. https://doi.org/10.1109/wacv61041.2025.00933.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Uchimura, Taichi, Toru Abe, and Takuo Suganuma. "A Method for Detecting Human-Object Interaction Using RGB-D Video." In 2024 IEEE 13th Global Conference on Consumer Electronics (GCCE). IEEE, 2024. https://doi.org/10.1109/gcce62371.2024.10760290.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Wang, Hengyi, Jingwen Wang, and Lourdes Agapito. "MorpheuS: Neural Dynamic 360° Surface Reconstruction from Monocular RGB-D Video." In 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2024. http://dx.doi.org/10.1109/cvpr52733.2024.01981.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Suolang, Daerji, Jiahao He, Wangchuk Tsering, Keren Fu, Xiaofeng Li, and Qijun Zhao. "Lightweight Multi-Frequency Enhancement Network for RGB-D Video Salient Object Detection." In ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2025. https://doi.org/10.1109/icassp49660.2025.10890388.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Hu, Yueyu, Onur G. Guleryuz, Philip A. Chou, et al. "One-Click Upgrade from 2D to 3D: Sandwiched RGB-D Video Compression for Stereoscopic Teleconferencing." In 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). IEEE, 2024. http://dx.doi.org/10.1109/cvprw63382.2024.00581.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

Karácsony, Tamás, Nicholas Fearns, Christian Vollmar, et al. "NeuroKinect4K: A Novel 4K RGB-D-IR Video System with 3D Scene Reconstruction for Enhanced Epileptic Seizure Semiology Monitoring." In 2024 46th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC). IEEE, 2024. https://doi.org/10.1109/embc53108.2024.10781546.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Xia, Hongchi, Yang Fu, Sifei Liu, and Xiaolong Wang. "RGBD Objects in the Wild: Scaling Real-World 3D Object Learning from RGB-D Videos." In 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2024. http://dx.doi.org/10.1109/cvpr52733.2024.02112.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Oikawa, H., Y. Tsuruda, Y. Sano, T. Teiichi, M. Yamamoto, and H. Takemura. "Behavior Recognition in Mice Using RGB-D Videos Captured from Below." In 2024 IEEE International Conference on Systems, Man, and Cybernetics (SMC). IEEE, 2024. https://doi.org/10.1109/smc54092.2024.10831155.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Li, Jinwen, Weixing Xie, Junfeng Yao, et al. "PD-SDF: Dynamic Surface Reconstruction Based on Plane Decomposition for Single View RGB-D Videos." In ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2025. https://doi.org/10.1109/icassp49660.2025.10889038.

Full text
APA, Harvard, Vancouver, ISO, and other styles
10

Yu, Honghai, Pierre Moulin, and Sujoy Roy. "RGB-D video content identification." In ICASSP 2013 - 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2013. http://dx.doi.org/10.1109/icassp.2013.6638364.

Full text
APA, Harvard, Vancouver, ISO, and other styles