
Dissertations / Theses on the topic '3D Human Pose Estimation'

Consult the top 50 dissertations / theses for your research on the topic '3D Human Pose Estimation.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate a bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse dissertations / theses on a wide variety of disciplines and organise your bibliography correctly.

1

Budaraju, Sri Datta. "Unsupervised 3D Human Pose Estimation." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-291435.

Full text
Abstract:
The thesis proposes an unsupervised representation learning method to predict 3D human pose from a 2D skeleton via a VAE-GAN (Variational Autoencoder Generative Adversarial Network) hybrid network. The method learns to lift poses from 2D to 3D using self-supervision and adversarial learning techniques. It does not use images, heatmaps, 3D pose annotations, paired/unpaired 2D-to-3D skeletons, 3D priors, synthetic 2D skeletons, multi-view or temporal information in any shape or form. The 2D skeleton input is taken by a VAE that encodes it in a latent space and then decodes that latent representation to a 3D pose. The 3D pose is then reprojected to 2D for a constrained, self-supervised optimization using the input 2D pose. In parallel, the 3D pose is also randomly rotated and reprojected to 2D to generate a 'novel' 2D view for unconstrained adversarial optimization using a discriminator network. The combination of the optimizations of the original and the novel 2D views of the predicted 3D pose results in a 'realistic' 3D pose generation. The thesis shows that the encoding and decoding process of the VAE addresses the major challenge of erroneous and incomplete skeletons produced by 2D detection networks, and that the variance of the VAE can be altered to obtain various plausible 3D poses for a given 2D input. Additionally, the latent representation could be used for cross-modal training and many downstream applications. The results on the Human3.6M dataset outperform previous unsupervised approaches with less model complexity while addressing more hurdles in scaling the task to the real world.
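The two optimization branches described in the abstract can be sketched as follows. This is a minimal illustration under assumed conventions (orthographic projection, random rotation about the vertical axis); `lifter` and `discriminator` are hypothetical stand-ins for the networks, not the thesis's actual architecture.

```python
import numpy as np

def random_y_rotation(rng):
    """Random rotation matrix about the vertical (y) axis."""
    a = rng.uniform(0.0, 2.0 * np.pi)
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0.0, s],
                     [0.0, 1.0, 0.0],
                     [-s, 0.0, c]])

def project(pose_3d):
    """Orthographic projection: drop the depth (z) coordinate."""
    return pose_3d[:, :2]

def lifting_losses(pose_2d, lifter, discriminator, rng):
    """Self-consistency loss on the original view plus an adversarial
    score on a randomly rotated, reprojected 'novel' 2D view."""
    pose_3d = lifter(pose_2d)                     # (J, 3) predicted pose
    reproj = project(pose_3d)                     # constrained branch
    self_loss = np.mean((reproj - pose_2d) ** 2)  # must match the input 2D pose
    novel_2d = project(pose_3d @ random_y_rotation(rng).T)
    adv_score = discriminator(novel_2d)           # novel view should look real
    return self_loss, adv_score
```

With an orthographic camera, a lifter that preserves the input x-y coordinates yields a zero self-consistency loss by construction, which is what makes the rotated novel view, and hence the discriminator, necessary to constrain depth.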
APA, Harvard, Vancouver, ISO, and other styles
2

Wang, Jianquan. "A Human Kinetic Dataset and a Hybrid Model for 3D Human Pose Estimation." Thesis, Université d'Ottawa / University of Ottawa, 2020. http://hdl.handle.net/10393/41437.

Full text
Abstract:
Human pose estimation represents the skeleton of a person in color or depth images to improve a machine's understanding of human movement. 3D human pose estimation uses a three-dimensional skeleton to represent the human body posture, which conveys depth information that a two-dimensional skeleton lacks. 3D human pose estimation can therefore enable machines to play a role in physical education and health recovery, reducing labor costs and the risk of disease transmission. However, the existing datasets for 3D pose estimation do not involve fast motions, which cause optical blur for a monocular camera while allowing the subjects' limbs to move through a more extensive range of angles. The existing models cannot guarantee both real-time performance and high accuracy, which are essential in physical education and health recovery applications. To improve real-time performance, researchers have tried to minimize the size of the model and have studied more efficient deployment methods. To improve accuracy, researchers have tried to use heat maps or point clouds to represent features, but this increases the difficulty of model deployment. To address the lack of datasets that include fast movements and of easy-to-deploy models, we present a human kinetic dataset called the Kivi dataset and a hybrid model that combines the benefits of a heat map-based model and an end-to-end model for 3D human pose estimation. We describe the process of data collection and cleaning in this thesis. Our proposed Kivi dataset contains large-scale movements of humans. In the dataset, 18 joint points represent the human skeleton. We collected data from 12 people, and each person performed 38 sets of actions; each frame of data therefore has a corresponding person and action label. We design a preliminary model and propose an improved model to infer 3D human poses in real time.
When validating our method on the Invariant Top-View (ITOP) dataset, we found that compared with the initial model, our improved model improves the mAP@10cm by 29%. When testing on the Kivi dataset, our improved model improves the mAP@10cm by 15.74% compared to the preliminary model. Our improved model can reach 65.89 frames per second (FPS) on the TensorRT platform.
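The mAP@10cm figures above can be read, in simplified form, as the fraction of predicted joints falling within 10 cm of the ground truth, averaged per joint type. A sketch of that simplified reading (not the exact ITOP evaluation protocol):

```python
import numpy as np

def map_at_threshold(pred, gt, thresh=0.10):
    """Per-joint detection accuracy at a distance threshold (in metres),
    averaged over joint types -- a simplified reading of 'mAP@10cm'.
    pred, gt: arrays of shape (frames, joints, 3)."""
    dists = np.linalg.norm(pred - gt, axis=-1)   # (frames, joints) distances
    per_joint = (dists < thresh).mean(axis=0)    # accuracy per joint type
    return per_joint.mean()                      # mean over joint types
```

Averaging per joint type before taking the overall mean keeps rarely detected joints (e.g. occluded ankles) from being drowned out by easy ones.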
3

Gong, Wenjuan. "3D Motion Data aided Human Action Recognition and Pose Estimation." Doctoral thesis, Universitat Autònoma de Barcelona, 2013. http://hdl.handle.net/10803/116189.

Full text
Abstract:
In this work, we explore human action recognition and pose estimation problems. Different from traditional works of learning from 2D images or video sequences and their annotated output, we seek to solve the problems with additional 3D motion capture information, which helps to fill the gap between 2D image features and human interpretations.
4

Yu, Tsz-Ho. "Classification and pose estimation of 3D shapes and human actions." Thesis, University of Cambridge, 2014. https://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.708443.

Full text
5

Darby, John. "3D Human Motion Tracking and Pose Estimation using Probabilistic Activity Models." Thesis, Manchester Metropolitan University, 2010. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.523145.

Full text
Abstract:
This thesis presents work on generative approaches to human motion tracking and pose estimation, in which a geometric model of the human body is used for comparison with observations. The existing generative tracking literature can be quite clearly divided between two groups. First, approaches that attempt to solve a difficult high-dimensional inference problem in the body model's full or ambient pose space, recovering freeform or unknown activity. Second, approaches that restrict inference to a low-dimensional latent embedding of the full pose space, recovering activity for which training data is available, or known activity. Significant advances have been made in each of these subgroups. Given sufficiently rich multiocular observations and plentiful computational resources, high-dimensional approaches have been proven to track fast and complex unknown activities robustly. Conversely, low-dimensional approaches have been able to support monocular tracking and to significantly reduce computational costs for the recovery of known activity. However, their competing advantages have, although complementary, remained disjoint. The central aim of this thesis is to combine low- and high-dimensional generative tracking techniques to benefit from the best of both approaches. First, a simple generative tracking approach is proposed for tracking known activities in a latent pose space using only monocular or binocular observations. A hidden Markov model (HMM) is used to provide dynamics and constrain a particle-based search for poses. The ability of the HMM to classify as well as synthesise poses means that the approach naturally extends to the modelling of a number of different known activities in a single joint-activity latent space. Second, an additional low-dimensional approach is introduced to permit transitions between segmented known-activity training data by allowing particles to move between activity manifolds.
Both low-dimensional approaches are then fairly and efficiently combined with a simultaneous high-dimensional generative tracking task in the ambient pose space. This combination allows for the recovery of sequences containing multiple known and unknown human activities at an appropriate (dynamic) computational cost. Finally, a rich hierarchical embedding of the ambient pose space is investigated. This representation allows inference to progress from a single full-body or global non-linear latent pose space, through a number of gradually smaller part-based latent models, to the full ambient pose space. By preserving long-range correlations present in training data, the positions of occluded limbs can be inferred during tracking. Alternatively, by breaking the implied coordination between part-based models, novel activity combinations, or composite activity, may be recovered.
6

Borodulina, A. (Anastasiia). "Application of 3D human pose estimation for motion capture and character animation." Master's thesis, University of Oulu, 2019. http://jultika.oulu.fi/Record/nbnfioulu-201906262670.

Full text
Abstract:
Abstract. Interest in motion capture (mocap) technology is growing every day, and the number of possible applications is multiplying. But such systems are very expensive and are not affordable for personal use. Based on that, this thesis presents a framework that can produce mocap data from regular RGB video and then use it to animate a 3D character according to the movement of the person in the original video. To extract the mocap data from the input video, one of the three 3D pose estimation (PE) methods available within the scope of the project is used to determine where the joints of the person in each video frame are located in 3D space. The 3D positions of the joints are used as mocap data and are imported into Blender, which contains a simple 3D character. The data is assigned to the corresponding joints of the character to animate it. To test how the created animation works in a different environment, it was imported into the Unity game engine and applied to the native 3D character. The evaluation of the produced animations from Blender and Unity showed that even though the quality of the animation might not be perfect, the test subjects found this approach to animation promising. In addition, during the evaluation a few issues were discovered and considered for future framework development.
7

Burenius, Magnus. "Human 3D Pose Estimation in the Wild : using Geometrical Models and Pictorial Structures." Doctoral thesis, KTH, Datorseende och robotik, CVAP, 2013. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-138136.

Full text
8

Mehta, Dushyant. "Real-time 3D human body pose estimation from monocular RGB input." Saarbrücken: Saarländische Universitäts- und Landesbibliothek, 2020. http://d-nb.info/1220691135/34.

Full text
9

Norman, Jacob. "3D Pose Estimation in the Context of Grip Position for pHRI." Thesis, Mälardalens högskola, Akademin för innovation, design och teknik, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:mdh:diva-55166.

Full text
Abstract:
For human-robot interaction with the intent to grip a human arm, it is necessary that the ideal gripping location can be identified. In this work, the gripping location is situated on the arm, so it can be extracted using the positions of the wrist and elbow joints. To achieve this, human pose estimation is proposed, as there exist robust methods that work both in and outside of lab environments. One such example is OpenPose, which, thanks to the COCO and MPII datasets, has recorded impressive results in a variety of different scenarios in real time. However, most of the images in these datasets are taken from a camera mounted at chest height, on people who for the majority of the images are oriented upright. This presents the potential problem that prone humans, which are the primary focus of this project, cannot be detected, especially if seen from an angle that makes the human appear upside down in the camera frame. To remedy this, two different approaches were tested, both aimed at creating a rotation-invariant 2D pose estimation method. The first method rotates the COCO training data in an attempt to create a model that can find humans regardless of their orientation in the image. The second approach adds a RotationNet as a preprocessing step to correctly orient the images, so that OpenPose can be used to estimate the 2D pose before the resulting skeletons are rotated back.
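The first approach, rotating the COCO training data, requires rotating the keypoint annotations together with the images. A minimal sketch of the keypoint side of that augmentation, under standard mathematical conventions (with image coordinates where y points down, the apparent rotation direction flips):

```python
import numpy as np

def rotate_keypoints(kps, angle_deg, center):
    """Rotate 2D keypoints (J, 2) around a center point, as one might
    do when rotating keypoint annotations together with their images
    for rotation augmentation."""
    a = np.deg2rad(angle_deg)
    R = np.array([[np.cos(a), -np.sin(a)],
                  [np.sin(a),  np.cos(a)]])
    return (kps - center) @ R.T + center
```

In practice the center would be the image center used to rotate the pixels, so that image and annotation stay aligned.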
10

Fathollahi, Ghezelghieh Mona. "Estimation of Human Poses Categories and Physical Object Properties from Motion Trajectories." Scholar Commons, 2017. http://scholarcommons.usf.edu/etd/6835.

Full text
Abstract:
Despite the impressive advancements in people detection and tracking, safety is still a key barrier to the deployment of autonomous vehicles in urban environments [1]. For example, in non-autonomous technology, there is an implicit communication between the people crossing the street and the driver to make sure they have communicated their intent to the driver. Therefore, it is crucial for the autonomous car to infer the future intent of the pedestrian quickly. We believe that human body orientation with respect to the camera can help the intelligent unit of the car to anticipate the future movement of pedestrians. To further improve the safety of pedestrians, it is important to recognize whether they are distracted, carrying a baby, or pushing a shopping cart. Therefore, estimating the fine-grained 3D pose, i.e. the (x,y,z)-coordinates of the body joints, provides additional information for the decision-making units of driverless cars. In this dissertation, we have proposed a deep learning-based solution to classify the categorized body orientation in still images. We have also proposed an efficient framework based on our body orientation classification scheme to estimate human 3D pose in monocular RGB images. Furthermore, we have utilized the dynamics of human motion to infer the body orientation in image sequences. To achieve this, we employ a recurrent neural network model to estimate continuous body orientation from the trajectories of body joints in the image plane. The proposed body orientation and 3D pose estimation frameworks are tested on the largest 3D pose estimation benchmark, Human3.6M (both in still images and video), and we have proved the efficacy of our approach by benchmarking it against state-of-the-art approaches. Another critical feature of a self-driving car is the ability to avoid an obstacle. In the current prototypes, the car either stops or changes its lane, even if this causes other traffic disruptions.
However, there are situations when it is preferable to collide with the object, for example a foam box, rather than take an action that could result in a much more serious accident than a collision with the object. In this dissertation, for the first time, we have presented a novel method to discriminate between the physical properties of these types of objects, such as bounciness, elasticity, etc., based on their motion characteristics. The proposed algorithm is tested on synthetic data, and, as a proof of concept, its effectiveness on a limited set of real-world data is demonstrated.
11

Hossain, Mir Rayat Imtiaz. "Understanding the sources of error for 3D human pose estimation from monocular images and videos." Thesis, University of British Columbia, 2017. http://hdl.handle.net/2429/63808.

Full text
Abstract:
With the success of deep learning in the field of computer vision, most state-of-the-art approaches to estimating 3D human pose from images or videos rely on training a network end-to-end to regress 3D joint locations or heatmaps from an RGB image. Although most of these approaches provide good results, the major sources of error are often difficult to understand. The errors may come either from incorrect 2D pose estimation or from an incorrect mapping of features from 2D to 3D. In this work, we aim to understand the sources of error in estimating 3D pose from images and videos. Therefore, we have built three different systems. The first takes the 2D joint locations of every frame individually as input and predicts 3D joint positions. To our surprise, we found that by using a simple feed-forward fully connected network with residual connections, the ground truth 2D joint locations can be mapped to 3D space at a remarkably low error rate, outperforming the best reported result by almost 30% on the Human3.6M dataset, the largest publicly available dataset of motion capture data. Furthermore, training this network on the outputs of an off-the-shelf 2D pose detector gives us state-of-the-art results when compared with a vast array of systems trained end-to-end. To validate the efficacy of this network, we also trained an end-to-end system that takes an image as input and regresses 3D pose directly. We found that it is harder to train the network end-to-end than to decouple the task. To examine whether temporal information over a sequence improves results, we built a sequence-to-sequence network that takes a sequence of 2D poses as input and predicts a sequence of 3D poses as output. We found that the temporal information improves the results over our first system. We argue that a large portion of the error of 3D pose estimation systems results from errors in 2D pose estimation.
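The first system, a feed-forward fully connected network with residual connections that lifts 2D joints to 3D, can be sketched as a forward pass. The layer sizes and the `params` dictionary here are illustrative placeholders, not the thesis's trained model:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def lift_2d_to_3d(pose_2d, params):
    """Forward pass of a small fully connected lifting network with one
    residual block, in the spirit of the decoupled 2D-to-3D approach
    described above (weights are placeholders, not trained values).
    pose_2d: (J, 2) joint coordinates, flattened to a vector."""
    x = pose_2d.reshape(-1)                          # (2J,) input vector
    h = relu(params["W_in"] @ x)                     # embed to hidden size
    r = relu(params["W2"] @ relu(params["W1"] @ h))  # two-layer block
    h = h + r                                        # residual connection
    out = params["W_out"] @ h                        # (3J,) 3D coordinates
    return out.reshape(-1, 3)
```

The decoupling means this network never sees pixels; it only learns the statistical mapping from 2D joint layouts to 3D, which is why errors in the upstream 2D detector dominate the final error.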
12

Sarmadi, Hamid. "Human Detection and Pose Estimation in a Multi-camera System: Using a 3D Version of Pictorial Structures." Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2013. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-142372.

Full text
Abstract:
Multi-view 3D human pose estimation is still a challenging task in uncontrolled environments. Many related approaches are still dependent on silhouettes of the subject obtained by background subtraction. Background subtraction is difficult if the cameras are moving or the background is dynamic. Our solution is to do body part detections both on the foreground and the background, similar to some 2D pose estimation algorithms. We introduce a novel approach to combine 2D body part detections from multiple views to obtain a unified space of detection scores employing the constraints of the 3D space. This combination of detections helps to remove the false positive detection scores suggested by the 2D detectors. We also propose a new human pose estimation algorithm that performs the pose inference directly in the 3D configuration space, taking advantage of the unified detections. This algorithm is a modified version of the well known pictorial structures method and uses distance transform in solving the optimization. Repulsion between joints is also considered in our algorithm which is very helpful for estimating some specific poses. Qualitative results are presented to show the potential strength of our approach.
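The pictorial structures optimization mentioned above relies on a generalized distance transform: with a quadratic deformation cost, each part's score map is min-convolved with the distance to its parent. A brute-force one-dimensional sketch of that operation (practical implementations use the linear-time lower-envelope algorithm instead):

```python
def distance_transform_1d(costs, w=1.0):
    """Brute-force generalized distance transform as used in pictorial
    structures inference: D[p] = min_q (costs[q] + w * (p - q)^2),
    i.e. the best part placement q for each parent location p."""
    n = len(costs)
    return [min(costs[q] + w * (p - q) ** 2 for q in range(n))
            for p in range(n)]
```

This quadratic-time version is only for exposition; the lower-envelope algorithm computes the same values in O(n) per row, which is what makes exact pictorial structures inference tractable over dense location grids.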
13

Carbonera, Luvizon Diogo. "Apprentissage automatique pour la reconnaissance d'action humaine et l'estimation de pose à partir de l'information 3D." Thesis, Cergy-Pontoise, 2019. http://www.theses.fr/2019CERG1015.

Full text
Abstract:
3D human action recognition is a challenging task due to the complexity of human movements and to the variety of poses and actions performed by distinct subjects. Recent technologies based on depth sensors can provide 3D human skeletons at low computational cost, which is useful information for action recognition. However, such low-cost sensors are restricted to controlled environments and frequently output noisy data. Meanwhile, convolutional neural networks (CNN) have shown significant improvements on both action recognition and 3D human pose estimation from RGB images. Despite being closely related problems, the two tasks are frequently handled separately in the literature. In this work, we analyze the problem of 3D human action recognition in two scenarios: first, we explore spatial and temporal features from human skeletons, which are aggregated by a shallow metric learning approach. In the second scenario, we not only show that precise 3D poses are beneficial to action recognition, but also that both tasks can be efficiently performed by a single deep neural network and still achieve state-of-the-art results. Additionally, we demonstrate that end-to-end optimization using poses as an intermediate constraint leads to significantly higher accuracy on the action task than separate learning. Finally, we propose a new scalable architecture for real-time 3D pose estimation and action recognition simultaneously, which offers a range of performance vs. speed trade-offs with a single multimodal and multitask training procedure.
14

Benzine, Abdallah. "Estimation de poses 3D multi-personnes à partir d'images RGB." Thesis, Sorbonne université, 2020. http://www.theses.fr/2020SORUS103.

Full text
Abstract:
3D human pose estimation from monocular RGB images is the process of locating human joints in an image or a sequence of images. It provides rich geometric and motion information about the human body. Most existing 3D pose estimation approaches assume that the image contains only one person, fully visible. Such a scenario is not realistic. In real-life conditions several people interact, and they tend to occlude each other, which makes 3D pose estimation even more ambiguous and complex. The work carried out during this thesis focused on single-shot estimation of multi-person 3D poses from monocular RGB images. We first proposed a bottom-up approach for predicting multi-person 3D poses that first predicts the 3D coordinates of all the joints present in the image and then uses a grouping process to predict full 3D skeletons. In order to be robust in cases where the people in the image are numerous and far away from the camera, we developed PandaNet, which is based on an anchor representation and integrates a process for ignoring anchors that are ambiguously associated with ground truths, as well as an automatic weighting of losses. Finally, PandaNet is completed with an Absolute Distance Estimation Module (ADEM). The combination of these two models, called Absolute PandaNet, allows the prediction of absolute human 3D poses expressed in the camera frame.
15

Duncan, Kester. "Scene-Dependent Human Intention Recognition for an Assistive Robotic System." Scholar Commons, 2014. https://scholarcommons.usf.edu/etd/5009.

Full text
Abstract:
In order for assistive robots to collaborate effectively with humans for completing everyday tasks, they must be endowed with the ability to effectively perceive scenes and more importantly, recognize human intentions. As a result, we present in this dissertation a novel scene-dependent human-robot collaborative system capable of recognizing and learning human intentions based on scene objects, the actions that can be performed on them, and human interaction history. The aim of this system is to reduce the amount of human interactions necessary for communicating tasks to a robot. Accordingly, the system is partitioned into scene understanding and intention recognition modules. For scene understanding, the system is responsible for segmenting objects from captured RGB-D data, determining their positions and orientations in space, and acquiring their category labels. This information is fed into our intention recognition component where the most likely object and action pair that the user desires is determined. Our contributions to the state of the art are manifold. We propose an intention recognition framework that is appropriate for persons with limited physical capabilities, whereby we do not observe human physical actions for inferring intentions as is commonplace, but rather we only observe the scene. At the core of this framework is our novel probabilistic graphical model formulation entitled Object-Action Intention Networks. These networks are undirected graphical models where the nodes are comprised of object, action, and object feature variables, and the links between them indicate some form of direct probabilistic interaction. This setup, in tandem with a recursive Bayesian learning paradigm, enables our system to adapt to a user's preferences. We also propose an algorithm for the rapid estimation of position and orientation values of scene objects from single-view 3D point cloud data using a multi-scale superquadric fitting approach. 
Additionally, we leverage recent advances in computer vision for an RGB-D object categorization procedure that balances discrimination and generalization as well as a depth segmentation procedure that acquires candidate objects from tabletops. We demonstrate the feasibility of the collaborative system presented herein by conducting evaluations on multiple scenes comprised of objects from 11 categories, along with 7 possible actions, and 36 possible intentions. We achieve approximately 81% reduction in interactions overall after learning despite changes to scene structure.
APA, Harvard, Vancouver, ISO, and other styles
16

Jack, Dominic. "Deep learning approaches for 3D inference from monocular vision." Thesis, Queensland University of Technology, 2020. https://eprints.qut.edu.au/204267/1/Dominic_Jack_Thesis.pdf.

Full text
Abstract:
This thesis looks at deep learning approaches to 3D computer vision problems, using representations including occupancy grids, deformable meshes, key points, point clouds, and event streams. We focussed on methods targeted towards medium-sized mobile robotics platforms with modest computational power on board. Key results include state-of-the-art accuracies on single-view high resolution voxel reconstruction and event camera classification tasks, point cloud convolution networks capable of performing inference an order of magnitude faster than similar methods, and a 3D human pose lifting model with significantly fewer floating point operations and learnable weights than baseline deep learning methods.
APA, Harvard, Vancouver, ISO, and other styles
17

Regia, Corte Fabiola. "Studio ed implementazione di un modello di Human Pose Estimation 3D. Analisi tecnica della posizione del corpo dell’atleta durante un match di Tennis." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2021.

Find full text
Abstract:
Nowadays, often without our being fully aware of it, Machine Learning is becoming part of the most diverse sectors, professional and private alike: from classification in agriculture to self-driving cars; from speech recognition in education to object detection in a landscape; all the way to sport, whether individual or team-based, amateur or professional. It is precisely in this last area that this project is situated: we discuss the use of convolutional networks for human pose estimation in sport, specifically in tennis.
APA, Harvard, Vancouver, ISO, and other styles
18

Amin, Sikandar [Verfasser], Bernd [Akademischer Betreuer] Radig, Darius [Gutachter] Burschka, and Bernd [Gutachter] Radig. "Multi-view Part-based Models for 3D Human Pose Estimation in Real-World Scenes / Sikandar Amin ; Gutachter: Darius Burschka, Bernd Radig ; Betreuer: Bernd Radig." München : Universitätsbibliothek der TU München, 2018. http://d-nb.info/1171425422/34.

Full text
APA, Harvard, Vancouver, ISO, and other styles
19

Rydén, Anna, and Amanda Martinsson. "Evaluation of 3D motion capture data from a deep neural network combined with a biomechanical model." Thesis, Linköpings universitet, Institutionen för medicinsk teknik, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-176543.

Full text
Abstract:
Motion capture has grown in interest in recent years in many fields, from the game industry to sports analysis. The need for reflective markers and expensive multi-camera systems limits adoption, since they are costly and time-consuming. One solution to this could be a deep neural network trained to extract 3D joint estimations from a 2D video captured with a smartphone. This master thesis project has investigated the accuracy of a trained convolutional neural network, MargiPose, that estimates 25 joint positions in 3D from a 2D video, against a gold-standard multi-camera Vicon system. The project has also investigated whether the data from the deep neural network can be connected to a biomechanical modelling software, AnyBody, for further analysis. The final intention of this project was to analyze how accurate such a combination could be in golf swing analysis. The accuracy of the deep neural network has been evaluated with three parameters: marker position, angular velocity and kinetic energy for different segments of the human body. MargiPose delivers results with high accuracy (Mean Per Joint Position Error (MPJPE) = 1.52 cm) for a simpler movement, but for a more advanced motion such as a golf swing, MargiPose achieves lower accuracy in marker distance (MPJPE = 3.47 cm). The mean difference in angular velocity shows that MargiPose has difficulties following segments that are occluded or move quickly, such as the wrists in a golf swing, where they both move fast and are occluded by other body segments. The conclusion of this research is that it is possible to connect data from a trained CNN with a biomechanical modelling software. The accuracy of the network is highly dependent on the intended use of the data. For the purpose of golf swing analysis, this could be a cost-effective solution that could enable motion analysis for professionals as well as interested beginners. MargiPose shows high accuracy when evaluating simple movements. However, when using it with the intention of analyzing a golf swing in a biomechanical modelling software, the outcome might be beyond the bounds of reliable results.
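For reference, the MPJPE values quoted above are mean Euclidean distances between predicted and ground-truth joint positions; a minimal sketch of the metric (toy numbers, not the thesis data):

```python
import numpy as np

def mpjpe(pred, gt):
    """Mean Per Joint Position Error: average Euclidean distance between
    predicted and ground-truth joints, in the same units as the input."""
    return np.linalg.norm(pred - gt, axis=-1).mean()

# Three toy joints, displaced by 1, 2 and 3 cm along x.
gt = np.zeros((3, 3))
pred = np.array([[1.0, 0.0, 0.0],
                 [2.0, 0.0, 0.0],
                 [3.0, 0.0, 0.0]])
print(mpjpe(pred, gt))  # → 2.0
```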
APA, Harvard, Vancouver, ISO, and other styles
20

Xu, Wanxin. "AFFECT-PRESERVING VISUAL PRIVACY PROTECTION." UKnowledge, 2018. https://uknowledge.uky.edu/ece_etds/122.

Full text
Abstract:
The prevalence of wireless networks and the convenience of mobile cameras enable many new video applications beyond security and entertainment. From behavioral diagnosis to wellness monitoring, cameras are increasingly used for observations in various educational and medical settings. Videos collected for such applications are considered protected health information under privacy laws in many countries. Visual privacy protection techniques, such as blurring or object removal, can be used to mitigate privacy concerns, but they also obliterate important visual cues of affect and social behaviors that are crucial for the target applications. In this dissertation, we propose to balance privacy protection and the utility of the data by preserving the privacy-insensitive information, such as pose and expression, which is useful in many applications involving visual understanding. The Intellectual Merits of the dissertation include a novel framework for visual privacy protection by manipulating the facial image and body shape of individuals, which: (1) is able to conceal the identity of individuals; (2) provides a way to preserve the utility of the data, such as expression and pose information; and (3) balances the utility of the data against the capacity of the privacy protection. The Broader Impacts of the dissertation focus on the significance of privacy protection for visual data, and the inadequacy of current privacy-enhancing technologies in preserving affect and behavioral attributes of the visual content, which are highly useful for behavior observation in educational and medical settings. The work in this dissertation represents one of the first attempts at achieving both goals simultaneously.
APA, Harvard, Vancouver, ISO, and other styles
21

Zendjebil, Imane. "Localisation 3D basée sur une approche de suppléance multi-capteurs pour la Réalité Augmentée Mobile en Milieu Extérieur." Phd thesis, Université d'Evry-Val d'Essonne, 2010. http://tel.archives-ouvertes.fr/tell-00541366.

Full text
Abstract:
The democratization of mobile devices such as cell phones, PDAs and tablet PCs has made it possible to deploy augmented reality in large-scale outdoor environments. However, building such systems raises several issues, among which localization is one of the most important: estimating the position and orientation (the pose) of the viewpoint (of the camera or the user) makes it possible to register virtual objects onto the observed parts of the real scene. In this thesis, we present an original localization system for large-scale environments that uses a markerless vision-based approach to estimate the camera pose, relying on natural feature points extracted from images. Since this kind of approach is sensitive to lighting changes, occlusions and sudden camera motion, all of which are likely to occur outdoors, we use two other types of sensors to assist the vision process. In this work, we aim to demonstrate the feasibility of a fallback scheme in large-scale outdoor environments: the goal is to provide a system that substitutes for vision when it fails, and that can also reinitialize the vision system when needed. The localization system is designed to be autonomous and adaptable to the different situations encountered.
APA, Harvard, Vancouver, ISO, and other styles
22

Blanc, Beyne Thibault. "Estimation de posture 3D à partir de données imprécises et incomplètes : application à l'analyse d'activité d'opérateurs humains dans un centre de tri." Thesis, Toulouse, INPT, 2020. http://www.theses.fr/2020INPT0106.

Full text
Abstract:
Dans un contexte d’étude de la pénibilité et de l’ergonomie au travail pour la prévention des troubles musculo-squelettiques, la société Ebhys cherche à développer un outil d’analyse de l’activité des opérateurs humains dans un centre de tri, par l’évaluation d’indicateurs ergonomiques. Pour faire face à l’environnement non contrôlé du centre de tri et pour faciliter l’acceptabilité du dispositif, ces indicateurs sont mesurés à partir d’images de profondeur. Une étude ergonomique nous permet de définir les indicateurs à mesurer. Ces indicateurs sont les zones d’évolution des mains de l’opérateur et d’angulations de certaines articulations du haut du corps. Ce sont donc des indicateurs obtenables à partir d’une analyse de la posture 3D de l’opérateur. Le dispositif de calcul des indicateurs sera donc composé de trois parties : une première partie sépare l’opérateur du reste de la scène pour faciliter l’estimation de posture 3D, une seconde partie calcule la posture 3D de l’opérateur, et la troisième utilise la posture 3D de l’opérateur pour calculer les indicateurs ergonomiques. Tout d’abord, nous proposons un algorithme qui permet d’extraire l’opérateur du reste de l’image de profondeur. Pour ce faire, nous utilisons une première segmentation automatique basée sur la suppression du fond statique et la sélection d’un objet dynamique à l’aide de sa position et de sa taille. Cette première segmentation sert à entraîner un algorithme d’apprentissage qui améliore les résultats obtenus. Cet algorithme d’apprentissage est entraîné à l’aide des segmentations calculées précédemment, dont on sélectionne automatiquement les échantillons de meilleure qualité au cours de l’entraînement. Ensuite, nous construisons un modèle de réseau de neurones pour l’estimation de la posture 3D de l’opérateur. Nous proposons une étude qui permet de trouver un modèle léger et optimal pour l’estimation de posture 3D sur des images de profondeur de synthèse, que nous générons numériquement. 
Finalement, comme ce modèle n’est pas directement applicable sur les images de profondeur acquises dans les centres de tri, nous construisons un module qui permet de transformer les images de profondeur de synthèse en images de profondeur plus réalistes. Ces images de profondeur plus réalistes sont utilisées pour réentrainer l’algorithme d’estimation de posture 3D, pour finalement obtenir une estimation de posture 3D convaincante sur les images de profondeur acquises en conditions réelles, permettant ainsi de calculer les indicateurs ergonomiques
In the context of studying physical strain and ergonomics at work for the prevention of musculoskeletal disorders, the company Ebhys wants to develop a tool for analyzing the activity of human operators in a waste sorting center, by measuring ergonomic indicators. To cope with the uncontrolled environment of the sorting center, these indicators are measured from depth images. An ergonomic study allows us to define the indicators to be measured. These indicators are zones of movement of the operator’s hands and zones of angulations of certain joints of the upper body. They are therefore indicators that can be obtained from an analysis of the operator’s 3D pose. The software for calculating the indicators is thus composed of three steps: a first part segments the operator from the rest of the scene to ease the 3D pose estimation, a second part estimates the operator’s 3D pose, and the third part uses the operator’s 3D pose to compute the ergonomic indicators. First of all, we propose an algorithm that extracts the operator from the rest of the depth image. To do this, we use a first automatic segmentation based on static background removal and selection of a moving element given its position and size. This first segmentation allows us to train a neural network that improves the results. This neural network is trained using the segmentations obtained from the first automatic segmentation, from which the best-quality samples are automatically selected during training. Next, we build a neural network model to estimate the operator’s 3D pose. We propose a study that allows us to find a light and optimal model for 3D pose estimation on synthetic depth images, which we generate numerically. However, while this network gives outstanding performance on synthetic depth images, it is not directly applicable to the real depth images that we acquired in an industrial context.
To overcome this issue, we finally build a module that allows us to transform the synthetic depth images into more realistic depth images. This image-to-image translation model modifies the style of the depth image without changing its content, keeping the 3D pose of the operator from the synthetic source image unchanged on the translated realistic depth frames. These more realistic depth images are then used to re-train the 3D pose estimation neural network, to finally obtain a convincing 3D pose estimation on the depth images acquired in real conditions, and to compute the ergonomic indicators.
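The static background removal used in the first automatic segmentation step can be sketched as a per-pixel depth comparison against an empty-scene reference; a toy illustration with assumed millimetre units, not the thesis code:

```python
import numpy as np

def segment_moving(depth, background, tol=30):
    """Mask pixels significantly closer to the camera than the static
    background (depth maps in mm); tol absorbs sensor noise."""
    valid = depth > 0  # 0 usually means "no reading" on depth sensors
    return valid & (depth.astype(np.int32) < background.astype(np.int32) - tol)

background = np.full((4, 4), 2000, dtype=np.uint16)  # empty-scene depth
frame = background.copy()
frame[1:3, 1:3] = 1200                               # an operator 0.8 m closer
mask = segment_moving(frame, background)
print(int(mask.sum()))  # → 4
```

In the full pipeline described above, such a mask would then be refined by selecting the moving element by position and size before training the segmentation network.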
APA, Harvard, Vancouver, ISO, and other styles
23

Murali, Ram Subramanian. "Pose Estimation and 3D Reconstruction for 3D Dispensing." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-288533.

Full text
Abstract:
Currently, in most cases, material deposition or dispensing is done only on planar surfaces, in applications such as extrusion 3D printing and surface mount technology (SMT) electronics assembly solutions. In the future, dispensing will be carried out on arbitrary three-dimensional objects, where the dispenser needs to know their exact shape and location. This drives the necessity of using vision-based, high-degree-of-freedom (DoF) robotic manipulator dispensers, instead of existing hard-coded Computer Numerical Control (CNC) based limited-DoF 3D dispensers. Given a 3D object to be dispensed on and a CAD model of it along with the dispensing path, this thesis aims to answer the following industrial problem: How to adapt the 3D dispensing path if the object is displaced from the CAD model position? The most important requirement is high dispensing accuracy, on the order of 100 µm with respect to the ideal (CAD) dispensing path, to improve the dispensing quality. Moreover, maintaining an appropriate distance between the dispensing tip and the surface of the object is important for both the positioning accuracy and the volume precision of the deposit. In order to achieve high dispensing accuracy, robust volumetric scanning (3D reconstruction) of the object closely resembling the CAD model and a robotic manipulator with high path tracking accuracy are required. However, the scope of this thesis is restricted to 3D reconstruction and dispensing path adaptation based on the object’s pose displacement. The additional requirements are low overall takt time and low equipment cost. This thesis aims at three things: i) investigating different types of fiducial-marker-based camera 3D pose estimation using a low-cost consumer-grade RGB-D camera, ii) generating a 3D reconstruction of the object for each pose estimation type, and iii) finding the object's pose displacement from the CAD model position. Checkerboard and ArUco markers are used as fiducial markers.
Different types of pose estimation, involving RGB-only and RGB-D fused techniques, are used to find the pose of the object. The Truncated Signed Distance Function (TSDF) is used for surface reconstruction and Iterative Closest Point (ICP) is used for finding the pose alignment between the CAD model and the reconstruction. Tests are conducted for different object shapes in different positions. The reconstruction and dispensing-path adaptation accuracy are then evaluated, along with the alignment takt time.
Dispensering av vätskor är en viktig del av många industriella tillämpningar, som 3D-skrivare och ytmontering. I en majoritet av dessa tillämpningar idag sker dispenseringen på platta, två-dimensionella ytor (eller substrat). I framtiden kommer dispensering i ökad omfattning ske på godtyckliga tredimensionella objekt, varför positionen av dispenseringshuvudet relativt hela det tre-dimensionella substratet kommer att krävas. Behovet driver på möjligheten att använda vision-baserade, fem- eller sex-axliga robotmanipulatorer, istället för befintliga lösningar med hårdkodade rumsbeskrivningar av ett objekt som dispenseringshuvudet ska följa. Given en tre-dimensionell kropp, en CAD-beskrivning av densamme och en bana på kroppen som ska följas av ett dispenseringshuvud syftar denna avhandling till att svara på följande industriella problem: Hur anpassas dispenseringsbanan om kroppen förskjuts från CAD-modellens position? Det viktigaste kravet för tillämpningen är hög volymsnoggrannhet på den nedlagda volymen som kräver hög positioneringsnoggrannhet, ned till 100 um med avseende på den tänkta CAD-definierade dispenseringsbanan. Den hög positioneringsnoggrannheten på dispenseringshuvudet garanterar även god positioneringsnoggrannhet på den dispenserade vätskan på substratet. För att uppnå hög doseringsnoggrannhet krävs robust volymetrisk skanning (3D-rekonstruktion) av objektet som liknar CADmodellen och robotmanipulator med hög noggrannhet. Arbetet i denna avhandling är begränsad till den tre-dimensionella rekonstruktionen och justering av dispenseringsbanan baserat på en ändring av objektets position. Andra tillämpningsspecifikationer är låg total takttid och låg utrustningskostnad. Avhandling syftar till att: i) undersöka olika typer av markörbaserade 3Dpositioneringsuppskattningar med en billig RGBD konsumentkamera ii) generera en 3D-rekonstruktion av objektet för alla typer av poseuppskattningar och iii) hitta objektets positioneringsförskjutning från CAD-modellens. 
Schackbräda och ArUco-markörer används som fiducialmarkörer. Olika typer av poseuppskattningar som utnyttjar RGB- eller RGB-D-kombinerade tekniker används. Truncated Signed Distance Function (TSDF) används för ytrekonstruktion. Iterative Closest Point (ICP) används för att hitta pose-justering mellan CAD-beskrivningen och rekonstruktionen. Tester utförs för olika objektformer i olika positioner och riktningar. Därefter utvärderas rekonstruktionen, noggrannheten i justeringen av dispenseringsbanan, samt takttiden.
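The CAD-to-reconstruction alignment step can be illustrated by the closed-form rigid fit (Kabsch/Procrustes) that each ICP iteration solves once correspondences are fixed; a generic noise-free sketch, not the thesis pipeline:

```python
import numpy as np

def rigid_align(src, dst):
    """Least-squares rotation R and translation t mapping src onto dst,
    given known correspondences (the inner step of each ICP iteration)."""
    cs, cd = src.mean(axis=0), dst.mean(axis=0)
    H = (src - cs).T @ (dst - cd)               # cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])  # no reflections
    R = Vt.T @ D @ U.T
    t = cd - R @ cs
    return R, t

rng = np.random.default_rng(0)
cad = rng.normal(size=(50, 3))                  # points sampled from a CAD model
angle = np.pi / 6
Rz = np.array([[np.cos(angle), -np.sin(angle), 0.0],
               [np.sin(angle),  np.cos(angle), 0.0],
               [0.0, 0.0, 1.0]])
scan = cad @ Rz.T + np.array([0.1, -0.2, 0.05])  # displaced "reconstruction"
R, t = rigid_align(cad, scan)
print(np.allclose(R, Rz) and np.allclose(t, [0.1, -0.2, 0.05]))  # → True
```

Full ICP alternates this closed-form fit with re-estimating nearest-neighbour correspondences until the alignment converges.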
APA, Harvard, Vancouver, ISO, and other styles
24

Peñate, Sánchez Adrián. "3D pose estimation in complex environments." Doctoral thesis, Universitat Politècnica de Catalunya, 2017. http://hdl.handle.net/10803/406085.

Full text
Abstract:
Although there has been remarkable progress in the pose estimation literature, there are still a number of limitations when existing algorithms must be applied in everyday applications, especially in uncontrolled environments. This thesis has addressed some of these limitations: computing the pose for uncalibrated cameras, computing the pose without knowing the correspondence between 2D and 3D points, computing the pose when the points of interest are unreliable, and computing the pose using only depth data. The problems addressed, and consequently their contributions, have been analyzed in order of increasing complexity. At each new stage of the doctoral thesis, the existing restrictions for obtaining the 3D camera pose increased. The thesis has consisted of four parts, on which we will define the contributions made to the field of Computer Vision. The first contribution of the doctoral thesis has focused on providing a technique for obtaining the pose of an uncalibrated camera that is more robust and accurate than existing approaches. By re-formulating the equations used in calibrated perspective methods and by studying their numerical stability, we obtained an extended equation formulation that offers a closed-form solution to the problem with increased stability in the presence of noise compared to the state of the art. The second contribution of the thesis has focused on the fact that most algorithms are based on having a set of 2D-3D correspondences. This task usually involves the extraction and matching of points of interest. In this thesis we have developed an algorithm that solves the estimation of correspondences between points and the estimation of the camera pose jointly, all in an uncalibrated context. By solving both problems together, the steps taken can be optimized much better than by solving them separately. In articles published as a result of this work we have shown the advantages inherent in this approach.
The third contribution of the thesis has been to provide a solution for estimating the pose of the camera in extreme situations where the image quality is very deteriorated. This is possible through the use of learning techniques based on high-quality data and 3D models of the environment and the objects. This approach is based on the notion that by learning from high-quality data we can obtain detectors that are able to recognize objects in the worst circumstances, because they know in depth what defines the object in question. The fourth contribution of the thesis is the creation of a pose estimation method that does not require color information, only depth. We define a local volumetric dense appearance descriptor and perform dense feature extraction over the depth image; once the dense feature sampling is obtained, we perform an energy minimisation taking into account the pairwise terms between individual features. We obtain accuracy comparable to state-of-the-art methods while requiring an order of magnitude less time per image. Together, the above contributions to 3D pose estimation have improved tools for 3D reconstruction, robotic vision and relocalization in 3D maps. All contributions have been published in international journals and conferences of reputed scientific prestige in the area.
Aunque ha habido un progreso notable en la literatura de estimación de pose, todavía hay un número de limitaciones cuando los algoritmos existentes deben ser aplicados en aplicaciones de uso diario, especialmente en ambientes no controlados. En esta tesis se han abordado algunas de estas limitaciones, la computación de la pose para cámaras no calibradas, la computación de la pose sin conocer la correspondencia entre puntos 2D y 3D, la computación de la pose cuando los puntos de interés no son fiables y la computación de la pose usando exclusivamente datos de profundidad. Los problemas abordados, y en consecuencia las contribuciones aportadas, han sido analizados en orden creciente de complejidad. En cada nueva etapa de la tesis doctoral se incrementaban las restricciones existentes para la obtención de la pose 3D de la cámara. La tesis ha constado de cuatro partes sobre las que pasaremos a definir las contribuciones realizadas al área de la Visión por Computador. La primera contribución de la tesis doctoral se ha centrado en ofrecer una técnica para la obtención de la pose de una cámara perspectiva sin calibrar más robusta y precisa que los existentes. Mediante la re-formulación de las ecuaciones perspectivas usadas en métodos calibrados y el estudio de la estabilidad numérica de las mismas se ha obtenido una formulación extendida de las ecuaciones perspectivas que ofrece una solución cerrada al problema y una mayor estabilidad en presencia de ruido. La segunda contribución de la tesis se ha centrado en el hecho de que la mayoría de los algoritmos se basan en tener un conjunto de correspondencias 2D-3D. Esta tarea implica generalmente la extracción y emparejamiento de puntos de interés. En esta tesis doctoral se ha desarrollado un algoritmo que aborda la estimación de las correspondencias entre puntos y estimación de la pose de la cámara de manera conjunta.
Al resolver ambos problemas conjuntamente se puede optimizar los pasos a tomar mucho mejor que resolviéndolos por separado. En los trabajos publicados a raíz de este trabajo se han mostrado las ventajas inherentes a esta aproximación al problema. La tercera contribución de la tesis ha sido la de aportar una solución para la estimación de la pose de la cámara en situaciones extremas en las que la calidad de la imagen se encuentra muy deteriorada. Esto es posible mediante el uso de técnicas de aprendizaje a partir de datos de alta calidad y modelos 3D del entorno y los objetos presentes. Esta aproximación se basa en la noción de que a partir de un aprendizaje sobre datos de alta calidad se pueden obtener detectores que son capaces de reconocer los objetos en las peores circunstancias ya que conocen en profundidad aquello que define al objeto en cuestión. La cuarta contribución de la tesis es la creación de un método de estimación de pose que no requiere de información de color, solamente profundidad. Mediante una definición de apariencia volumétrica local y la extracción densa de características en la imagen de profundidad se obtiene un método comparable en precisión al estado de la cuestión pero un orden de magnitud más rápido. La suma de las contribuciones anteriores en las tareas de estimación de pose 3D han posibilitado la mejora en las herramientas de reconstrucción 3D, visión robótica y relocalización en mapas 3D. Todas las contribuciones han sido publicadas en revistas y congresos internacionales y de reputado prestigio científico en el área.
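As a generic illustration of pose estimation for an uncalibrated camera from 2D-3D correspondences (a textbook sketch, not the extended closed-form formulation developed in the thesis), the classic Direct Linear Transform recovers a full 3x4 projection matrix up to scale:

```python
import numpy as np

def dlt_projection(X, x):
    """Direct Linear Transform: estimate a 3x4 projection matrix P from
    n >= 6 exact 3D-2D correspondences, up to scale (uncalibrated case)."""
    A = []
    for Xw, (u, v) in zip(X, x):
        Xh = np.append(Xw, 1.0)                 # homogeneous 3D point
        A.append(np.concatenate([np.zeros(4), -Xh, v * Xh]))
        A.append(np.concatenate([Xh, np.zeros(4), -u * Xh]))
    _, _, Vt = np.linalg.svd(np.asarray(A))
    P = Vt[-1].reshape(3, 4)                    # null vector of A
    return P / P[-1, -1]                        # fix the arbitrary scale

# Toy camera and scene (all values illustrative).
P_true = np.hstack([np.eye(3), np.array([[0.2], [0.1], [2.0]])])
rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(8, 3)) + np.array([0.0, 0.0, 5.0])
xh = np.hstack([X, np.ones((8, 1))]) @ P_true.T
x = xh[:, :2] / xh[:, 2:]
P = dlt_projection(X, x)
print(np.allclose(P, P_true / P_true[-1, -1], atol=1e-6))  # → True
```

With noisy correspondences, such a linear solution is typically refined by a non-linear reprojection-error minimisation, which is where the numerical-stability concerns discussed in the abstract become important.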
APA, Harvard, Vancouver, ISO, and other styles
25

Pitteri, Giorgia. "3D Object Pose Estimation in Industrial Context." Thesis, Bordeaux, 2020. http://www.theses.fr/2020BORD0202.

Full text
Abstract:
La détection d'objets 3D et l'estimation de leur pose à partir d'images sont très importantes pour des tâches comme la robotique et la réalité augmentée et font l'objet d'intenses recherches depuis le début de la vision par ordinateur. D'importants progrès ont été réalisés récemment grâce au développement des méthodes basées sur l'apprentissage profond. Ce type d'approche fait néanmoins face à plusieurs obstacles majeurs qui se révèlent en milieu industriel, notamment la gestion des objets contenant des symétries et la généralisation à de nouveaux objets jamais vus par les réseaux lors de l'apprentissage. Dans cette thèse, nous montrons d'abord le lien entre les symétries d'un objet 3D et son apparence dans les images de manière analytique expliquant pourquoi les objets symétriques représentent un défi. Nous proposons alors une solution efficace et simple qui repose sur la normalisation de la rotation de la pose. Cette approche est générale et peut être utilisée avec n'importe quel algorithme d'estimation de pose 3D. Ensuite, nous abordons le deuxième défi: la généralisation aux objets jamais vus pendant l'apprentissage. De nombreuses méthodes récentes d'estimation de la pose 3D sont très efficaces mais leur succès peut être attribué à l'utilisation d'approches d'apprentissage automatique supervisé. Pour chaque nouvel objet, ces méthodes doivent être re-entrainées sur de nombreuses images différentes de cet objet, ces images n'étant pas toujours disponibles. Même si les méthodes de transfert de domaine permettent de réaliser l'entrainement sur des images synthétiques plutôt que sur des images réelles, ces sessions d'entrainement prennent du temps, et il est fortement souhaitable de les éviter dans la pratique. Nous proposons deux méthodes pour traiter ce problème. La première méthode s'appuie uniquement sur la géométrie des objets et se concentre sur les objets avec des coins proéminents, ce qui est le cas pour un grand nombre d'objets industriels.
Nous apprenons dans un premier temps à détecter les coins des objets de différentes formes dans les images et à prédire leurs poses 3D, en utilisant des images d'apprentissage d'un petit ensemble d'objets. Pour détecter un nouvel objet dans une image donnée, on identifie ses coins à partir de son modèle CAO, on détecte également les coins visibles sur l'image et on prédit leurs poses 3D. Nous introduisons ensuite un algorithme de type RANSAC qui détecte et estime de manière robuste et efficace la pose 3D de l'objet en faisant correspondre ses coins sur le modèle CAO avec leurs correspondants détectés dans l'image. La deuxième méthode surmonte les limites de la première et ne nécessite pas que les objets aient des coins spécifiques et la sélection hors ligne des coins sur le modèle CAO. Il combine l'apprentissage profond et la géométrie 3D, et repose sur une représentation réduite de la géométrie 3D locale pour faire correspondre les modèles CAO aux images d'entrée. Pour les points sur la surface des objets, cette représentation peut être calculée directement à partir du modèle CAO; pour les points de l'image, nous apprenons à la prédire à partir de l'image elle-même. Cela établit des correspondances entre les points 3D sur le modèle CAO et les points 2D des images. Cependant, beaucoup de ces correspondances sont ambiguës car de nombreux points peuvent avoir des géométries locales similaires. Nous utilisons alors Mask-RCNN sans l'information de la classe des objets pour détecter les nouveaux objets sans ré-entraîner le réseau et ainsi limiter drastiquement le nombre de correspondances possibles. La pose 3D est estimée à partir de ces correspondances discriminantes en utilisant un algorithme de type RANSAC
3D object detection and pose estimation are of primary importance for tasks such as robotic manipulation, augmented reality and they have been the focus of intense research in recent years. Methods relying on depth data acquired by depth cameras are robust. Unfortunately, active depth sensors are power hungry or sometimes it is not possible to use them. It is therefore often desirable to rely on color images. When training machine learning algorithms that aim at estimate object's 6D poses from images, many challenges arise, especially in industrial context that requires handling objects with symmetries and generalizing to unseen objects, i.e. objects never seen by the networks during training.In this thesis, we first analyse the link between the symmetries of a 3D object and its appearance in images. Our analysis explains why symmetrical objects can be a challenge when training machine learning algorithms to predict their 6D pose from images. We then propose an efficient and simple solution that relies on the normalization of the pose rotation. This approach is general and can be used with any 6D pose estimation algorithm.Then, we address the second main challenge: the generalization to unseen objects. Many recent methods for 6D pose estimation are robust and accurate but their success can be attributed to supervised Machine Learning approaches. For each new object, these methods have to be retrained on many different images of this object, which are not always available. Even if domain transfer methods allow for training such methods with synthetic images instead of real ones-at least to some extent-such training sessions take time, and it is highly desirable to avoid them in practice.We propose two methods to handle this problem. The first method relies only on the objects’ geometries and focuses on objects with prominent corners, which covers a large number of industrial objects. 
We first learn to detect object corners of various shapes in images and to predict their 3D poses, using training images of a small set of objects. To detect a new object in a given image, we first identify its corners from its CAD model; we also detect the corners visible in the image and predict their 3D poses. We then introduce a RANSAC-like algorithm that robustly and efficiently detects and estimates the object's 3D pose by matching its corners on the CAD model with their detected counterparts in the image. The second method overcomes the limitations of the first one, as it requires neither objects with specific corners nor the offline selection of corners on the CAD model. It combines Deep Learning and 3D geometry, and relies on an embedding of the local 3D geometry to match the CAD models to the input images. For points at the surface of objects, this embedding can be computed directly from the CAD model; for image locations, we learn to predict it from the image itself. This establishes correspondences between 3D points on the CAD model and 2D locations in the input images. However, many of these correspondences are ambiguous, as many points may have similar local geometries. We also show that we can use Mask-RCNN in a class-agnostic way to detect the new objects without retraining, and thus drastically limit the number of possible correspondences. We can then robustly estimate a 3D pose from these discriminative correspondences using a RANSAC-like algorithm.
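The RANSAC-like matching step can be sketched in miniature. The toy below (an illustrative simplification, not the thesis's algorithm) robustly fits a 2D rigid transform to point correspondences contaminated with outliers; the full method would fit a 6D pose to 3D-2D corner matches in the same spirit.

```python
import numpy as np

def rigid_from_pairs(src, dst):
    """Least-squares 2D rigid transform (Kabsch): dst ~ R @ src + t."""
    cs, cd = src.mean(0), dst.mean(0)
    H = (src - cs).T @ (dst - cd)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:          # forbid reflections
        Vt[-1] *= -1
        R = Vt.T @ U.T
    return R, cd - R @ cs

def ransac_pose(src, dst, iters=200, thresh=0.05, rng=None):
    """Robustly fit a rigid transform to matches that may contain outliers."""
    rng = rng or np.random.default_rng(0)
    best_inl = np.zeros(len(src), dtype=bool)
    for _ in range(iters):
        idx = rng.choice(len(src), 2, replace=False)   # minimal sample
        R, t = rigid_from_pairs(src[idx], dst[idx])
        err = np.linalg.norm(src @ R.T + t - dst, axis=1)
        inl = err < thresh
        if inl.sum() > best_inl.sum():
            best_inl = inl
    # refit on all inliers of the best hypothesis
    R, t = rigid_from_pairs(src[best_inl], dst[best_inl])
    return R, t, best_inl
```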
APA, Harvard, Vancouver, ISO, and other styles
26

Madadi, Meysam. "Human segmentation, pose estimation and applications." Doctoral thesis, Universitat Autònoma de Barcelona, 2017. http://hdl.handle.net/10803/457900.

Full text
Abstract:
Automatically analyzing humans in photographs or videos has great potential applications in computer vision, including medical diagnosis, sports, entertainment, movie editing and surveillance, just to name a few. Body, face and hand are the most studied components of humans. The body has many variabilities in shape and clothing, along with high degrees of freedom in pose. The face has many muscles causing many visible deformations, besides variable shape and hair style. The hand is a small object, moves fast and has high degrees of freedom. Adding human characteristics to all the aforementioned variabilities makes human analysis quite a challenging task. In this thesis, we developed human segmentation in different modalities. In a first scenario, we segmented the human body and hand in depth images using example-based shape warping. We developed a shape descriptor based on shape context and class probabilities of shape regions to extract nearest neighbors. We then considered rigid affine alignment vs. non-rigid iterative shape warping. In a second scenario, we segmented the face in RGB images using convolutional neural networks (CNN). We modeled the conditional random field with recurrent neural networks. In our model, the pairwise kernels are not fixed but learned during training. We trained the network end-to-end using adversarial networks, which improved hair segmentation by a high margin. We also worked on 3D hand pose estimation in depth images. In a generative approach, we fitted a finger model separately for each finger based on our example-based rigid hand segmentation. We minimized an energy function based on overlapping area, depth discrepancy and finger collisions. We also applied linear models in joint trajectory space to refine occluded joints based on the error of visible joints and the trajectory smoothness of invisible joints. In a CNN-based approach, we developed a tree-structured network to train specific features for each finger and fused them for global pose consistency.
We also formulated physical and appearance constraints as loss functions. Finally, we developed a number of applications consisting of human soft-biometrics measurement and garment retexturing. We also generated several datasets in this thesis, covering human segmentation, synthetic hand poses, garment retexturing and Italian gestures.
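The trajectory-space refinement of occluded joints can be illustrated with a minimal sketch, assuming (our simplification, not the thesis's exact linear model) a low-order polynomial fitted to the visible frames as the smoothness prior.

```python
import numpy as np

def refine_occluded(traj, visible, degree=3):
    """Fill occluded samples of a 1-D joint trajectory by fitting a
    low-order polynomial to the visible frames (a smoothness prior)."""
    t = np.arange(len(traj))
    coeffs = np.polyfit(t[visible], traj[visible], degree)
    filled = traj.copy()
    filled[~visible] = np.polyval(coeffs, t[~visible])
    return filled
```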
APA, Harvard, Vancouver, ISO, and other styles
27

Gkagkos, Polydefkis. "3D Human Pose and Shape-aware Modelling." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-285922.

Full text
Abstract:
The focus of this thesis is 3D pose estimation from a single image that also takes the shape of the person into consideration. For rendering the human pose and body shape we use a recently proposed statistical model, SMPL [1]. We train a neural network to estimate the shape and the pose of a person in an image. Afterwards, we use an optimization procedure to further enhance the output. The network is trained by incorporating both the optimized and the predicted parameters into the loss. This approach is based on SPIN [2]. We extend this method with a stronger optimization that is based on several views, with the error summed over all of them. The main objective of this thesis is to utilize information from multiple views. The motivation for our method is to explore whether this optimization can provide better supervision to the network. In order to verify the effectiveness of our method, we conduct several experiments and show appealing visual results. Lastly, to make the network generalize better we train simultaneously on seven datasets and achieve comparable or even better accuracy than similar methods from related work.
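The idea of summing the error over several views has a classical linear counterpart, sketched below: DLT triangulation of a 3D point, where every view contributes equations to one least-squares system. This is illustrative only; the thesis optimizes SMPL parameters rather than a single point.

```python
import numpy as np

def triangulate(cams, pts2d):
    """Linear (DLT) triangulation: find the 3D point whose reprojection
    residual, stacked over all views, is minimised in a least-squares sense.
    cams are 3x4 projection matrices, pts2d the observed (u, v) per view."""
    rows = []
    for P, (u, v) in zip(cams, pts2d):
        rows.append(u * P[2] - P[0])   # each view contributes 2 equations
        rows.append(v * P[2] - P[1])
    _, _, Vt = np.linalg.svd(np.stack(rows))
    X = Vt[-1]                         # null vector of the stacked system
    return X[:3] / X[3]                # de-homogenise
```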
APA, Harvard, Vancouver, ISO, and other styles
28

Johnson, Samuel Alan. "Articulated human pose estimation in natural images." Thesis, University of Leeds, 2012. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.598026.

Full text
Abstract:
In this thesis the problem of estimating the 2-D articulated pose, or configuration, of a person in unconstrained images such as consumer photographs is addressed. Contributions are split among three major chapters. In previous work the Pictorial Structure Model approach has proven particularly successful, and is appealing because of its moderate computational cost. However, the accuracy of resulting pose estimates has been limited by the use of simple representations of limb appearance. In this thesis strong discriminatively trained limb detectors combining gradient and colour segmentation cues are proposed. The approach improves significantly on the "iterative image parsing" method which was the state of the art at the time, and shows significant promise for combination with other models of pose and appearance. In the second part of this thesis higher-fidelity models of pose and appearance are proposed. The aim is to tackle extremely challenging properties of the human pose estimation task arising from variation in pose, anatomy, clothing, and imaging conditions. Current methods use simple models of body part appearance and plausible configurations due to limitations of available training data and constraints on computational expense. It is shown that such models severely limit accuracy. A new annotated database of challenging consumer images is introduced, an order of magnitude larger than currently available datasets. This larger amount of data allows partitioning of the pose space and the learning of multiple, clustered Pictorial Structure Models. A relative improvement in accuracy of over 50% is achieved compared to the standard, single-model approach. In the final part of this thesis the clustered Pictorial Structure Model framework is extended to handle much larger quantities of training data. Furthermore it is shown how to utilise Amazon Mechanical Turk and a latent annotation update scheme to achieve high-quality annotations at low cost.
A significant increase in pose estimation accuracy is presented, while the computational expense of the framework is improved by a factor of
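The pose-space partitioning behind the clustered models can be sketched as plain k-means on pose vectors, with one Pictorial Structure Model then trained per cluster (the per-cluster model training itself is omitted here).

```python
import numpy as np

def kmeans(X, k, iters=50, rng=None):
    """Plain k-means: partition pose vectors into k clusters; one
    Pictorial Structure Model would then be trained per cluster."""
    rng = rng or np.random.default_rng(0)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers) ** 2).sum(-1), axis=1)
        centers = np.stack([X[labels == j].mean(0) if np.any(labels == j)
                            else centers[j] for j in range(k)])
    return centers, labels
```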
APA, Harvard, Vancouver, ISO, and other styles
29

Oleinikov, Georgii. "Towards human pose estimation in video sequences." Thesis, University of British Columbia, 2014. http://hdl.handle.net/2429/45767.

Full text
Abstract:
Recent advancements in human pose estimation from single images have attracted wide interest from the Computer Vision community. However, the problem of pose estimation from monocular video sequences is largely under-represented in the literature despite the wide range of its applications, such as action recognition and human-computer interaction. In this thesis we present two novel algorithms for video pose estimation that demonstrate how one can improve the performance of a state-of-the-art single-image articulated human detection algorithm on realistic video sequences. Furthermore, we release the UCF Sports Pose dataset, containing full-body pose annotations of people performing various actions in realistic videos, together with a novel pose evaluation metric that better reflects the performance of the current state of the art. We also release the Video Pose Annotation tool, a highly customizable application that we used to construct the dataset. Finally, we introduce a task-based abstraction for human pose estimation, which selects the "best" algorithm for each specific instance based on a task description defined using an application programming interface covering a large volume of the human pose estimation domain.
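One simple way to exploit temporal continuity when running a single-image detector on video is to smooth the per-frame keypoint estimates. The sketch below uses exponential smoothing, an assumed illustrative baseline, not the thesis's algorithms.

```python
import numpy as np

def smooth_keypoints(frames, alpha=0.6):
    """Exponentially smooth per-frame keypoint estimates of shape
    (T, J, 2): a crude way to stabilise a single-image detector on video."""
    out = np.empty_like(frames, dtype=float)
    out[0] = frames[0]
    for t in range(1, len(frames)):
        out[t] = alpha * frames[t] + (1 - alpha) * out[t - 1]
    return out
```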
APA, Harvard, Vancouver, ISO, and other styles
30

La, Gorce Martin de. "Model-based 3D hand pose estimation from monocular video." Thesis, Châtenay-Malabry, Ecole centrale de Paris, 2009. http://www.theses.fr/2009ECAP0045/document.

Full text
Abstract:
In this thesis we propose two methods that allow us to recover automatically a full description of the 3D motion of a hand given a monocular video sequence of this hand. Using the information provided by the video, our aim is to determine the full set of kinematic parameters required to describe the pose of the skeleton of the hand. This set of parameters is composed of the angles associated with each joint/articulation and the global position and orientation of the wrist. This problem is extremely challenging: the hand has many degrees of freedom and self-occlusions are ubiquitous, which makes the estimation of occluded or partially occluded hand parts difficult. In this thesis, we introduce two novel methods of increasing complexity that improve to a certain extent the state of the art for the monocular hand tracking problem. Both are model-based methods, based on a hand model that is fitted to the image. This process is guided by an objective function that defines an image-based measure of the quality of the hand projection given the model parameters. The fitting process is achieved through an iterative refinement technique based on gradient descent that aims at minimizing the objective function. The two methods differ mainly in the choice of the hand model and of the cost function. The first method relies on a hand model made of ellipsoids and a simple discrepancy measure based on global color distributions of the hand and the background. The second method uses a triangulated surface model with texture and shading, and exploits a robust distance between the synthetic and observed images as the discrepancy measure. While computing the gradient of the discrepancy measure, particular attention is given to terms related to changes of visibility of the surface near self-occlusion boundaries, which are neglected in existing formulations. Our hand tracking method is not real-time, which makes interactive applications not yet possible. Increases in the computational power of computers, combined with improvements to our method, might make real-time performance attainable.
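The analysis-by-synthesis loop both methods share, iteratively adjusting model parameters to minimize an image discrepancy, can be sketched in one dimension. This is a hypothetical toy that fits a single shift parameter by finite-difference gradient descent, not a full hand model.

```python
import numpy as np

def shifted(signal, s):
    """Render a 1-D 'template' shifted by a real-valued offset s
    (linear interpolation stands in for projecting the hand model)."""
    x = np.arange(len(signal)) - s
    return np.interp(x, np.arange(len(signal)), signal)

def fit_shift(observed, template, s0=0.0, lr=0.05, steps=400, eps=1e-3):
    """Minimise the pixel-wise squared discrepancy by finite-difference
    gradient descent, in the spirit of model-based analysis-by-synthesis."""
    s = s0
    for _ in range(steps):
        def cost(v):
            return np.sum((shifted(template, v) - observed) ** 2)
        g = (cost(s + eps) - cost(s - eps)) / (2 * eps)
        s -= lr * g
    return s
```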
APA, Harvard, Vancouver, ISO, and other styles
31

Cao, Hui, Yoshinori Takeuchi, Tetsuya Matsumoto, Hiroaki Kudo, and Noboru Ohnishi. "Recovering Human Pose by Collaborative Generative Models Estimation." INTELLIGENT MEDIA INTEGRATION NAGOYA UNIVERSITY / COE, 2005. http://hdl.handle.net/2237/10377.

Full text
APA, Harvard, Vancouver, ISO, and other styles
32

Zhu, Aichun. "Articulated human pose estimation in images and video." Thesis, Troyes, 2016. http://www.theses.fr/2016TROY0013/document.

Full text
Abstract:
Human pose estimation is a challenging problem in computer vision and shares all the difficulties of object detection. This thesis focuses on the problems of human pose estimation in still images and video, including the diversity of appearances, changes in scene illumination and confounding background clutter. To tackle these problems, we build a robust model consisting of the following components. First, top-down and bottom-up methods are combined to estimate human pose. We extend the Pictorial Structure (PS) model to cooperate with an annealed particle filter (APF) for robust multi-view pose estimation. Second, we propose an upper-body based multiple mixture parts (MMP) model for human pose estimation that contains two stages. The pre-estimation stage consists of three steps: upper body detection, model category estimation for the upper body, and full model selection for pose estimation. In the estimation stage, we address the problem of a variety of human poses and activities. Finally, a Deep Convolutional Neural Network (DCNN) is introduced for human pose estimation. A Local Multi-Resolution Convolutional Neural Network (LMR-CNN) is proposed to learn the representation for each body part. Moreover, an LMR-CNN based hierarchical model is defined to handle the structural complexity of limb parts. The experimental results demonstrate the effectiveness of the proposed model.
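Exact MAP inference in a Pictorial Structure model over a tree of parts can be done by dynamic programming. Below is a minimal sketch on a chain of parts with scalar candidate locations and a quadratic deformation cost; the annealed particle filter and CNN components of the thesis are omitted.

```python
import numpy as np

def chain_ps(unary, locs, w=1.0):
    """MAP inference on a chain pictorial structure by dynamic programming.
    unary[i, l] = appearance cost of part i at candidate location locs[l];
    w * (locs[a] - locs[b])**2 = deformation cost between neighbours."""
    n_parts, n_loc = unary.shape
    pair = w * (locs[:, None] - locs[None, :]) ** 2   # (prev, cur)
    cost = unary[0].astype(float)
    back = np.zeros((n_parts, n_loc), dtype=int)
    for i in range(1, n_parts):
        total = cost[:, None] + pair                  # best predecessor
        back[i] = np.argmin(total, axis=0)
        cost = total[back[i], np.arange(n_loc)] + unary[i]
    best = [int(np.argmin(cost))]                     # backtrack
    for i in range(n_parts - 1, 0, -1):
        best.append(int(back[i, best[-1]]))
    return best[::-1]
```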
APA, Harvard, Vancouver, ISO, and other styles
33

Navaratnam, Ramanan. "Probabilistic human body pose estimation from monocular images." Thesis, University of Cambridge, 2008. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.612174.

Full text
APA, Harvard, Vancouver, ISO, and other styles
34

Sandhu, Romeil Singh. "Statistical methods for 2D image segmentation and 3D pose estimation." Diss., Georgia Institute of Technology, 2010. http://hdl.handle.net/1853/37245.

Full text
Abstract:
The field of computer vision focuses on the goal of developing techniques to exploit and extract information from underlying data that may represent images or other multidimensional data. In particular, two well-studied problems in computer vision are the fundamental tasks of 2D image segmentation and 3D pose estimation from a 2D scene. In this thesis, we first introduce two novel methodologies that attempt to solve 2D image segmentation and 3D pose estimation independently. Then, by leveraging the advantages of certain techniques from each problem, we couple both tasks in a variational and non-rigid manner through a single energy functional. Thus, the three theoretical components and contributions of this thesis are as follows: Firstly, a new distribution metric for 2D image segmentation is introduced. This is employed within the geometric active contour (GAC) framework. Secondly, a novel particle filtering approach is proposed for the problem of estimating the pose of two point sets that differ by a rigid body transformation. Thirdly, the two techniques of image segmentation and pose estimation are coupled in a single energy functional for a class of 3D rigid objects. After laying the groundwork and presenting these contributions, we then turn to their applicability to real world problems such as visual tracking. In particular, we present an example where we develop a novel tracking scheme for 3-D Laser RADAR imagery. However, we should mention that the proposed contributions are solutions for general imaging problems and therefore can be applied to medical imaging problems such as extracting the prostate from MRI imagery.
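The particle-filtering contribution can be sketched on a 1-DoF version of the rigid-pose problem, estimating the planar rotation between two centred point sets; the real method handles full rigid-body transformations, and all parameters here are illustrative.

```python
import numpy as np

def rot2d(a):
    return np.array([[np.cos(a), -np.sin(a)], [np.sin(a), np.cos(a)]])

def pf_rotation(src, dst, n=500, iters=30, rng=None):
    """Particle filter over a single rotation angle aligning two centred
    point sets: weight by alignment error, resample, diffuse, repeat."""
    rng = rng or np.random.default_rng(0)
    particles = rng.uniform(-np.pi, np.pi, n)
    noise, best, best_err = 0.2, 0.0, np.inf
    for _ in range(iters):
        errs = np.array([np.sum((src @ rot2d(a).T - dst) ** 2)
                         for a in particles])
        if errs.min() < best_err:
            best_err, best = errs.min(), particles[np.argmin(errs)]
        w = np.exp(-(errs - errs.min()))              # likelihood weights
        w /= w.sum()
        particles = rng.choice(particles, n, p=w)     # resample
        particles += rng.normal(0, noise, n)          # diffuse
        noise *= 0.85                                 # shrink the diffusion
    return best
```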
APA, Harvard, Vancouver, ISO, and other styles
35

Thayananthan, Arasanathan. "Template-based pose estimation and tracking of 3D hand motion." Thesis, University of Cambridge, 2006. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.613782.

Full text
APA, Harvard, Vancouver, ISO, and other styles
36

Cetinkaya, Guven. "A Comparative Study On Pose Estimation Algorithms Using Visual Data." Master's thesis, METU, 2012. http://etd.lib.metu.edu.tr/upload/12614109/index.pdf.

Full text
Abstract:
Computation of the position and orientation of an object with respect to a camera from its images is called the pose estimation problem. Pose estimation is one of the major problems in computer vision, robotics and photogrammetry. Object tracking, object recognition and self-localization of robots are typical examples of the use of pose estimation. Determining the pose of an object from its projections requires the 3D model of the object in its own reference system, the camera parameters and the 2D image of the object. Most pose estimation algorithms require the correspondences between the 3D model points of the object and the 2D image points. In this study, four well-known pose estimation algorithms requiring the 2D-3D correspondences to be known a priori, namely Orthogonal Iterations, POSIT, DLT and Efficient PnP, are compared. Moreover, two other well-known algorithms that solve the correspondence and pose problems simultaneously, Soft-POSIT and Blind-PnP, are also compared in the scope of this thesis. In the first step of the simulations, synthetic data is formed using a realistic motion scenario and the algorithms are compared using this data. In the next step, real images captured by a calibrated camera for an object with a known 3D model are exploited. The simulation results indicate that the POSIT algorithm performs best among the algorithms requiring point correspondences. Another result obtained from the experiments is that the Soft-POSIT algorithm can be considered to perform better than the Blind-PnP algorithm.
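Of the compared algorithms, DLT is the most compact to sketch: with six or more known 2D-3D correspondences, the projection matrix is recovered (up to scale) as the null vector of a linear system. The camera values below are hypothetical.

```python
import numpy as np

def dlt_projection(X3d, x2d):
    """Direct Linear Transform: recover the 3x4 projection matrix, up to
    scale, from n >= 6 known 3D-2D point correspondences."""
    rows = []
    for (X, Y, Z), (u, v) in zip(X3d, x2d):
        p = [X, Y, Z, 1.0]
        rows.append(p + [0, 0, 0, 0] + [-u * c for c in p])
        rows.append([0, 0, 0, 0] + p + [-v * c for c in p])
    _, _, Vt = np.linalg.svd(np.array(rows))
    return Vt[-1].reshape(3, 4)       # null vector, reshaped

def project(P, X):
    """Project 3D points with P and de-homogenise."""
    Xh = np.hstack([X, np.ones((len(X), 1))])
    x = Xh @ P.T
    return x[:, :2] / x[:, 2:3]
```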
APA, Harvard, Vancouver, ISO, and other styles
37

Brauer, Jürgen [Verfasser]. "Human Pose Estimation with Implicit Shape Models / Jürgen Brauer." Karlsruhe : KIT Scientific Publishing, 2014. http://www.ksp.kit.edu.

Full text
APA, Harvard, Vancouver, ISO, and other styles
38

Burke, Michael Glen. "Fast upper body pose estimation for human-robot interaction." Thesis, University of Cambridge, 2015. https://www.repository.cam.ac.uk/handle/1810/256305.

Full text
Abstract:
This work describes an upper body pose tracker that finds a 3D pose estimate using video sequences obtained from a monocular camera, with applications in human-robot interaction in mind. A novel mixture of Ornstein-Uhlenbeck processes model, trained in a reduced dimensional subspace and designed for analytical tractability, is introduced. This model acts as a collection of mean-reverting random walks that pull towards more commonly observed poses. Pose tracking using this model can be Rao-Blackwellised, allowing for computational efficiency while still incorporating bio-mechanical properties of the upper body. The model is used within a recursive Bayesian framework to provide reliable estimates of upper body pose when only a subset of body joints can be detected. Model training data can be extended through a retargeting process, and better pose coverage obtained through the use of Poisson disk sampling in the model training stage. Results on a number of test datasets show that the proposed approach provides pose estimation accuracy comparable with the state of the art in real time (30 fps) and can be extended to the multiple user case. As a motivating example, this work also introduces a pantomimic gesture recognition interface. Traditional approaches to gesture recognition for robot control make use of predefined codebooks of gestures, which are mapped directly to the robot behaviours they are intended to elicit. These gesture codewords are typically recognised using algorithms trained on multiple recordings of people performing the predefined gestures. Obtaining these recordings can be expensive and time consuming, and the codebook of gestures may not be particularly intuitive. 
This thesis presents arguments that pantomimic gestures, which mimic the intended robot behaviours directly, are potentially more intuitive, and proposes a transfer learning approach to recognition, where human hand gestures are mapped to recordings of robot behaviour by extracting temporal and spatial features that are inherently present in both pantomimed actions and robot behaviours. A Bayesian bias compensation scheme is introduced to compensate for potential classification bias in features. Results from a quadrotor behaviour selection problem show that good classification accuracy can be obtained when human hand gestures are recognised using behaviour recordings, and that classification using these behaviour recordings is more robust than using human hand recordings when users are allowed complete freedom over their choice of input gestures.
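The mean-reverting behaviour of the Ornstein-Uhlenbeck processes at the heart of the tracker can be illustrated by simulating one such walk with Euler-Maruyama steps; the parameters below are hypothetical, and the real model is a mixture of such processes in a reduced pose subspace.

```python
import numpy as np

def simulate_ou(x0, mu, theta, sigma, dt=0.01, steps=1000, rng=None):
    """Euler-Maruyama simulation of an Ornstein-Uhlenbeck process:
    dx = theta * (mu - x) dt + sigma dW, a mean-reverting random walk
    that pulls the state toward the commonly observed pose mu."""
    rng = rng or np.random.default_rng(0)
    x = np.empty(steps + 1)
    x[0] = x0
    for t in range(steps):
        x[t + 1] = (x[t] + theta * (mu - x[t]) * dt
                    + sigma * np.sqrt(dt) * rng.normal())
    return x
```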
APA, Harvard, Vancouver, ISO, and other styles
39

Zhu, Youding. "Model-Based Human Pose Estimation with Spatio-Temporal Inferencing." The Ohio State University, 2009. http://rave.ohiolink.edu/etdc/view?acc_num=osu1242752509.

Full text
APA, Harvard, Vancouver, ISO, and other styles
40

Krejov, Philip G. "Real time hand pose estimation for human computer interaction." Thesis, University of Surrey, 2016. http://epubs.surrey.ac.uk/809973/.

Full text
Abstract:
The aim of this thesis is to address the challenge of real-time pose estimation of the hand. Specifically this thesis aims to determine the joint positions of a non-augmented hand. This thesis focuses on the use of depth, performing localisation of the parts of the hand for efficient fitting of a kinematic model and consists of four main contributions. The first contribution presents an approach to Multi-touch(less) tracking, where the objective is to track the fingertips with a high degree of accuracy without sensor contact. Using a graph based approach, the surface of the hand is modelled and extrema of the hand are located. From this, gestures are identified and used for interaction. We briefly discuss one use case for this technology in the context of the Making Sense demonstrator inspired by the film ”The Minority Report”. This demonstration system allows an operator to quickly summarise and explore complex multi-modal multimedia data. The tracking approach allows for collaborative interactions due to its highly efficient tracking, resolving 4 hands simultaneously in real-time. The second contribution applies a Randomised Decision Forest (RDF) to the problem of pose estimation and presents a technique to identify regions of the hand, using features that sample depth. The RDF is an ensemble based classifier that is capable of generalising to unseen data and is capable of modelling expansive datasets, learning from over 70,000 pose examples. The approach is also demonstrated in the challenging application of American Sign Language (ASL) fingerspelling recognition. The third contribution combines a machine learning approach with a model based method to overcome the limitations of either technique in isolation. A RDF provides initial segmentation allowing surface constraints to be derived for a 3D model, which is subsequently fitted to the segmentation. This stage of global optimisation incorporates temporal information and enforces kinematic constraints. 
Using Rigid Body Dynamics for optimisation, invalid poses due to self-intersection and segmentation noise are resolved. Accuracy of the approach is limited by the natural variance between users and the use of a generic hand model. The final contribution therefore proposes an approach to refine pose via cascaded linear regression which samples the residual error between the depth and the model. This combination of techniques is demonstrated to provide state of the art accuracy in real time, without the use of a GPU and without the requirement for model initialisation.
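The depth-sampling features used by the Randomised Decision Forest are commonly (our assumption, in the style of standard depth-difference features) pairs of offsets scaled by the inverse depth at the pixel, which makes the response roughly invariant to the hand's distance from the camera.

```python
import numpy as np

def depth_feature(depth, x, y, u, v):
    """Depth-difference feature for pixel (x, y): the offsets u and v are
    scaled by 1/depth so that the probe pattern shrinks with distance."""
    h, w = depth.shape
    d = depth[y, x]
    def probe(off):
        px = int(np.clip(x + off[0] / d, 0, w - 1))
        py = int(np.clip(y + off[1] / d, 0, h - 1))
        return depth[py, px]
    return probe(u) - probe(v)
```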
APA, Harvard, Vancouver, ISO, and other styles
41

Jaeggli, Tobias. "Statistical models for human body pose estimation from videos." Konstanz Hartung-Gorre, 2008. http://d-nb.info/991839315/04.

Full text
APA, Harvard, Vancouver, ISO, and other styles
42

Gomez-Donoso, Francisco. "Contributions to 3D object recognition and 3D hand pose estimation using deep learning techniques." Doctoral thesis, Universidad de Alicante, 2020. http://hdl.handle.net/10045/110658.

Full text
Abstract:
In this thesis, a study of two flourishing fields within artificial intelligence is carried out. The first part of the present document is about 3D object recognition methods. Object recognition in general is about providing an intelligent system with the ability to understand which objects appear in its input data. Any robot, from industrial robots to social robots, could benefit from such a capability to improve its performance and carry out high-level tasks. In fact, this topic has been studied extensively, and some object recognition methods in the state of the art outperform humans in terms of accuracy. Nonetheless, these methods are image-based, namely, they focus on recognising visual features. This can be a problem in some contexts, as there exist objects that look like other, different objects: for instance, a social robot that recognises a face in a picture, or an intelligent car that recognises a pedestrian on a billboard. A potential solution to this issue is to involve three-dimensional data, so that systems focus not on visual features but on topological features. Thus, in this thesis, a study of 3D object recognition methods is carried out. The approaches proposed in this document, which take advantage of deep learning methods, take point clouds as input and are able to provide the correct category. We evaluated the proposals on a range of public challenges, datasets and real-life data with high success. The second part of the thesis is about hand pose estimation. This is also an interesting topic, which focuses on recovering the hand's kinematics. A range of systems, from human-computer interaction and virtual reality to social robots, could benefit from such a capability: for instance, to control a computer with seamless hand gestures, or to interact with a social robot that understands human non-verbal communication. Thus, in the present document, hand pose estimation approaches are proposed.
It is worth noting that the proposals take color images as input and are able to provide the 2D and 3D hand pose in the image plane and in Euclidean coordinate frames, respectively. Specifically, hand poses are encoded as a collection of points representing the joints of a hand, so that the full hand pose can easily be reconstructed from them. The methods are evaluated on custom and public datasets, and integrated into a robotic hand teleoperation application with great success.
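Deep networks that consume raw point clouds (e.g. the PointNet family) typically apply a shared per-point transform followed by a symmetric pooling, so the category prediction is invariant to the ordering of the points. A minimal numpy sketch with random, untrained weights (all shapes, sizes and names are illustrative assumptions, not the thesis's architecture):

```python
import numpy as np

rng = np.random.default_rng(0)

# Shared per-point MLP weights (untrained, for illustration only).
W1 = rng.standard_normal((3, 32))   # lift each 3D point to a 32-D feature
W2 = rng.standard_normal((32, 4))   # map pooled feature to 4 category scores

def classify_cloud(points):
    """points: (N, 3) array. Returns order-invariant class scores."""
    h = np.maximum(points @ W1, 0.0)  # shared per-point transform with ReLU
    g = h.max(axis=0)                 # symmetric (permutation-invariant) pooling
    return g @ W2                     # class scores

cloud = rng.standard_normal((128, 3))
scores = classify_cloud(cloud)
# Shuffling the points leaves the scores unchanged.
shuffled_scores = classify_cloud(rng.permutation(cloud))
```

The max pooling is what makes the prediction depend only on the set of points, not on their order in the array.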
APA, Harvard, Vancouver, ISO, and other styles
43

Devanne, Maxime. "3D human behavior understanding by shape analysis of human motion and pose." Thesis, Lille 1, 2015. http://www.theses.fr/2015LIL10138/document.

Full text
Abstract:
L'émergence de capteurs de profondeur capturant la structure 3D de la scène et du corps humain offre de nouvelles possibilités pour l'étude du mouvement et la compréhension des comportements humains. Cependant, la conception et le développement de modules de reconnaissance de comportements à la fois précis et efficaces est une tâche difficile en raison de la variabilité de la posture humaine, la complexité du mouvement et les interactions avec l'environnement. Dans cette thèse, nous nous concentrons d'abord sur le problème de la reconnaissance d'actions en représentant la trajectoire du corps humain au cours du temps, capturant ainsi simultanément la forme du corps et la dynamique du mouvement. Le problème de la reconnaissance d'actions est alors formulé comme le calcul de similitude entre la forme des trajectoires dans un cadre Riemannien. Les expériences menées sur quatre bases de données démontrent le potentiel de la solution en termes de précision/temps de latence de la reconnaissance d'actions. Deuxièmement, nous étendons l'étude aux comportements plus complexes en analysant l'évolution de la forme de la posture pour décomposer la séquence en unités de mouvement. Chaque unité de mouvement est alors caractérisée par la trajectoire de mouvement et l'apparence autour des mains, de manière à décrire le mouvement humain et l'interaction avec les objets. Enfin, la séquence de segments temporels est modélisée par un classifieur Bayésien naïf dynamique. Les expériences menées sur quatre bases de données évaluent le potentiel de l'approche dans différents contextes de reconnaissance et détection en ligne de comportements
The emergence of RGB-D sensors providing the 3D structure of both the scene and the human body offers new opportunities for studying human motion and understanding human behaviors. However, designing and developing models for behavior recognition that are both accurate and efficient is a challenging task due to the variability of the human pose, the complexity of human motion and possible interactions with the environment. In this thesis, we first focus on the action recognition problem by representing a human action as the trajectory of the 3D coordinates of the human body joints over time, thus capturing simultaneously the body shape and the dynamics of the motion. The action recognition problem is then formulated as computing the similarity between the shapes of trajectories in a Riemannian framework. Experiments carried out on four representative benchmarks demonstrate the potential of the proposed solution in terms of accuracy and latency for low-latency action recognition. Second, we extend the study to more complex behaviors by analyzing the evolution of the human pose shape to decompose the motion stream into short motion units. Each motion unit is then characterized by its motion trajectory and the depth appearance around the hand joints, so as to describe the human motion and the interaction with objects. Finally, the sequence of temporal segments is modeled with a Dynamic Naive Bayesian Classifier. Experiments on four representative datasets evaluate the potential of the proposed approach in different contexts, including recognition and online detection of behaviors.
APA, Harvard, Vancouver, ISO, and other styles
44

Cabras, Paolo. "3D Pose estimation of continuously deformable instruments in robotic endoscopic surgery." Thesis, Strasbourg, 2016. http://www.theses.fr/2016STRAD007/document.

Full text
Abstract:
Connaître la position 3D d’instruments robotisés peut être très utile dans le contexte chirurgical. Nous proposons deux méthodes automatiques pour déduire la pose 3D d’un instrument avec une unique section pliable et équipé avec des marqueurs colorés, en utilisant uniquement les images fournies par la caméra monoculaire incorporée dans l'endoscope. Une méthode basée sur les graphes permet segmenter les marqueurs et leurs coins apparents sont extraits en détectant la transition de couleur le long des courbes de Bézier qui modélisent les points du bord. Ces primitives sont utilisées pour estimer la pose 3D de l'instrument en utilisant un modèle adaptatif qui prend en compte les jeux mécaniques du système. Pour éviter les limites de cette approche dérivants des incertitudes sur le modèle géométrique, la fonction image-position-3D peut être appris selon un ensemble d’entrainement. Deux techniques ont été étudiées et améliorées : réseau des fonctions à base radiale avec noyaux gaussiens et une régression localement pondérée. Les méthodes proposées sont validées sur une cellule expérimentale robotique et sur des séquences in-vivo
Knowing the 3D position of robotized instruments can be useful in a surgical context, e.g. for automatic control or gesture guidance. We propose two methods to infer the 3D pose of a single-bending-section instrument equipped with colored markers, using only the images provided by the monocular camera embedded in the endoscope. A graph-based method is used to segment the markers. Their corners are extracted by detecting color transitions along Bézier curves fitted to edge points. These features are used to estimate the 3D pose of the instrument using an adaptive model that takes into account the mechanical play of the system. Since this method can be affected by model uncertainties, the image-to-3D-pose function can instead be learned from a training set. We opted for two techniques, which we improved: a Radial Basis Function Network with Gaussian kernels and Locally Weighted Projection Regression. The proposed methods are validated on a robotic experimental cell and on in-vivo sequences.
APA, Harvard, Vancouver, ISO, and other styles
45

Schick, Alexander [Verfasser], and R. [Akademischer Betreuer] Stiefelhagen. "Human Pose Estimation with Supervoxels / Alexander Schick. Betreuer: R. Stiefelhagen." Karlsruhe : KIT-Bibliothek, 2014. http://d-nb.info/1051371317/34.

Full text
APA, Harvard, Vancouver, ISO, and other styles
46

Dogan, Emre. "Human pose estimation and action recognition by multi-robot systems." Thesis, Lyon, 2017. http://www.theses.fr/2017LYSEI060/document.

Full text
Abstract:
L'estimation de la pose humaine et la reconnaissance des activités humaines sont des étapes importantes dans de nombreuses applications comme la robotique, la surveillance et la sécurité, etc. Actuellement abordées dans le domaine, ces tâches ne sont toujours pas résolues dans des environnements non-coopératifs particulièrement. Ces tâches admettent de divers défis comme l'occlusion, les variations des vêtements, etc. Les méthodes qui exploitent des images de profondeur ont l’avantage concernant les défis liés à l'arrière-plan et à l'apparence, pourtant, l’application est limitée pour des raisons matérielles. Dans un premier temps, nous nous sommes concentrés sur la reconnaissance des actions complexes depuis des vidéos. Pour ceci, nous avons introduit une représentation spatio-temporelle indépendante du point de vue. Plus précisément, nous avons capturé le mouvement de la personne en utilisant un capteur de profondeur et l'avons encodé en 3D pour le représenter. Un descripteur 3D a ensuite été utilisé pour la classification des séquences avec la méthodologie bag-of-words. Pour la deuxième partie, notre objectif était l'estimation de pose articulée, qui est souvent une étape intermédiaire pour la reconnaissance de l'activité. Notre motivation était d'incorporer des informations à partir de capteurs multiples et de les fusionner pour surmonter le problème de l'auto-occlusion. Ainsi, nous avons proposé un modèle de flexible mixtures-of-parts multi-vues inspiré par la méthodologie classique de structure pictural. Nous avons démontré que les contraintes géométriques et les paramètres de cohérence d'apparence sont efficaces pour renforcer la cohérence entre les points de vue, aussi que les paramètres classiques. 
Finalement, nous avons évalué ces nouvelles méthodes sur des datasets publics, qui vérifie que l'utilisation de représentations indépendantes de la vue et l'intégration d'informations à partir de points de vue multiples améliore la performance pour les tâches ciblées dans le cadre de cette manuscrit
Estimating human pose and recognizing human activities are important steps in many applications, such as human-computer interfaces (HCI), health care, smart conferencing, robotics, security surveillance, etc. Despite ongoing effort in the domain, these tasks remain unsolved, in particular in unconstrained and non-cooperative environments. Pose estimation and activity recognition face many challenges under these conditions, such as occlusion or self-occlusion, variations in clothing, background clutter, the deformable nature of the human body and the diversity of human behaviors during activities. Using depth imagery has been a popular solution to address appearance- and background-related challenges, but it has a restricted application area due to its hardware limitations and fails to handle the remaining problems. Specifically, we considered action recognition scenarios where the position of the recording device is not fixed, and which consequently require a method that is not affected by the viewpoint. As a second problem, we tackled human pose estimation in settings where multiple visual sensors are available and allowed to collaborate. In this thesis, we addressed these two related problems separately. In the first part, we focused on indoor action recognition from videos, considering complex activities. To this end, we explored several methodologies and eventually introduced a viewpoint-independent 3D spatio-temporal representation of a video sequence. More specifically, we captured the movement of the person over time using a depth sensor and encoded it in 3D to represent the performed action with a single structure. A 3D feature descriptor was then employed to build a codebook and classify the actions with the bag-of-words approach. As for the second part, we concentrated on articulated pose estimation, which is often an intermediate step for activity recognition.
Our motivation was to incorporate information from multiple sources and views and to fuse them early in the pipeline to overcome the problem of self-occlusion and eventually obtain robust estimations. To achieve this, we proposed a multi-view flexible mixture-of-parts model inspired by the classical pictorial structures methodology. In addition to the single-view appearance of the human body and its kinematic priors, we demonstrated that geometrical constraints and appearance-consistency parameters are effective for boosting the coherence between the viewpoints in a multi-view setting. Both proposed methods were evaluated on public benchmarks, showing that the use of view-independent representations and the integration of information from multiple viewpoints improve the performance of action recognition and pose estimation tasks, respectively.
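The bag-of-words step mentioned above quantises local descriptors against a codebook and represents each sequence as a normalised histogram of codeword counts. A minimal numpy sketch using a random stand-in codebook (a real pipeline would learn the codebook with k-means; all sizes here are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
codebook = rng.standard_normal((16, 8))  # 16 codewords, 8-D descriptors (stand-in)

def bag_of_words(descriptors):
    """Quantise (N, 8) local descriptors into a normalised codeword histogram."""
    # Squared Euclidean distance from every descriptor to every codeword.
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    words = d2.argmin(axis=1)                          # nearest codeword per descriptor
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / hist.sum()                           # normalise out sequence length

sequence = rng.standard_normal((200, 8))  # descriptors extracted from one video
h = bag_of_words(sequence)
```

The resulting fixed-length histogram can then be fed to any standard classifier, regardless of how long the original sequence was.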
APA, Harvard, Vancouver, ISO, and other styles
47

Lu, Yao. "Human body tracking and pose estimation from monocular image sequences." Thesis, Curtin University, 2013. http://hdl.handle.net/20.500.11937/1665.

Full text
Abstract:
This thesis describes a bottom-up approach to estimating human pose over time from monocular views, with no restriction on human activities. Three approaches are proposed to address the weaknesses of existing methods: building a specific appearance model using clustering, utilising both generic and specific appearance models in the estimation, and building an uncontaminated appearance model by removing background pixels from the training samples. Experimental results show that the proposed system significantly outperforms existing systems.
APA, Harvard, Vancouver, ISO, and other styles
48

Garau, Nicola. "Design of Viewpoint-Equivariant Networks to Improve Human Pose Estimation." Doctoral thesis, Università degli studi di Trento, 2022. http://hdl.handle.net/11572/345132.

Full text
Abstract:
Human pose estimation (HPE) is an ever-growing research field, with an increasing number of publications in computer vision and deep learning, and it covers a multitude of practical scenarios, from sports to entertainment and from surveillance to medical applications. Despite the impressive results that can be obtained with HPE, many problems still need to be tackled when dealing with real-world applications. Most of the issues are linked to poor or completely wrong detections of the pose, which stem from the inability of the network to model the viewpoint. This thesis shows how designing viewpoint-equivariant neural networks can lead to substantial improvements in the field of human pose estimation, both in terms of state-of-the-art results and of better real-world applications. By jointly learning how to build hierarchical human body poses together with the observer viewpoint, a network can learn to generalise its predictions when dealing with previously unseen viewpoints. As a result, the amount of training data needed can be drastically reduced, simultaneously leading to faster and more efficient training and to more robust and interpretable real-world applications.
APA, Harvard, Vancouver, ISO, and other styles
49

Derkach, Dmytro. "Spectrum analysis methods for 3D facial expression recognition and head pose estimation." Doctoral thesis, Universitat Pompeu Fabra, 2018. http://hdl.handle.net/10803/664578.

Full text
Abstract:
Al llarg de les últimes dècades, l'anàlisi facial ha atret un interès creixent i considerable per part de la comunitat investigadora amb l’objectiu de millorar la interacció i la cooperació entre les persones i les màquines. Aquest interès ha propiciat la creació de sistemes automàtics capaços de reaccionar a diversos estímuls com ara els moviments del cap o les emocions d’una persona. Més enllà, les tasques automatitzades s’han de poder realitzar amb gran precisió dins d’entorns no controlats, fet que ressalta la necessitat d'algoritmes que aprofitin al màxim els avantatges que proporcionen les dades 3D. Aquests sistemes poden ser útils en molts àmbits com ara la interacció home-màquina, tutories, entrevistes, atenció sanitària, màrqueting, etc. En aquesta tesi, ens centrem en dos aspectes de l'anàlisi facial: el reconeixement d'expressions i l'estimació de l'orientació del cap. En ambdós casos, ens enfoquem en l’ús de dades 3D i presentem contribucions que tenen com a objectiu la identificació de representacions significatives de la geometria facial mitjançant mètodes basats en la descomposició espectral: 1. Proposem una tecnologia basada en la representació espectral per al reconeixement d’expressions facials utilitzant exclusivament la geometria 3D, la qual ens permet una descripció completa de la superfície subjacent que pot ser ajustada al nivell de detall desitjat. Dita tecnologia, es basa en la descomposició de fragments locals de la superfície en les seves components de freqüència espacial, d’una manera semblant a la transformada de Fourier, que estan relacionades amb característiques intrínseques de la superfície. Concretament, proposem la utilització de les Graph Laplacian Features (GLFs) que resulten de la projecció dels fragments locals de la superfície a una base comuna obtinguda a partir del Graph Laplacian eigenspace. 
El mètode proposat s’ha avaluat en termes de reconeixement d’expressions i Action Units (activacions musculars facials), i els resultats obtinguts confirmen que les GLFs produeixen taxes de reconeixement comparables a l’estat de l’art. 2. Proposem un mètode per a l’estimació de l’orientació del cap que permet modelar el manifold subjacent que formen les rotacions generals en 3D. En primer lloc, construïm un sistema completament automàtic que combina la detecció de landmarks (punts facials rellevants) i característiques basades en diccionari, el qual ha obtingut els millors resultats al FG2017 Head Pose Estimation Challenge. Posteriorment, utilitzem una representació basada en tensors i la seva descomposició en els valors singulars d’ordre més alt per tal de separar els subespais de cada factor de rotació i mostrar que cada un d’ells té una estructura clara que pot ser modelada amb funcions trigonomètriques. Aquesta representació proporciona un coneixement detallat del comportament de les dades i pot ser utilitzada per millorar l’estimació de les orientacions dels angles del cap.
Facial analysis has attracted considerable research effort over the last decades, with a growing interest in improving the interaction and cooperation between people and computers. This makes it necessary for automatic systems to be able to react to things such as the head movements of a user or his/her emotions. Further, this should be done accurately and in unconstrained environments, which highlights the need for algorithms that can take full advantage of 3D data. These systems could be useful in multiple domains such as human-computer interaction, tutoring, interviewing, healthcare, marketing, etc. In this thesis, we focus on two aspects of facial analysis: expression recognition and head pose estimation. In both cases, we specifically target the use of 3D data and present contributions that aim to identify meaningful representations of the facial geometry based on spectral decomposition methods: 1. We propose a spectral representation framework for facial expression recognition using exclusively 3D geometry, which allows a complete description of the underlying surface that can be further tuned to the desired level of detail. It is based on the decomposition of local surface patches into their spatial frequency components, much like a Fourier transform, which are related to intrinsic characteristics of the surface. We propose the use of Graph Laplacian Features (GLFs), which result from the projection of local surface patches onto a common basis obtained from the graph Laplacian eigenspace. The proposed approach is tested in terms of expression and Action Unit recognition, and the results confirm that the proposed GLFs produce state-of-the-art recognition rates. 2. We propose an approach for head pose estimation that allows modeling the underlying manifold that results from general rotations in 3D.
We start by building a fully automatic system based on the combination of landmark detection and dictionary-based features, which obtained the best results in the FG2017 Head Pose Estimation Challenge. Then, we use a tensor representation and higher-order singular value decomposition to separate the subspaces corresponding to each rotation factor, and show that each of them has a clear structure that can be modeled with trigonometric functions. Such a representation provides a deep understanding of the data's behavior, and can be used to further improve the estimation of the head pose angles.
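Projecting a per-vertex signal onto the eigenvectors of a graph Laplacian, as the Graph Laplacian Features above do for surface patches, yields Fourier-like spectral coefficients. A minimal sketch on a toy path graph (the graph and the signal are illustrative stand-ins, not the thesis's mesh data):

```python
import numpy as np

# Toy "patch": a path graph with 6 vertices connected in a chain.
n = 6
A = np.zeros((n, n))
for i in range(n - 1):
    A[i, i + 1] = A[i + 1, i] = 1.0
L = np.diag(A.sum(axis=1)) - A  # combinatorial graph Laplacian: degree - adjacency

# The eigenvectors of L form a Fourier-like orthonormal basis for graph signals.
eigvals, eigvecs = np.linalg.eigh(L)

signal = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])  # per-vertex values (e.g. depth)
coeffs = eigvecs.T @ signal                         # spectral coefficients

# The basis is orthonormal, so the projection is lossless.
reconstructed = eigvecs @ coeffs
```

Keeping only the first few coefficients gives a smooth, tunable-detail description of the signal, which is the sense in which the decomposition acts like a Fourier transform on the surface.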
APA, Harvard, Vancouver, ISO, and other styles
50

Lecrosnier, Louis. "Estimation de pose multimodale- Approche robuste par les droites 2D et 3D." Thesis, Normandie, 2019. http://www.theses.fr/2019NORMR089.

Full text
Abstract:
Avec la complexification des tâches confiées aux robots, la perception de l’environnement, aussi appelée perception de scène, doit se faire de manière plus complète. L’emploi simultané de différents types de capteurs, ou multimodalité, est l’un des leviers employés à cet effet. L’exploitation de données issues de modalités différentes nécessite généralement de connaître la pose, c’est-à-dire l’orientation et la position, de chaque capteur relativement aux autres. Les travaux présentés dans cette thèse se concentrent sur la problématique d’estimation de pose robuste dans le cas d’une multimodalité incluant des capteurs de type caméra et LiDAR. Deux contributions majeures sont présentées dans ce manuscrit. On présente dans un premier temps un algorithme d’estimation de pose original s’appuyant sur la correspondances entre droites 2D et 3D, ainsi qu’une connaissance a priori de la direction verticale. Dans l’optique d’améliorer la robustesse de cet algorithme, une deuxième contribution repose sur un algorithme d’appariement de droites et de rejet de paires aberrantes basé RANSAC. Cette méthode fonctionne à l’aide de deux ou d’une seule paire de droites, diminuant le coût en calcul du problème. Les résultats obtenus sur des jeux de données simulées et réelles démontrent une amélioration des performances en comparaison avec les méthodes de l’état de l’art
Camera pose estimation consists of determining the position and the orientation of a camera with respect to a reference frame. In the context of mobile robotics, multimodality, i.e. the use of various sensor types, is often a requirement for solving complex tasks. However, knowing the orientation and position, i.e. the pose, of each sensor with respect to a common frame is generally necessary to benefit from multimodality. In this context, we present two major contributions in this PhD thesis. First, we introduce a pose estimation algorithm relying on 2D and 3D lines and a known vertical direction. Second, we present two outlier rejection and line pairing methods based on the well-known RANSAC algorithm. Our methods use the vertical direction to reduce the number of line pairs required to two or one, i.e. RANSAC2 and RANSAC1. A robustness evaluation of our contributions is performed on simulated and real data. We show state-of-the-art results.
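The outlier-rejection scheme follows the generic RANSAC pattern: repeatedly fit a model to a minimal sample, count inliers, and keep the best hypothesis. A minimal sketch fitting a 2D line to points contaminated with outliers (generic RANSAC only; the RANSAC1/RANSAC2 variants above additionally exploit the known vertical direction to shrink the minimal sample of line pairs):

```python
import numpy as np

def ransac_line(points, iters=200, tol=0.05, rng=None):
    """Robustly fit a 2D line to (N, 2) points; returns the best inlier mask."""
    if rng is None:
        rng = np.random.default_rng(0)
    best_inliers = np.zeros(len(points), dtype=bool)
    for _ in range(iters):
        p, q = points[rng.choice(len(points), 2, replace=False)]  # minimal sample
        d = q - p
        norm = np.hypot(*d)
        if norm < 1e-12:
            continue
        normal = np.array([-d[1], d[0]]) / norm    # unit normal of the sample line
        residuals = np.abs((points - p) @ normal)  # point-to-line distances
        inliers = residuals < tol
        if inliers.sum() > best_inliers.sum():     # keep the hypothesis with most support
            best_inliers = inliers
    return best_inliers

rng = np.random.default_rng(2)
x = np.linspace(0, 1, 40)
line_pts = np.stack([x, 0.5 * x + 0.1], axis=1)    # 40 points on y = 0.5x + 0.1
outliers = rng.uniform(0, 1, size=(10, 2))          # 10 random outliers
pts = np.concatenate([line_pts, outliers])
mask = ransac_line(pts)
```

Shrinking the minimal sample, as the thesis does using the vertical direction, reduces the number of iterations needed for a given probability of hitting an all-inlier sample, and hence the computational cost.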
APA, Harvard, Vancouver, ISO, and other styles