A selection of scholarly literature on the topic "Multimodal object tracking"

Format your source in APA, MLA, Chicago, Harvard, and other citation styles


Browse the lists of current articles, books, dissertations, conference papers, and other scholarly sources on the topic "Multimodal object tracking".

Next to each work in the list of references there is an "Add to bibliography" button. Use it, and we will automatically generate a bibliographic reference to the selected work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of a scholarly publication in .pdf format and read its abstract online, provided these are available in the record's metadata.

Journal articles on the topic "Multimodal object tracking":

1

Zhang, Liwei, Jiahong Lai, Zenghui Zhang, Zhen Deng, Bingwei He, and Yucheng He. "Multimodal Multiobject Tracking by Fusing Deep Appearance Features and Motion Information." Complexity 2020 (September 25, 2020): 1–10. http://dx.doi.org/10.1155/2020/8810340.

Abstract:
Multiobject Tracking (MOT) is one of the most important abilities of autonomous driving systems. However, most of the existing MOT methods only use a single sensor, such as a camera, which has the problem of insufficient reliability. In this paper, we propose a novel Multiobject Tracking method by fusing deep appearance features and motion information of objects. In this method, the locations of objects are first determined based on a 2D object detector and a 3D object detector. We use the Nonmaximum Suppression (NMS) algorithm to combine the detection results of the two detectors to ensure the detection accuracy in complex scenes. After that, we use Convolutional Neural Network (CNN) to learn the deep appearance features of objects and employ Kalman Filter to obtain the motion information of objects. Finally, the MOT task is achieved by associating the motion information and deep appearance features. A successful match indicates that the object was tracked successfully. A set of experiments on the KITTI Tracking Benchmark shows that the proposed MOT method can effectively perform the MOT task. The Multiobject Tracking Accuracy (MOTA) is up to 76.40% and the Multiobject Tracking Precision (MOTP) is up to 83.50%.
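As a concrete picture of the association step this abstract describes, the sketch below shows one way Kalman-predicted tracks could be matched to fused 2D/3D detections by mixing an appearance (CNN-embedding) cost with a motion cost. The weights, gating threshold, and input format are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment  # Hungarian matching

def cosine_distance(a, b):
    """Appearance cost between two CNN embeddings."""
    return 1.0 - float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def associate(tracks, detections, w_app=0.6, w_mot=0.4, gate=0.7):
    """Match Kalman-predicted tracks to fused detections.

    Each track/detection is assumed to be a dict with a 'center' (np.array),
    a 'diag' (box diagonal, used for normalisation) and an 'embedding'.
    """
    cost = np.zeros((len(tracks), len(detections)))
    for i, trk in enumerate(tracks):
        for j, det in enumerate(detections):
            motion = np.linalg.norm(trk["center"] - det["center"]) / det["diag"]
            cost[i, j] = w_app * cosine_distance(trk["embedding"], det["embedding"]) \
                + w_mot * min(motion, 1.0)
    rows, cols = linear_sum_assignment(cost)
    # a successful (gated) match means the detection keeps the track's identity
    return [(i, j) for i, j in zip(rows, cols) if cost[i, j] < gate]
```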
2

Kota, John S., and Antonia Papandreou-Suppappola. "Joint Design of Transmit Waveforms for Object Tracking in Coexisting Multimodal Sensing Systems." Sensors 19, no. 8 (April 12, 2019): 1753. http://dx.doi.org/10.3390/s19081753.

Abstract:
We examine a multiple object tracking problem by jointly optimizing the transmit waveforms used in a multimodal system. Coexisting sensors in this system were assumed to share the same spectrum. Depending on the application, a system can include radars tracking multiple targets or multiuser wireless communications and a radar tracking both multiple messages and a target. The proposed spectral coexistence approach was based on designing all transmit waveforms to have the same time-varying phase function while optimizing desirable performance metrics. Considering the scenario of tracking a target with a pulse–Doppler radar and multiple user messages, two signaling schemes were proposed after selecting the waveform parameters to first minimize multiple access interference. The first scheme is based on system interference minimization, whereas the second scheme explores the multiobjective optimization tradeoff between system interference and object parameter estimation error. Simulations are provided to demonstrate the performance tradeoffs due to different system requirements.
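The interference-versus-estimation tradeoff mentioned in this abstract can be pictured with a generic scalarization sweep over candidate waveform parameters. The two surrogate cost functions and the parameter grid below are placeholders invented purely for illustration; they do not reproduce the paper's signal model or waveform design.

```python
import numpy as np

def interference_cost(params):
    # placeholder surrogate for multiple-access interference vs. waveform parameters
    a, b = params
    return float(np.sinc(a - b) ** 2)

def estimation_cost(params):
    # placeholder surrogate for object (delay/Doppler) estimation error
    a, b = params
    return 1.0 / (1e-3 + a ** 2 + 0.5 * b ** 2)

def tradeoff_curve(weights, grid):
    """For each weight, pick the parameters minimising the scalarised objective."""
    curve = []
    for w in weights:
        best = min(grid, key=lambda p: w * interference_cost(p) + (1.0 - w) * estimation_cost(p))
        curve.append((w, best, interference_cost(best), estimation_cost(best)))
    return curve

grid = [(a, b) for a in np.linspace(0.1, 3.0, 25) for b in np.linspace(0.1, 3.0, 25)]
for w, p, ic, ec in tradeoff_curve([0.1, 0.5, 0.9], grid):
    print(f"w={w:.1f}  params=({p[0]:.2f}, {p[1]:.2f})  interference={ic:.3f}  estimation={ec:.3f}")
```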
3

Muresan, Mircea Paul, Ion Giosan, and Sergiu Nedevschi. "Stabilization and Validation of 3D Object Position Using Multimodal Sensor Fusion and Semantic Segmentation." Sensors 20, no. 4 (February 18, 2020): 1110. http://dx.doi.org/10.3390/s20041110.

Abstract:
The stabilization and validation process of the measured position of objects is an important step for high-level perception functions and for the correct processing of sensory data. The goal of this process is to detect and handle inconsistencies between different sensor measurements, which result from the perception system. The aggregation of the detections from different sensors consists in the combination of the sensorial data in one common reference frame for each identified object, leading to the creation of a super-sensor. The result of the data aggregation may end up with errors such as false detections, misplaced object cuboids or an incorrect number of objects in the scene. The stabilization and validation process is focused on mitigating these problems. The current paper proposes four contributions for solving the stabilization and validation task, for autonomous vehicles, using the following sensors: trifocal camera, fisheye camera, long-range RADAR (Radio detection and ranging), and 4-layer and 16-layer LIDARs (Light Detection and Ranging). We propose two original data association methods used in the sensor fusion and tracking processes. The first data association algorithm is created for tracking LIDAR objects and combines multiple appearance and motion features in order to exploit the available information for road objects. The second novel data association algorithm is designed for trifocal camera objects and has the objective of finding measurement correspondences to sensor fused objects such that the super-sensor data are enriched by adding the semantic class information. The implemented trifocal object association solution uses a novel polar association scheme combined with a decision tree to find the best hypothesis–measurement correlations. Another contribution we propose for stabilizing object position and unpredictable behavior of road objects, provided by multiple types of complementary sensors, is the use of a fusion approach based on the Unscented Kalman Filter and a single-layer perceptron. The last novel contribution is related to the validation of the 3D object position, which is solved using a fuzzy logic technique combined with a semantic segmentation image. The proposed algorithms have a real-time performance, achieving a cumulative running time of 90 ms, and have been evaluated using ground truth data extracted from a high-precision GPS (global positioning system) with 2 cm accuracy, obtaining an average error of 0.8 m.
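As a rough illustration of the last contribution, validating a 3D object position against a semantic segmentation image, the sketch below projects an object cuboid into the image and measures how much of its footprint is covered by plausible road-user classes. The class ids, the coverage threshold, and the crisp decision rule are assumptions standing in for the paper's fuzzy-logic formulation.

```python
import numpy as np

ROAD_USER_IDS = (10, 11, 12)  # assumed label ids for car / pedestrian / cyclist

def project_points(points_cam, K):
    """Pinhole projection of 3D points given in the camera frame (N x 3)."""
    uvw = (K @ points_cam.T).T
    return uvw[:, :2] / uvw[:, 2:3]

def validate_position(cuboid_corners_cam, K, seg_map, min_support=0.4):
    """Accept the 3D object position if enough of its projected bounding box
    lands on road-user pixels in the semantic segmentation image."""
    uv = project_points(cuboid_corners_cam, K).astype(int)
    u0, v0 = np.maximum(uv.min(axis=0), 0)
    u1 = min(uv[:, 0].max(), seg_map.shape[1] - 1)
    v1 = min(uv[:, 1].max(), seg_map.shape[0] - 1)
    if u1 <= u0 or v1 <= v0:
        return False, 0.0
    patch = seg_map[v0:v1, u0:u1]
    support = float(np.isin(patch, ROAD_USER_IDS).mean())
    return support >= min_support, support
```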
4

Motlicek, Petr, Stefan Duffner, Danil Korchagin, Hervé Bourlard, Carl Scheffler, Jean-Marc Odobez, Giovanni Del Galdo, Markus Kallinger, and Oliver Thiergart. "Real-Time Audio-Visual Analysis for Multiperson Videoconferencing." Advances in Multimedia 2013 (2013): 1–21. http://dx.doi.org/10.1155/2013/175745.

Abstract:
We describe the design of a system consisting of several state-of-the-art real-time audio and video processing components enabling multimodal stream manipulation (e.g., automatic online editing for multiparty videoconferencing applications) in open, unconstrained environments. The underlying algorithms are designed to allow multiple people to enter, interact, and leave the observable scene with no constraints. They comprise continuous localisation of audio objects and its application for spatial audio object coding, detection, and tracking of faces, estimation of head poses and visual focus of attention, detection and localisation of verbal and paralinguistic events, and the association and fusion of these different events. Combined all together, they represent multimodal streams with audio objects and semantic video objects and provide semantic information for stream manipulation systems (like a virtual director). Various experiments have been performed to evaluate the performance of the system. The obtained results demonstrate the effectiveness of the proposed design, the various algorithms, and the benefit of fusing different modalities in this scenario.
5

Monir, Islam A., Mohamed W. Fakhr, and Nashwa El-Bendary. "Multimodal deep learning model for human handover classification." Bulletin of Electrical Engineering and Informatics 11, no. 2 (April 1, 2022): 974–85. http://dx.doi.org/10.11591/eei.v11i2.3690.

Abstract:
Giving and receiving objects is a critical task that collaborative robots must be able to perform with humans. To achieve this, robots must be able to classify different types of human handover motions. Previous works did not focus on classifying the motion type from both the giver's and the receiver's perspectives; they focused solely on object grasping, handover detection, and handover classification from one side only (giver or receiver). This paper discusses the design and implementation of different deep learning architectures with a long short-term memory (LSTM) network, together with different feature selection techniques, for human handover classification from both giver and receiver perspectives. Classification performance with unimodal and multimodal deep learning models is investigated. The data used for evaluation is a publicly available dataset with four different modalities: motion tracking sensor readings, Kinect readings of 15 joint positions, 6-axis inertial sensor readings, and video recordings. Multimodality gave a large boost in classification performance, achieving 96% accuracy with the feature-selection-based deep learning architecture.
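The multimodal LSTM architecture described here could, in rough outline, look like the following two-branch model with late fusion. The layer sizes, the choice of late fusion, and the number of output classes are assumptions, since the abstract does not specify them.

```python
import torch
import torch.nn as nn

class HandoverClassifier(nn.Module):
    """Two LSTM branches (15 skeleton joints x 3 coords, 6-axis IMU) fused late."""

    def __init__(self, joint_dim=45, imu_dim=6, hidden=64, n_classes=4):
        super().__init__()
        self.joint_lstm = nn.LSTM(joint_dim, hidden, batch_first=True)
        self.imu_lstm = nn.LSTM(imu_dim, hidden, batch_first=True)
        self.head = nn.Sequential(
            nn.Linear(2 * hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_classes),
        )

    def forward(self, joints, imu):
        # joints: (batch, time, 45), imu: (batch, time, 6)
        _, (h_joint, _) = self.joint_lstm(joints)
        _, (h_imu, _) = self.imu_lstm(imu)
        fused = torch.cat([h_joint[-1], h_imu[-1]], dim=1)  # last hidden states
        return self.head(fused)

# usage sketch on random tensors
model = HandoverClassifier()
logits = model(torch.randn(8, 100, 45), torch.randn(8, 100, 6))
```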
6

Shibuya, Masaki, Kengo Ohnishi, and Isamu Kajitani. "Networked Multimodal Sensor Control of Powered 2-DOF Wrist and Hand." Journal of Robotics 2017 (2017): 1–12. http://dx.doi.org/10.1155/2017/7862178.

Abstract:
A prosthetic limb control system to operate powered 2-DOF wrist and 1-DOF hand with environmental information, myoelectric signal, and forearm posture signal is composed and evaluated. Our concept model on fusing biosignal and environmental information for easier manipulation with upper limb prosthesis is assembled utilizing networking software and prosthetic component interlink platform. The target is to enhance the controllability of the powered wrist’s orientation by processing the information to derive the joint movement in a physiologically appropriate manner. We applied a manipulative skill model of prehension which is constrained by forearm properties, grasping object properties, and task. The myoelectric and forearm posture sensor signals were combined with the work plane posture and the operation mode for grasping object properties. To verify the reduction of the operational load with the proposed method, we conducted 2 performance tests: system performance test to identify the powered 2-DOF wrist’s tracking performance and user operation tests. From the system performance experiment, the fusion control was confirmed to be sufficient to control the wrist joint with respect to the work plane posture. Forearm posture angle ranges were reduced when the prosthesis was operated companying environmental information in the user operation tests.
7

Kandylakis, Zacharias, Konstantinos Vasili, and Konstantinos Karantzalos. "Fusing Multimodal Video Data for Detecting Moving Objects/Targets in Challenging Indoor and Outdoor Scenes." Remote Sensing 11, no. 4 (February 21, 2019): 446. http://dx.doi.org/10.3390/rs11040446.

Abstract:
Single sensor systems and standard optical—usually RGB CCTV video cameras—fail to provide adequate observations, or the amount of spectral information required to build rich, expressive, discriminative features for object detection and tracking tasks in challenging outdoor and indoor scenes under various environmental/illumination conditions. Towards this direction, we have designed a multisensor system based on thermal, shortwave infrared, and hyperspectral video sensors and propose a processing pipeline able to perform in real-time object detection tasks despite the huge amount of the concurrently acquired video streams. In particular, in order to avoid the computationally intensive coregistration of the hyperspectral data with other imaging modalities, the initially detected targets are projected through a local coordinate system on the hypercube image plane. Regarding the object detection, a detector-agnostic procedure has been developed, integrating both unsupervised (background subtraction) and supervised (deep learning convolutional neural networks) techniques for validation purposes. The detected and verified targets are extracted through the fusion and data association steps based on temporal spectral signatures of both target and background. The quite promising experimental results in challenging indoor and outdoor scenes indicated the robust and efficient performance of the developed methodology under different conditions like fog, smoke, and illumination changes.
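As a simplified picture of the validation step based on spectral signatures, the sketch below compares the signature of a detected target region against the local background with a spectral-angle test. The angle criterion and the threshold are assumptions; the real pipeline also handles the projection of detections onto the hypercube image plane and temporal accumulation of signatures.

```python
import numpy as np

def mean_spectrum(hypercube, box):
    """Average spectral signature inside a box on the hypercube image plane.
    hypercube: (H, W, bands); box: (u0, v0, u1, v1) already projected onto that plane."""
    u0, v0, u1, v1 = box
    return hypercube[v0:v1, u0:u1, :].reshape(-1, hypercube.shape[2]).mean(axis=0)

def spectral_angle(s1, s2):
    """Spectral angle mapper between two signatures (radians)."""
    c = np.dot(s1, s2) / (np.linalg.norm(s1) * np.linalg.norm(s2) + 1e-12)
    return float(np.arccos(np.clip(c, -1.0, 1.0)))

def confirm_target(hypercube, target_box, background_box, min_angle=0.15):
    """Keep a detection only if its signature differs enough from the local
    background. The spectral-angle criterion and threshold are assumptions."""
    target_sig = mean_spectrum(hypercube, target_box)
    bg_sig = mean_spectrum(hypercube, background_box)
    return spectral_angle(target_sig, bg_sig) >= min_angle
```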
8

Kim, Jongwon, and Jeongho Cho. "RGDiNet: Efficient Onboard Object Detection with Faster R-CNN for Air-to-Ground Surveillance." Sensors 21, no. 5 (March 1, 2021): 1677. http://dx.doi.org/10.3390/s21051677.

Abstract:
An essential component for the autonomous flight or air-to-ground surveillance of a UAV is an object detection device. It must possess a high detection accuracy and requires real-time data processing to be employed for various tasks such as search and rescue, object tracking and disaster analysis. With the recent advancements in multimodal data-based object detection architectures, autonomous driving technology has significantly improved, and the latest algorithm has achieved an average precision of up to 96%. However, these remarkable advances may be unsuitable for the image processing of UAV aerial data directly onboard for object detection because of the following major problems: (1) Objects in aerial views generally have a smaller size than in an image and they are uneven and sparsely distributed throughout an image; (2) Objects are exposed to various environmental changes, such as occlusion and background interference; and (3) The payload weight of a UAV is limited. Thus, we propose employing a new real-time onboard object detection architecture, an RGB aerial image and a point cloud data (PCD) depth map image network (RGDiNet). A faster region-based convolutional neural network was used as the baseline detection network and an RGD, an integration of the RGB aerial image and the depth map reconstructed by the light detection and ranging PCD, was utilized as an input for computational efficiency. Performance tests and evaluation of the proposed RGDiNet were conducted under various operating conditions using hand-labeled aerial datasets. Consequently, it was shown that the proposed method has a superior performance for the detection of vehicles and pedestrians than conventional vision-based methods.
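One plausible reading of the RGD input construction is sketched below: the LiDAR point cloud is projected into the camera to rasterise a sparse depth map, which is then substituted for one colour channel so that a standard three-channel detector can consume it. The extrinsics, the rasterisation, and the channel-substitution choice are assumptions rather than the paper's exact recipe.

```python
import numpy as np

def pcd_to_depth_map(points_xyz, K, R, t, image_shape):
    """Project LiDAR points into the camera and rasterise a sparse depth map.

    K: 3x3 camera intrinsics; R, t: assumed LiDAR-to-camera extrinsics.
    """
    h, w = image_shape
    cam = (R @ points_xyz.T + t.reshape(3, 1)).T
    cam = cam[cam[:, 2] > 0]                       # keep points in front of the camera
    uv = (K @ cam.T).T
    uv = (uv[:, :2] / uv[:, 2:3]).astype(int)
    depth = np.zeros((h, w), dtype=np.float32)
    ok = (uv[:, 0] >= 0) & (uv[:, 0] < w) & (uv[:, 1] >= 0) & (uv[:, 1] < h)
    depth[uv[ok, 1], uv[ok, 0]] = cam[ok, 2]
    return depth

def make_rgd(rgb, depth):
    """Form a 3-channel RGD image by replacing the blue channel with normalised depth
    (one plausible reading of 'RGD'; the paper's exact scheme may differ)."""
    rgd = rgb.astype(np.float32) / 255.0
    rgd[:, :, 2] = depth / (depth.max() + 1e-6)
    return rgd
```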
9

Popp, Constantin, and Damian T. Murphy. "Creating Audio Object-Focused Acoustic Environments for Room-Scale Virtual Reality." Applied Sciences 12, no. 14 (July 20, 2022): 7306. http://dx.doi.org/10.3390/app12147306.

Abstract:
Room-scale virtual reality (VR) affordance in movement and interactivity causes new challenges in creating virtual acoustic environments for VR experiences. Such environments are typically constructed from virtual interactive objects that are accompanied by an Ambisonic bed and an off-screen (“invisible”) music soundtrack, with the Ambisonic bed, music, and virtual acoustics describing the aural features of an area. This methodology can become problematic in room-scale VR as the player cannot approach or interact with such background sounds, contradicting the player’s motion aurally and limiting interactivity. Written from a sound designer’s perspective, the paper addresses these issues by proposing a musically inclusive novel methodology that reimagines an acoustic environment predominately using objects that are governed by multimodal rule-based systems and spatialized in six degrees of freedom using 3D binaural audio exclusively while minimizing the use of Ambisonic beds and non-diegetic music. This methodology is implemented using off-the-shelf, creator-oriented tools and methods and is evaluated through the development of a standalone, narrative, prototype room-scale VR experience. The experience’s target platform is a mobile, untethered VR system based on head-mounted displays, inside-out tracking, head-mounted loudspeakers or headphones, and hand-held controllers. The authors apply their methodology to the generation of ambiences based on sound-based music, sound effects, and virtual acoustics. The proposed methodology benefits the interactivity and spatial behavior of virtual acoustic environments but may be constrained by platform and project limitations.
10

Birchfield, David, and Mina Johnson-Glenberg. "A Next Gen Interface for Embodied Learning." International Journal of Gaming and Computer-Mediated Simulations 2, no. 1 (January 2010): 49–58. http://dx.doi.org/10.4018/jgcms.2010010105.

Abstract:
Emerging research from the learning sciences and human-computer interaction supports the premise that learning is effective when it is embodied, collaborative, and multimodal. In response, we have developed a mixed-reality environment called the Situated Multimedia Arts Learning Laboratory (SMALLab). SMALLab enables multiple students to interact with one another and digitally mediated elements via 3D movements and gestures in real physical space. It uses 3D object tracking, real time graphics, and surround-sound to enhance learning. We present two studies from the earth science domain that address questions regarding the feasibility and efficacy of SMALLab in a classroom context. We present data demonstrating that students learn more during a recent SMALLab intervention compared to regular classroom instruction. We contend that well-designed, mixed-reality environments have much to offer STEM learners, and that the learning gains transcend those that can be expected from more traditional classroom procedures.

Dissertations on the topic "Multimodal object tracking":

1

De Goussencourt, Timothée. "Système multimodal de prévisualisation “on set” pour le cinéma." Thesis, Université Grenoble Alpes (ComUE), 2016. http://www.theses.fr/2016GREAT106/document.

Abstract:
Previz on-set is a previsualization step that takes place directly during the shooting phase of a film with special effects. Its aim is to show the film director an assembled view of the final shot in real time. The work presented in this thesis focuses on a specific step of the previz: compositing. This step consists in mixing multiple image sources to compose a single, coherent shot; in our case, computer-generated imagery is blended with the image from the main camera, so that digital effects are added to the live-action footage. The objective of this thesis is to propose a system for automatic adjustment of the compositing. The method requires measuring the geometry of the filmed scene, so a depth sensor is added to the main camera. The data are sent to a computer that runs an algorithm merging the data from the depth sensor and the main camera. Through a hardware demonstrator, we formalized a solution integrated into a video game engine. The experiments give encouraging results for real-time compositing, and results improved further with the introduction of a joint segmentation method using depth and color information. The main strength of this work lies in the development of a demonstrator that allowed us to obtain effective algorithms in the field of on-set previz.
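The core of depth-aware compositing can be conveyed with a per-pixel z-test between the live-action depth and the CG render's depth. The sketch below assumes both depth maps are metric and registered to the main camera, which is what the calibration and fusion work of the thesis is meant to provide; it is an illustration, not the thesis implementation.

```python
import numpy as np

def depth_composite(live_rgb, live_depth, cg_rgb, cg_depth):
    """Per-pixel z-test: keep whichever of the live-action pixel and the CG pixel
    is closer to the camera, so virtual objects are correctly occluded by real ones."""
    cg_in_front = (cg_depth < live_depth)[..., None]
    return np.where(cg_in_front, cg_rgb, live_rgb)
```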
2

Mozaffari, Maaref Mohammad Hamed. "A Real-Time and Automatic Ultrasound-Enhanced Multimodal Second Language Training System: A Deep Learning Approach." Thesis, Université d'Ottawa / University of Ottawa, 2020. http://hdl.handle.net/10393/40477.

Abstract:
The critical role of language pronunciation in communicative competence is significant, especially for second language learners. Despite renewed awareness of the importance of articulation, it remains a challenge for instructors to handle the pronunciation needs of language learners. There are relatively scarce pedagogical tools for pronunciation teaching and learning, such as inefficient, traditional pronunciation instructions like listening and repeating. Recently, electronic visual feedback (EVF) systems (e.g., medical ultrasound imaging) have been exploited in new approaches in such a way that they could be effectively incorporated in a range of teaching and learning contexts. Evaluation of ultrasound-enhanced methods for pronunciation training, such as multimodal methods, has asserted that visualizing articulator’s system as biofeedback to language learners might improve the efficiency of articulation learning. Despite the recent successful usage of multimodal techniques for pronunciation training, manual works and human manipulation are inevitable in many stages of those systems. Furthermore, recognizing tongue shape in noisy and low-contrast ultrasound images is a challenging job, especially for non-expert users in real-time applications. On the other hand, our user study revealed that users could not perceive the placement of their tongue inside the mouth comfortably just by watching pre-recorded videos. Machine learning is a subset of Artificial Intelligence (AI), where machines can learn by experiencing and acquiring skills without human involvement. Inspired by the functionality of the human brain, deep artificial neural networks learn from large amounts of data to perform a task repeatedly. Deep learning-based methods in many computer vision tasks have emerged as the dominant paradigm in recent years. Deep learning methods are powerful in automatic learning of a new job, while unlike traditional image processing methods, they are capable of dealing with many challenges such as object occlusion, transformation variant, and background artifacts. In this dissertation, we implemented a guided language pronunciation training system, benefits from the strengths of deep learning techniques. Our modular system attempts to provide a fully automatic and real-time language pronunciation training tool using ultrasound-enhanced augmented reality. Qualitatively and quantitatively assessments indicate an exceptional performance for our system in terms of flexibility, generalization, robustness, and autonomy outperformed previous techniques. Using our ultrasound-enhanced system, a language learner can observe her/his tongue movements during real-time speech, superimposed on her/his face automatically.
3

ur Réhman, Shafiq. "Expressing emotions through vibration for perception and control." Doctoral thesis, Umeå universitet, Institutionen för tillämpad fysik och elektronik, 2010. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-32990.

Abstract:
This thesis addresses a challenging problem: “how to let the visually impaired ‘see’ others emotions”. We, human beings, are heavily dependent on facial expressions to express ourselves. A smile shows that the person you are talking to is pleased, amused, relieved etc. People use emotional information from facial expressions to switch between conversation topics and to determine attitudes of individuals. Missing emotional information from facial expressions and head gestures makes the visually impaired extremely difficult to interact with others in social events. To enhance the visually impaired’s social interactive ability, in this thesis we have been working on the scientific topic of ‘expressing human emotions through vibrotactile patterns’. It is quite challenging to deliver human emotions through touch since our touch channel is very limited. We first investigated how to render emotions through a vibrator. We developed a real time “lipless” tracking system to extract dynamic emotions from the mouth and employed mobile phones as a platform for the visually impaired to perceive primary emotion types. Later on, we extended the system to render more general dynamic media signals: for example, render live football games through vibration in the mobile for improving mobile user communication and entertainment experience. To display more natural emotions (i.e. emotion type plus emotion intensity), we developed the technology to enable the visually impaired to directly interpret human emotions. This was achieved by use of machine vision techniques and vibrotactile display. The display is comprised of a ‘vibration actuators matrix’ mounted on the back of a chair and the actuators are sequentially activated to provide dynamic emotional information. The research focus has been on finding a global, analytical, and semantic representation for facial expressions to replace state of the art facial action coding systems (FACS) approach. We proposed to use the manifold of facial expressions to characterize dynamic emotions. The basic emotional expressions with increasing intensity become curves on the manifold extended from the center. The blends of emotions lie between those curves, which could be defined analytically by the positions of the main curves. The manifold is the “Braille Code” of emotions. The developed methodology and technology has been extended for building assistive wheelchair systems to aid a specific group of disabled people, cerebral palsy or stroke patients (i.e. lacking fine motor control skills), who don’t have ability to access and control the wheelchair with conventional means, such as joystick or chin stick. The solution is to extract the manifold of the head or the tongue gestures for controlling the wheelchair. The manifold is rendered by a 2D vibration array to provide user of the wheelchair with action information from gestures and system status information, which is very important in enhancing usability of such an assistive system. Current research work not only provides a foundation stone for vibrotactile rendering system based on object localization but also a concrete step to a new dimension of human-machine interaction.
Tactile Video
4

Khalidov, Vasil. "Modèles de mélanges conjugués pour la modélisation de la perception visuelle et auditive." Grenoble, 2010. http://www.theses.fr/2010GRENM064.

Abstract:
In this thesis, the modelling of audio-visual perception with a head-like device is considered. The related problems, namely audio-visual calibration, audio-visual object detection, localization and tracking, are addressed. A spatio-temporal approach to the calibration of the head-like device is proposed, based on probabilistic multimodal trajectory matching. The formalism of conjugate mixture models is introduced, along with a family of efficient optimization algorithms to perform multimodal clustering. One instance of this algorithm family, the conjugate expectation maximization (ConjEM) algorithm, is further improved to gain attractive theoretical properties. Multimodal object detection and object number estimation methods are developed and their theoretical properties are discussed. Finally, the proposed multimodal clustering method is combined with the object detection and object number estimation strategies and with known tracking techniques to perform multimodal multi-object tracking. The performance is demonstrated on simulated data and on a database of realistic audio-visual scenarios (the CAVA database).
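To give a feel for what multimodal clustering with shared latent object positions means, the toy EM-style step below softly assigns audio and visual observations (which live in different observation spaces) to common 3D object centres, then nudges those centres to explain both modalities at once. The Gaussian assignment model and the numerical-gradient update are simplifications for illustration only, not the closed-form ConjEM updates derived in the thesis; the mappings f_audio and f_visual are assumed known.

```python
import numpy as np

def soft_assign(obs, centers, mapping, sigma):
    """E-step in one modality: responsibilities of each observation for each
    3D object centre, where `mapping` sends a centre into that modality's space."""
    d2 = np.stack([np.sum((obs - mapping(c)) ** 2, axis=1) for c in centers], axis=1)
    w = np.exp(-0.5 * d2 / sigma ** 2)
    return w / (w.sum(axis=1, keepdims=True) + 1e-12)

def em_step(audio_obs, visual_obs, f_audio, f_visual, centers, sigma=1.0, lr=0.05, eps=1e-3):
    """One toy iteration: move each shared 3D centre (np.array of shape (3,)) so
    that it explains the observations assigned to it in both modalities."""
    r_a = soft_assign(audio_obs, centers, f_audio, sigma)
    r_v = soft_assign(visual_obs, centers, f_visual, sigma)

    def loss(c, k):
        la = np.sum(r_a[:, k] * np.sum((audio_obs - f_audio(c)) ** 2, axis=1))
        lv = np.sum(r_v[:, k] * np.sum((visual_obs - f_visual(c)) ** 2, axis=1))
        return la + lv

    updated = []
    for k, c in enumerate(centers):
        grad = np.zeros_like(c)
        for d in range(len(c)):
            step = np.zeros_like(c)
            step[d] = eps
            grad[d] = (loss(c + step, k) - loss(c - step, k)) / (2 * eps)
        updated.append(c - lr * grad)
    return updated
```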
5

Rodríguez, Florez Sergio Alberto. "Contributions by vision systems to multi-sensor object localization and tracking for intelligent vehicles." Compiègne, 2010. http://www.theses.fr/2010COMP1910.

Abstract:
Advanced Driver Assistance Systems (ADAS) can improve road safety by supporting the driver through warnings in hazardous circumstances or by triggering appropriate actions when facing imminent collision situations (e.g., airbags, emergency brake systems). In this context, the knowledge of the location and the speed of the surrounding mobile objects constitutes key information. Consequently, this work focuses on object detection, localization and tracking in dynamic scenes. Noticing the increasing presence of embedded multi-camera systems on vehicles and recognizing the effectiveness of automotive lidar systems at detecting obstacles, we investigate the contribution of stereo vision systems to multimodal perception of the environment geometry. In order to fuse geometrical information between the lidar and the vision system, we propose a calibration process which determines the extrinsic parameters between the exteroceptive sensors and quantifies the uncertainty of this estimation. We present a real-time visual odometry method which estimates the vehicle ego-motion and simplifies the motion analysis of dynamic objects. The integrity of the lidar-based object detection and tracking is then increased by means of a visual confirmation method that exploits stereo-vision 3D dense reconstruction in focused areas. Finally, a complete full-scale automotive system integrating the considered perception modalities was implemented and tested experimentally in open-road situations with an experimental car.
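The visual-confirmation idea, checking a lidar object hypothesis against the dense 3D reconstruction from stereo, can be sketched as counting stereo points that fall inside the hypothesis volume once the extrinsic calibration has been applied. The axis-aligned box test and the point threshold below are assumptions for illustration.

```python
import numpy as np

def lidar_to_camera(points_lidar, R, t):
    """Apply the extrinsic calibration estimated between the lidar and the camera."""
    return (R @ points_lidar.T + t.reshape(3, 1)).T

def confirm_with_stereo(obj_center_lidar, obj_size, stereo_points_cam, R, t, min_points=50):
    """Visual confirmation sketch: count dense stereo 3D points that fall inside
    the lidar object's bounding volume, expressed in the camera frame."""
    center_cam = lidar_to_camera(obj_center_lidar.reshape(1, 3), R, t)[0]
    half = np.asarray(obj_size, dtype=float) / 2.0
    inside = np.all(np.abs(stereo_points_cam - center_cam) <= half, axis=1)
    return int(inside.sum()) >= min_points
```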
6

Sattarov, Egor. "Etude et quantification de la contribution des systèmes de perception multimodale assistés par des informations de contexte pour la détection et le suivi d'objets dynamiques." Thesis, Université Paris-Saclay (ComUE), 2016. http://www.theses.fr/2016SACLS354.

Abstract:
This thesis project investigates and quantifies the contribution of context-aided multimodal perception for detecting and tracking moving objects. The study is applied to the detection and recognition of relevant objects in road traffic environments for Intelligent Vehicles (IV). The results obtained allow the proposed concept to be transposed to a wide range of state-of-the-art sensors and object classes by means of an integrative system approach involving learning methods. In particular, such learning methods investigate how the embedding into an embodied system providing a multitude of different data sources can be harnessed to learn 1) without, or with reduced, explicit supervision, by exploiting correlations; 2) incrementally, by adding to existing knowledge instead of completely retraining every time new data arrive; and 3) collectively, each learning instance in the system being trained in a way that ensures approximately optimal fusion. Concretely, a tight coupling between object classifiers in multiple modalities as well as geometric scene context extraction is studied, first in theory, then in the context of road traffic. The novelty of the envisioned integration approach lies in the tight coupling between system components such as object segmentation, object tracking, scene geometry estimation and object categorization, based on a probabilistic inference strategy. Such a strategy characterizes systems where all perception components broadcast and receive distributions of multiple possible results together with a probabilistic belief score. In this way, each processing component can take into account the results of other components at a much earlier stage (as compared to just combining final results), greatly increasing its computational power, while the application of Bayesian inference techniques ensures that implausible inputs do not cause negative effects.
7

Duarte, Diogo Ferreira. "The Multi-Object Tracking with Multimodal Information for Autonomous Surface Vehicles." Master's thesis, 2022. https://hdl.handle.net/10216/140667.


Book chapters on the topic "Multimodal object tracking":

1

Landabaso, José Luis, and Montse Pardàs. "Foreground Regions Extraction and Characterization Towards Real-Time Object Tracking." In Machine Learning for Multimodal Interaction, 241–49. Berlin, Heidelberg: Springer Berlin Heidelberg, 2006. http://dx.doi.org/10.1007/11677482_21.

2

"Software for Automatic Gaze and Face/Object Tracking and its Use for Early Diagnosis of Autism Spectrum Disorders." In Multimodal Interactive Systems Management, 147–62. EPFL Press, 2014. http://dx.doi.org/10.1201/b15535-14.

3

Diao, Qian, Jianye Lu, Wei Hu, Yimin Zhang, and Gary Bradski. "DBN Models for Visual Tracking and Prediction." In Bayesian Network Technologies, 176–93. IGI Global, 2007. http://dx.doi.org/10.4018/978-1-59904-141-4.ch009.

Abstract:
In a visual tracking task, the object may exhibit rich dynamic behavior in complex environments that can corrupt target observations via background clutter and occlusion. Such dynamics and background induce nonlinear, nonGaussian and multimodal observation densities. These densities are difficult to model with traditional methods such as Kalman filter models (KFMs) due to their Gaussian assumptions. Dynamic Bayesian networks (DBNs) provide a more general framework in which to solve these problems. DBNs generalize KFMs by allowing arbitrary probability distributions, not just (unimodal) linear-Gaussian. Under the DBN umbrella, a broad class of learning and inference algorithms for time-series models can be used in visual tracking. Furthermore, DBNs provide a natural way to combine multiple vision cues. In this chapter, we describe some DBN models for tracking in nonlinear, nonGaussian and multimodal situations, and present a prediction method to assist feature extraction part by making a hypothesis for the new observations.
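Inference in the kind of nonlinear, non-Gaussian DBNs described here is typically carried out with sequential Monte Carlo. The step below, with a target-plus-clutter mixture likelihood, is a generic illustration of why such models go beyond Kalman filter models; it is not the chapter's own algorithm, and all noise levels are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def particle_step(particles, weights, measurement, motion_std=2.0, obs_std=3.0, clutter=0.1):
    """One predict/update/resample cycle with a multimodal observation density
    (Gaussian target component plus a flat clutter component)."""
    # predict: random-walk motion model
    particles = particles + rng.normal(0.0, motion_std, size=particles.shape)
    # update: mixture likelihood keeps the filter robust to cluttered measurements
    d2 = np.sum((particles - measurement) ** 2, axis=1)
    lik = (1 - clutter) * np.exp(-0.5 * d2 / obs_std ** 2) + clutter * 1e-4
    weights = weights * lik
    weights /= weights.sum()
    # resample when the effective sample size collapses
    if 1.0 / np.sum(weights ** 2) < 0.5 * len(weights):
        idx = rng.choice(len(particles), size=len(particles), p=weights)
        particles = particles[idx]
        weights = np.full(len(particles), 1.0 / len(particles))
    return particles, weights

# usage sketch: 500 particles tracking a 2D position
particles = rng.normal(0.0, 5.0, size=(500, 2))
weights = np.full(500, 1 / 500)
particles, weights = particle_step(particles, weights, measurement=np.array([1.0, -2.0]))
```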
4

Tung, Tony, and Takashi Matsuyama. "Visual Tracking Using Multimodal Particle Filter." In Computer Vision, 1072–90. IGI Global, 2018. http://dx.doi.org/10.4018/978-1-5225-5204-8.ch044.

Abstract:
Visual tracking of humans or objects in motion is a challenging problem when observed data undergo appearance changes (e.g., due to illumination variations, occlusion, cluttered background, etc.). Moreover, tracking systems are usually initialized with predefined target templates, or trained beforehand using known datasets. Hence, they are not always efficient to detect and track objects whose appearance changes over time. In this paper, we propose a multimodal framework based on particle filtering for visual tracking of objects under challenging conditions (e.g., tracking various human body parts from multiple views). Particularly, the authors integrate various cues such as color, motion and depth in a global formulation. The Earth Mover distance is used to compare color models in a global fashion, and constraints on motion flow features prevent common drifting effects due to error propagation. In addition, the model features an online mechanism that adaptively updates a subspace of multimodal templates to cope with appearance changes. Furthermore, the proposed model is integrated in a practical detection and tracking process, and multiple instances can run in real-time. Experimental results are obtained on challenging real-world videos with poorly textured models and arbitrary non-linear motions.
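The Earth Mover's Distance used to compare colour models can be illustrated per channel, where it reduces to the L1 distance between cumulative histograms. The particle-scoring function that mixes this colour cost with a motion-flow penalty is an assumed form, since the chapter does not give its exact weighting.

```python
import numpy as np

def emd_1d(hist_a, hist_b):
    """Earth Mover's Distance between two 1D histograms (one colour channel),
    equal to the L1 distance between their normalised cumulative distributions."""
    a = np.cumsum(hist_a / (hist_a.sum() + 1e-12))
    b = np.cumsum(hist_b / (hist_b.sum() + 1e-12))
    return float(np.abs(a - b).sum())

def particle_score(patch_hists, template_hists, motion_penalty, lam=5.0, alpha=0.5):
    """Weight a particle by its best EMD match against the adaptive template
    subspace, combined with a motion-flow penalty (assumed exponential form).

    patch_hists: per-channel histograms of the candidate region;
    template_hists: list of templates, each a list of per-channel histograms.
    """
    color_cost = min(
        np.mean([emd_1d(p, t) for p, t in zip(patch_hists, template)])
        for template in template_hists
    )
    return float(np.exp(-lam * (alpha * color_cost + (1 - alpha) * motion_penalty)))
```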

Conference papers on the topic "Multimodal object tracking":

1

Muresan, Mircea Paul, and Sergiu Nedevschi. "Multimodal sparse LIDAR object tracking in clutter." In 2018 IEEE 14th International Conference on Intelligent Computer Communication and Processing (ICCP). IEEE, 2018. http://dx.doi.org/10.1109/iccp.2018.8516646.

2

Morrison, Katelyn, Daniel Yates, Maya Roman, and William W. Clark. "Using Object Tracking Techniques to Non-Invasively Measure Thoracic Rotation Range of Motion." In ICMI '20: INTERNATIONAL CONFERENCE ON MULTIMODAL INTERACTION. New York, NY, USA: ACM, 2020. http://dx.doi.org/10.1145/3395035.3425189.

3

Vyawahare, Vikram S., and Richard T. Stone. "Asymmetric Interface and Interactions for Bimanual Virtual Assembly With Haptics." In ASME 2012 International Design Engineering Technical Conferences and Computers and Information in Engineering Conference. American Society of Mechanical Engineers, 2012. http://dx.doi.org/10.1115/detc2012-71543.

Abstract:
This paper discusses development of a new bimanual interface configuration for virtual assembly consisting of a haptic device at one hand and a 6DOF tracking device at the other hand. The two devices form a multimodal interaction configuration facilitating unique interactions for virtual assembly. Tasks for virtual assembly can consist of both “one hand one object” and “bimanual single object” interactions. For one hand one object interactions this device configuration offers advantages in terms of increased manipulation workspace and provides a tradeoff between the cost effectiveness and mode of feedback. For bimanual single object manipulation an interaction method developed using this device configuration improves the realism and facilitates variation in precision of task of bimanual single object orientation. Furthermore another interaction method to expand the haptic device workspace using this configuration is introduced. The applicability of both these methods to the field of virtual assembly is discussed.
4

Valverde, Francisco Rivera, Juana Valeria Hurtado, and Abhinav Valada. "There is More than Meets the Eye: Self-Supervised Multi-Object Detection and Tracking with Sound by Distilling Multimodal Knowledge." In 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2021. http://dx.doi.org/10.1109/cvpr46437.2021.01144.

5

Moraffah, Bahman, Cesar Brito, Bindya Venkatesh, and Antonia Papandreou-Suppappola. "Tracking Multiple Objects with Multimodal Dependent Measurements: Bayesian Nonparametric Modeling." In 2019 53rd Asilomar Conference on Signals, Systems, and Computers. IEEE, 2019. http://dx.doi.org/10.1109/ieeeconf44664.2019.9048817.

6

Smirnova, Yana, Aleksandr Mudruk, and Anna Makashova. "Lack of joint attention in preschoolers with different forms of atypical development." In Safety psychology and psychological safety: problems of interaction between theorists and practitioners. «Publishing company «World of science», LLC, 2020. http://dx.doi.org/10.15862/53mnnpk20-29.

Abstract:
The article analyzes the deficit of the joint attention mechanism, which affects the formation of the child's ability to share intentions as a social foundation for mastering cognitive functions, using speech, and learning. The study offers a comparative analysis of atypical joint attention in a sample of children with different forms of developmental disabilities. To understand normative and deficient manifestations of joint attention, a comparative study of typically developing preschool children and groups of children with atypical development was carried out. The aim of the study was to identify manifestations of a joint attention deficit that prevent involvement in the dyadic (two-way) interactions with an adult that are necessary for a child's comprehensive development and learning. Methodology. In an experimental situation of real interaction between a child and an adult, an eye tracker made it possible to record eye movements as a marker of joint attention in real time. The specificity of the functional organization of oculomotor activity is highlighted as an indicator of the child's participation in joint attention. Results and discussion. Eye-movement tracking made it possible to analyze critical shifts of attention, changes in the focus of attention, gaze shifting, recognition of the eyes as an informative cue, and perception of the partner's gaze direction as a necessary condition for effectively establishing an episode of joint attention. Conclusions. The following were recorded as diagnostic markers of joint attention disorders in preschoolers with different forms of atypical development: difficulties in following the direction of an adult's gaze; anticipatory actions by the child or decision-making by "guessing" / "trial and error"; the predominance of the child's attention to the object rather than to the adult; dispersion of visual attention fixations; the use of additional multimodal means of establishing joint attention (head turns, gestures, speech, etc.); and decreased accuracy of visual attention fixation.
7

Catalán, José M., Jorge A. Díez, Arturo Bertomeu-Motos, Francisco J. Badesa, Rafael Puerto, José M. Sabater, and Nicolás García-Aracil. "Arquitectura de control multimodal para robótica asistencial." In Actas de las XXXVII Jornadas de Automática 7, 8 y 9 de septiembre de 2016, Madrid. Universidade da Coruña, Servizo de Publicacións, 2022. http://dx.doi.org/10.17979/spudc.9788497498081.1089.

Abstract:
This paper presents a multimodal control architecture for assistive robotics that takes the user's decisions into account to improve task performance, while implementing a visual-control method to minimize possible errors in operating the robot. Through the information provided by the eye-tracking system, the user can interact with the system to select the desired object, indicate the intention to grasp it, or even abort the execution. The system incorporates a 3D tracking system to know the location of the objects with respect to the robotic manipulator. This system is used both to define the position the robot must reach and to correct possible deviations during the execution of the trajectory.
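A minimal sketch of how the eye-tracking and 3D object-tracking information could be combined to let the user select a target object is given below. The pixel-distance rule and the dwell-time confirmation are assumptions about one possible implementation, not the architecture described in the paper.

```python
import numpy as np

def select_object_by_gaze(gaze_uv, object_centroids_uv, max_pixel_dist=60):
    """Return the id of the tracked object whose projected centroid is closest
    to the current gaze point, if it lies within a pixel radius (assumed value).

    object_centroids_uv: dict mapping object id -> (u, v) image coordinates."""
    if not object_centroids_uv:
        return None
    ids, centroids = zip(*object_centroids_uv.items())
    dists = np.linalg.norm(np.asarray(centroids, dtype=float) - np.asarray(gaze_uv, dtype=float), axis=1)
    best = int(np.argmin(dists))
    return ids[best] if dists[best] <= max_pixel_dist else None

# usage sketch: confirm the selection only after the same object has been
# fixated for a dwell time (e.g. about one second of consecutive frames)
```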
