Journal articles on the topic 'Multimodal object tracking'

Consult the top 50 journal articles for your research on the topic 'Multimodal object tracking.'


1

Li, Kai, Lihua Cai, Guangjian He, and Xun Gong. "MATI: Multimodal Adaptive Tracking Integrator for Robust Visual Object Tracking." Sensors 24, no. 15 (2024): 4911. http://dx.doi.org/10.3390/s24154911.

Abstract:
Visual object tracking, pivotal for applications like earth observation and environmental monitoring, encounters challenges under adverse conditions such as low light and complex backgrounds. Traditional tracking technologies often falter, especially when tracking dynamic objects like aircraft amidst rapid movements and environmental disturbances. This study introduces an innovative adaptive multimodal image object-tracking model that harnesses the capabilities of multispectral image sensors, combining infrared and visible light imagery to significantly enhance tracking accuracy and robustness…
2

Zhang, Kunpeng, Yanheng Liu, Fang Mei, Jingyi Jin, and Yiming Wang. "Boost Correlation Features with 3D-MiIoU-Based Camera-LiDAR Fusion for MODT in Autonomous Driving." Remote Sensing 15, no. 4 (2023): 874. http://dx.doi.org/10.3390/rs15040874.

Abstract:
Three-dimensional (3D) object tracking is critical in 3D computer vision. It has applications in autonomous driving, robotics, and human–computer interaction. However, methods for using multimodal information among objects to increase multi-object detection and tracking (MOT) accuracy remain a critical focus of research. Therefore, we present a multimodal MOT framework for autonomous driving, boost correlation multi-object detection and tracking (BcMODT), in this research study to provide more trustworthy features and correlation scores for real-time detection tracking using both camera and LiDAR…
3

Zhang, Liwei, Jiahong Lai, Zenghui Zhang, Zhen Deng, Bingwei He, and Yucheng He. "Multimodal Multiobject Tracking by Fusing Deep Appearance Features and Motion Information." Complexity 2020 (September 25, 2020): 1–10. http://dx.doi.org/10.1155/2020/8810340.

Abstract:
Multiobject Tracking (MOT) is one of the most important abilities of autonomous driving systems. However, most of the existing MOT methods only use a single sensor, such as a camera, which has the problem of insufficient reliability. In this paper, we propose a novel Multiobject Tracking method by fusing deep appearance features and motion information of objects. In this method, the locations of objects are first determined based on a 2D object detector and a 3D object detector. We use the Nonmaximum Suppression (NMS) algorithm to combine the detection results of the two detectors to ensure…
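The abstract above merges 2D and 3D detections with Non-Maximum Suppression. A minimal sketch of that step follows; the box format, scores, and the 0.5 threshold are illustrative assumptions, not details taken from the paper:

```python
def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter > 0 else 0.0

def nms(detections, iou_thresh=0.5):
    """Keep the highest-scoring box from each cluster of overlapping boxes.
    detections: list of (box, score) with box = (x1, y1, x2, y2)."""
    ordered = sorted(detections, key=lambda d: d[1], reverse=True)
    kept = []
    for box, score in ordered:
        # A box survives only if it does not overlap any already-kept box.
        if all(iou(box, k[0]) < iou_thresh for k in kept):
            kept.append((box, score))
    return kept
```

With detections from two detectors pooled into one list, overlapping duplicates of the same object collapse to the single highest-scoring box.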
4

Hu, Xiantao, Ying Tai, Xu Zhao, et al. "Exploiting Multimodal Spatial-temporal Patterns for Video Object Tracking." Proceedings of the AAAI Conference on Artificial Intelligence 39, no. 4 (2025): 3581–89. https://doi.org/10.1609/aaai.v39i4.32372.

Abstract:
Multimodal tracking has garnered widespread attention as a result of its ability to effectively address the inherent limitations of traditional RGB tracking. However, existing multimodal trackers mainly focus on the fusion and enhancement of spatial features or merely leverage the sparse temporal relationships between video frames. These approaches do not fully exploit the temporal correlations in multimodal videos, making it difficult to capture the dynamic changes and motion information of targets in complex scenarios. To alleviate this problem, we propose a unified multimodal spatial-temporal…
5

Ye, Ping, Gang Xiao, and Jun Liu. "Multimodal Features Alignment for Vision–Language Object Tracking." Remote Sensing 16, no. 7 (2024): 1168. http://dx.doi.org/10.3390/rs16071168.

Abstract:
Vision–language tracking presents a crucial challenge in multimodal object tracking. Integrating language features and visual features can enhance target localization and improve the stability and accuracy of the tracking process. However, most existing fusion models in vision–language trackers simply concatenate visual and linguistic features without considering their semantic relationships. Such methods fail to distinguish the target’s appearance features from the background, particularly when the target changes dramatically. To address these limitations, we introduce an innovative technique…
6

Yao, Rui, Jiazhu Qiu, Yong Zhou, et al. "Visible and Infrared Object Tracking Based on Multimodal Hierarchical Relationship Modeling." Image Analysis and Stereology 43, no. 1 (2024): 41–51. http://dx.doi.org/10.5566/ias.3124.

Abstract:
Visible RGB and Thermal infrared (RGBT) object tracking has emerged as a prominent area of focus within the realm of computer vision. Nevertheless, the majority of existing RGBT tracking methods, which predominantly rely on Transformers, primarily emphasize the enhancement of features extracted by convolutional neural networks. Unfortunately, the latent potential of Transformers in representation learning has been inadequately explored. Furthermore, most studies tend to overlook the significance of distinguishing between the importance of each modality in the context of multimodal tasks.
7

Cao, Bing, Junliang Guo, Pengfei Zhu, and Qinghua Hu. "Bi-directional Adapter for Multimodal Tracking." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 2 (2024): 927–35. http://dx.doi.org/10.1609/aaai.v38i2.27852.

Abstract:
Due to the rapid development of computer vision, single-modal (RGB) object tracking has made significant progress in recent years. Considering the limitation of a single imaging sensor, multi-modal images (RGB, infrared, etc.) are introduced to compensate for this deficiency for all-weather object tracking in complex environments. However, as acquiring sufficient multi-modal tracking data is hard while the dominant modality changes with the open environment, most existing techniques fail to extract multi-modal complementary information dynamically, yielding unsatisfactory tracking performance.
8

Fu, Teng, Haiyang Yu, Ke Niu, Bin Li, and Xiangyang Xue. "Foundation Model Driven Appearance Extraction for Robust Multiple Object Tracking." Proceedings of the AAAI Conference on Artificial Intelligence 39, no. 3 (2025): 3031–39. https://doi.org/10.1609/aaai.v39i3.32311.

Abstract:
Multiple Object Tracking (MOT) is a fundamental task in computer vision. Existing methods utilize motion information or appearance information to perform object tracking. However, these algorithms still struggle with special circumstances, such as occlusion and blurring in complex scenes. Inspired by the fact that people can pinpoint objects through verbal descriptions, we explore performing long-term robust tracking using semantic features of objects. Motivated by the success of the multimodal foundation model in text-image alignment, we reconsider the appearance feature extraction module in…
9

Jang, Eunseong, Sang Jun Lee, and HyungGi Jo. "A New Multimodal Map Building Method Using Multiple Object Tracking and Gaussian Process Regression." Remote Sensing 16, no. 14 (2024): 2622. http://dx.doi.org/10.3390/rs16142622.

Abstract:
Recent advancements in simultaneous localization and mapping (SLAM) have significantly improved the handling of dynamic objects. Traditionally, SLAM systems mitigate the impact of dynamic objects by extracting, matching, and tracking features. However, in real-world scenarios, dynamic object information critically influences decision-making processes in autonomous navigation. To address this, we present a novel approach for incorporating dynamic object information into map representations, providing valuable insights for understanding movement context and estimating collision risks.
10

Kota, John S., and Antonia Papandreou-Suppappola. "Joint Design of Transmit Waveforms for Object Tracking in Coexisting Multimodal Sensing Systems." Sensors 19, no. 8 (2019): 1753. http://dx.doi.org/10.3390/s19081753.

Abstract:
We examine a multiple object tracking problem by jointly optimizing the transmit waveforms used in a multimodal system. Coexisting sensors in this system were assumed to share the same spectrum. Depending on the application, a system can include radars tracking multiple targets or multiuser wireless communications and a radar tracking both multiple messages and a target. The proposed spectral coexistence approach was based on designing all transmit waveforms to have the same time-varying phase function while optimizing desirable performance metrics. Considering the scenario of tracking a target…
11

Bayraktar, Ertugrul. "ReTrackVLM: Transformer-Enhanced Multi-Object Tracking with Cross-Modal Embeddings and Zero-Shot Re-Identification Integration." Applied Sciences 15, no. 4 (2025): 1907. https://doi.org/10.3390/app15041907.

Abstract:
Multi-object tracking (MOT) is an important task in computer vision, particularly in complex, dynamic environments with crowded scenes and frequent occlusions. Traditional tracking methods often suffer from identity switches (IDSws) and fragmented tracks (FMs), which limits their ability to maintain consistent object trajectories. In this paper, we present a novel framework, called ReTrackVLM, that integrates multimodal embedding from a visual language model (VLM) with a zero-shot re-identification (ReID) module to enhance tracking accuracy and robustness. ReTrackVLM leverages the rich semantic…
12

Chen, Ning, Shaopeng Wu, Yupeng Chen, Zhanghua Wang, and Ziqian Zhang. "A Pose Estimation Algorithm for Multimodal Data Fusion." Traitement du Signal 39, no. 6 (2022): 1971–79. http://dx.doi.org/10.18280/ts.390609.

Abstract:
In response to the problem that previous pose detection systems are not effective under conditions such as severe occlusion or uneven illumination, this paper focuses on the multimodal information fusion pose estimation problem. The main work is to design a multimodal data fusion pose estimation algorithm for the problem of pose estimation in complex scenes such as low-texture targets and poor lighting conditions. The network takes images and point clouds as input and extracts local color and spatial features of the target object using the improved DenseNet and PointNet++ networks…
13

Dai, Jingyi. "Advancements in deep learning for visual object tracking." Applied and Computational Engineering 82, no. 1 (2024): 130–36. http://dx.doi.org/10.54254/2755-2721/82/20240997.

Abstract:
This comprehensive review delves into the transformative impact of deep learning on the domain of visual object tracking. Since the inception of AlexNet in 2012, deep learning has revolutionized feature extraction, leading to significant advancements in tracking accuracy and robustness. The article explores the integration of deep learning with various tracking algorithms, including…
14

Muresan, Mircea Paul, Ion Giosan, and Sergiu Nedevschi. "Stabilization and Validation of 3D Object Position Using Multimodal Sensor Fusion and Semantic Segmentation." Sensors 20, no. 4 (2020): 1110. http://dx.doi.org/10.3390/s20041110.

Abstract:
The stabilization and validation process of the measured position of objects is an important step for high-level perception functions and for the correct processing of sensory data. The goal of this process is to detect and handle inconsistencies between different sensor measurements, which result from the perception system. The aggregation of the detections from different sensors consists of combining the sensorial data into one common reference frame for each identified object, leading to the creation of a super-sensor. The result of the data aggregation may end up with errors such as…
15

Zhu, Ziming, Jiahao Nie, Han Wu, Zhiwei He, and Mingyu Gao. "MSA-MOT: Multi-Stage Association for 3D Multimodality Multi-Object Tracking." Sensors 22, no. 22 (2022): 8650. http://dx.doi.org/10.3390/s22228650.

Abstract:
Three-dimensional multimodality multi-object tracking has attracted great attention due to the use of complementary information. However, such a framework generally adopts a one-stage association approach, which fails to perform precise matching between detections and tracklets and, thus, cannot robustly track objects in complex scenes. To address this matching problem caused by one-stage association, we propose a novel multi-stage association method, which consists of a hierarchical matching module and a customized track management module. Specifically, the hierarchical matching module…
16

Motlicek, Petr, Stefan Duffner, Danil Korchagin, et al. "Real-Time Audio-Visual Analysis for Multiperson Videoconferencing." Advances in Multimedia 2013 (2013): 1–21. http://dx.doi.org/10.1155/2013/175745.

Abstract:
We describe the design of a system consisting of several state-of-the-art real-time audio and video processing components enabling multimodal stream manipulation (e.g., automatic online editing for multiparty videoconferencing applications) in open, unconstrained environments. The underlying algorithms are designed to allow multiple people to enter, interact, and leave the observable scene with no constraints. They comprise continuous localisation of audio objects and its application for spatial audio object coding, detection and tracking of faces, estimation of head poses and visual focus of…
17

Zhang, Lian, Lingxue Wang, Yuzhen Wu, Mingkun Chen, Dezhi Zheng, and Yi Cai. "ACNTrack: Agent cross-attention guided Multimodal Multi-Object Tracking with Neural Kalman Filter." Neurocomputing 650 (October 2025): 130811. https://doi.org/10.1016/j.neucom.2025.130811.

18

Yin, Shoulin, Qunming Wang, Liguo Wang, Mirjana Ivanovic, and Hang Li. "Multimodal deep learning-based feature fusion for object detection in remote sensing images." Computer Science and Information Systems, no. 00 (2025): 11. https://doi.org/10.2298/csis241110011y.

Abstract:
Object detection is an important computer vision task that developed from the image classification task. The difference is that it is no longer only to classify a single type of object in an image, but to complete the classification and positioning of multiple objects that may exist in an image at the same time. Classification refers to assigning category labels to the object, and positioning refers to determining the vertex coordinates of the peripheral rectangular box of the object. Therefore, object detection is more challenging and has broader application prospects, such as automatic driving…
19

Monir, Islam A., Mohamed W. Fakhr, and Nashwa El-Bendary. "Multimodal deep learning model for human handover classification." Bulletin of Electrical Engineering and Informatics 11, no. 2 (2022): 974–85. http://dx.doi.org/10.11591/eei.v11i2.3690.

Abstract:
Giving and receiving objects between humans and robots is a critical task which collaborative robots must be able to do. In order for robots to achieve that, they must be able to classify different types of human handover motions. Previous works did not focus on classifying the motion type from both giver and receiver perspectives; instead, they focused solely on object grasping, handover detection, and handover classification from one side only (giver/receiver). This paper discusses the design and implementation of different deep learning architectures with long short-term memory (LSTM)…
21

Majcher, Mateusz, and Bogdan Kwolek. "Object pose tracking using multimodal knowledge from RGB images and quaternion-based rotation contexts." Applied Soft Computing 170 (February 2025): 112699. https://doi.org/10.1016/j.asoc.2025.112699.

22

Mikhalev, Anton, Nikolai Guliutin, Nadezhda Ermienko, and Oleslav Antamoshkin. "Autonomous on-board object and phenomenon detection system." ITM Web of Conferences 72 (2025): 03010. https://doi.org/10.1051/itmconf/20257203010.

Abstract:
This paper presents the design, implementation, and evaluation of an autonomous on-board object and phenomenon detection system optimized for real-time performance and resource-constrained environments. The proposed framework integrates a multimodal sensor array, including RGB cameras and LiDAR, with lightweight deep learning algorithms for object detection, tracking, and classification. Four state-of-the-art detection models (YOLO, DETR, CenterNet, and M2Det) were examined using the Lacmus Drone Dataset, a publicly available collection of over 3,000 aerial images. Experimental results highlight…
23

Zuo, Yunpeng, and Yunwei Zhang. "A Lightweight Framework for Audio-Visual Segmentation with an Audio-Guided Space–Time Memory Network." Applied Sciences 15, no. 12 (2025): 6585. https://doi.org/10.3390/app15126585.

Abstract:
As a multimodal fusion task, audio-visual segmentation (AVS) aims to locate sounding objects at the pixel level within a given image. This capability holds significant importance and practical value in applications such as intelligent surveillance, multimedia content analysis, and human–robot interaction. However, existing AVS models typically feature complex architectures, require a large number of parameters, and are challenging to deploy on embedded platforms. Furthermore, these models often lack integration with object tracking mechanisms and fail to address the issue of mis-segmentation…
24

Krishtopik, A. S., and D. A. Yudin. "MONITORING OF DYNAMIC OBJECTS ON A 2D OCCUPANCY MAP USING NEURAL NETWORKS AND MULTIMODAL DATA." International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences XLVIII-2/W3-2023 (May 12, 2023): 137–43. http://dx.doi.org/10.5194/isprs-archives-xlviii-2-w3-2023-137-2023.

Abstract:
The paper deals with the construction of dynamic occupancy maps, where a grid cell can contain not only information about the presence or absence of an obstacle but also information about its velocity. We propose a multimodal approach to constructing 2D dynamic occupancy maps from LiDAR point clouds and camera images. The approach involves building a static occupancy map from LiDAR data and then adding information about cell velocities based on neural network instance segmentation and object tracking in monocular onboard camera images. Pedestrians and vehicles were considered as dynamic…
25

Oise, Godfrey Perfectson, Nkem Belinda Unuigbokhai, Chioma Julia Onwuzo, et al. "YOLOv8-DeepSORT: A High-Performance Framework for Real-Time Multi-Object Tracking with Attention and Adaptive Optimization." Journal of Science Research and Reviews 2, no. 2 (2025): 92–100. https://doi.org/10.70882/josrar.2025.v2i2.50.

Abstract:
The integration of YOLOv8 and DeepSORT has significantly advanced real-time multi-object tracking in computer vision, delivering a robust solution for dynamic video analysis. This study comprehensively evaluates the YOLOv8-DeepSORT pipeline, combining YOLOv8's high-accuracy detection capabilities with DeepSORT's efficient identity association to achieve precise and consistent tracking. Key contributions include domain-specific fine-tuning of YOLOv8, optimization through model pruning and quantization, and seamless integration with DeepSORT's deep appearance descriptors and Kalman filtering.
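The detect-then-associate loop described in the abstract above matches each existing track to at most one fresh detection per frame. Below is a minimal greedy IoU-based sketch of that association step; DeepSORT itself uses Hungarian matching over Kalman-predicted boxes plus deep appearance features, so the function names, data layout, and 0.3 threshold here are illustrative assumptions:

```python
def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter > 0 else 0.0

def greedy_associate(tracks, detections, iou_thresh=0.3):
    """Greedily match track boxes to detection boxes by descending IoU.
    tracks, detections: dicts mapping id -> box (x1, y1, x2, y2).
    Returns (matches, unmatched_track_ids, unmatched_detection_ids)."""
    pairs = sorted(
        ((iou(t_box, d_box), t_id, d_id)
         for t_id, t_box in tracks.items()
         for d_id, d_box in detections.items()),
        reverse=True)
    matches, used_t, used_d = [], set(), set()
    for score, t_id, d_id in pairs:
        if score < iou_thresh:
            break  # remaining pairs overlap too little to be the same object
        if t_id not in used_t and d_id not in used_d:
            matches.append((t_id, d_id))
            used_t.add(t_id)
            used_d.add(d_id)
    unmatched_t = [t for t in tracks if t not in used_t]
    unmatched_d = [d for d in detections if d not in used_d]
    return matches, unmatched_t, unmatched_d
```

Unmatched tracks are the ones a tracker would age out after a few frames, and unmatched detections seed new track identities.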
26

Lindenheim-Locher, Wojciech, Adam Świtoński, Tomasz Krzeszowski, et al. "YOLOv5 Drone Detection Using Multimodal Data Registered by the Vicon System." Sensors 23, no. 14 (2023): 6396. http://dx.doi.org/10.3390/s23146396.

Abstract:
This work is focused on the preliminary stage of the 3D drone tracking challenge, namely the precise detection of drones on images obtained from a synchronized multi-camera system. The YOLOv5 deep network with different input resolutions is trained and tested on the basis of real, multimodal data containing synchronized video sequences and precise motion capture data as a ground truth reference. The bounding boxes are determined based on the 3D position and orientation of an asymmetric cross attached to the top of the tracked object with known translation to the object’s center.
27

Zhang, Qing, and Wei Xiang. "Cross-Modal Image Registration via Rasterized Parameter Prediction for Object Tracking." Applied Sciences 13, no. 9 (2023): 5359. http://dx.doi.org/10.3390/app13095359.

Abstract:
Object tracking requires heterogeneous images that are well registered in advance, with cross-modal image registration used to transform images of the same scene generated by different sensors into the same coordinate system. Infrared and visible light sensors are the most widely used in environmental perception; however, misaligned pixel coordinates in cross-modal images remain a challenge in practical applications of the object tracking task. Traditional feature-based approaches can only be applied in single-mode scenarios, and cannot be well extended to cross-modal scenarios.
28

Shibuya, Masaki, Kengo Ohnishi, and Isamu Kajitani. "Networked Multimodal Sensor Control of Powered 2-DOF Wrist and Hand." Journal of Robotics 2017 (2017): 1–12. http://dx.doi.org/10.1155/2017/7862178.

Abstract:
A prosthetic limb control system to operate a powered 2-DOF wrist and 1-DOF hand with environmental information, a myoelectric signal, and a forearm posture signal is composed and evaluated. Our concept model on fusing biosignals and environmental information for easier manipulation with an upper limb prosthesis is assembled utilizing networking software and a prosthetic component interlink platform. The target is to enhance the controllability of the powered wrist’s orientation by processing the information to derive the joint movement in a physiologically appropriate manner.
29

Popp, Christoph, Andreas Serov, Felix Glatzki, et al. "PRORETA 5 – building blocks for automated urban driving enhancing city road safety." at - Automatisierungstechnik 72, no. 4 (2024): 293–307. http://dx.doi.org/10.1515/auto-2023-0092.

Abstract:
In the joint research project PRORETA 5, building blocks for automated driving in urban areas have been developed, implemented, and tested. The developed blocks involve object tracking for cars, bicycles, and pedestrians that feeds a multimodal object prediction which is able to predict the traffic participants’ most likely trajectories. Then, an anytime tree-based planning algorithm calculates the vehicle’s desired path. Finally, logic-based safety functions ensure a collision-free trajectory for the ego vehicle. The mentioned building blocks were integrated and tested in a prototype…
30

Anigala, Omeshamisu, Kwanghee Won, and Chulwoo Pack. "PEARL: Perceptual and Analytical Representation Learning for Video Anomaly Detection." ACM SIGAPP Applied Computing Review 25, no. 1 (2025): 5–15. https://doi.org/10.1145/3727257.3727258.

Abstract:
Video anomaly detection is crucial for applications like surveillance and autonomous systems. Traditional methods often rely solely on visual cues, missing valuable contextual data. This paper presents Perceptual and Analytical Representation Learning (PEARL), a novel method that combines perceptual (raw sensory input) and analytical (higher-level context) modalities. Specifically, we integrate visual information with object tracking data, along with the tracking-data-specialized normalization method DOT-Norm, leveraging ID switching to capture high-level contexts of abnormal movements.
31

Ervin, Lauren, Max Eastepp, Mason McVicker, and Kenneth Ricks. "Evaluation of Semantic Segmentation Performance for a Multimodal Roadside Vehicle Detection System on the Edge." Sensors 25, no. 2 (2025): 370. https://doi.org/10.3390/s25020370.

Abstract:
Discretely monitoring traffic systems and tracking payloads on vehicle targets can be challenging when traversal occurs off main roads where overhead traffic cameras are not present. This work proposes a portable roadside vehicle detection system as part of a solution for tracking traffic along any path. Training semantic segmentation networks to automatically detect specific types of vehicles while ignoring others will allow the user to track payloads present only on certain vehicles of interest, such as train cars or semi-trucks. Different vision sensors offer varying advantages for detecting…
32

Kandylakis, Zacharias, Konstantinos Vasili, and Konstantinos Karantzalos. "Fusing Multimodal Video Data for Detecting Moving Objects/Targets in Challenging Indoor and Outdoor Scenes." Remote Sensing 11, no. 4 (2019): 446. http://dx.doi.org/10.3390/rs11040446.

Abstract:
Single-sensor systems and standard optical video cameras (usually RGB CCTV) fail to provide adequate observations, or the amount of spectral information required to build rich, expressive, discriminative features for object detection and tracking tasks in challenging outdoor and indoor scenes under various environmental/illumination conditions. Towards this direction, we have designed a multisensor system based on thermal, shortwave infrared, and hyperspectral video sensors and propose a processing pipeline able to perform real-time object detection tasks despite the huge amount of the concurrent…
33

Johnson Kolluri, Sandeep Kumar Dash, and Ranjita Das. "MM_Fast_RCNN_ResNet: Construction of Multimodal Faster RCNN Inception and ResNet V2 for Pedestrian Tracking and detection." International Journal of Maritime Engineering 1, no. 1 (2024): 509–20. http://dx.doi.org/10.5750/ijme.v1i1.1381.

Abstract:
Pedestrian identification and tracking is a crucial duty in smart building monitoring. The development of sensors has led to architects' focus on smart building design. The image distortions caused by numerous external environmental factors present a significant problem for pedestrian recognition in smart buildings. It is difficult for machine learning algorithms and other conventional filter-based image classification methods, such as histograms of oriented gradient filters, to function efficiently when dealing with many input photos of pedestrians. Deep learning algorithms are now performing…
34

Kim, Jongwon, and Jeongho Cho. "RGDiNet: Efficient Onboard Object Detection with Faster R-CNN for Air-to-Ground Surveillance." Sensors 21, no. 5 (2021): 1677. http://dx.doi.org/10.3390/s21051677.

Abstract:
An essential component for the autonomous flight or air-to-ground surveillance of a UAV is an object detection device. It must possess a high detection accuracy and requires real-time data processing to be employed for various tasks such as search and rescue, object tracking and disaster analysis. With the recent advancements in multimodal data-based object detection architectures, autonomous driving technology has significantly improved, and the latest algorithm has achieved an average precision of up to 96%. However, these remarkable advances may be unsuitable for the image processing of UAV…
35

Smirnova, Y. K. "Eye Tracking Study of Visual Attention of Children with Hearing Impairments in a Learning Situation." Experimental Psychology (Russia) 16, no. 1 (2023): 4–22. http://dx.doi.org/10.17759/exppsy.2023160101.

Abstract:
Potential mechanisms underlying atypical joint attention that impede effective learning are analyzed using the example of the consequences of hearing impairment. A sample of preschool children with hearing impairment after cochlear implantation (sensorineural hearing loss, ICD-10 class H90) was studied. For the study, an experimental situation was created that would allow tracing the learning difficulties in children with hearing impairments associated with the skills of joint attention. In the course of completing a training task jointly with an adult, in children with hearing impairments…
36

Gu, Junyi, Artjom Lind, Tek Raj Chhetri, Mauro Bellone, and Raivo Sell. "End-to-End Multimodal Sensor Dataset Collection Framework for Autonomous Vehicles." Sensors 23, no. 15 (2023): 6783. http://dx.doi.org/10.3390/s23156783.

Abstract:
Autonomous driving vehicles rely on sensors for the robust perception of their surroundings. Such vehicles are equipped with multiple perceptive sensors with a high level of redundancy to ensure safety and reliability in any driving condition. However, multi-sensor systems such as camera, LiDAR, and radar raise requirements related to sensor calibration and synchronization, which are the fundamental blocks of any autonomous system. On the other hand, sensor fusion and integration have become important aspects of autonomous driving research and directly determine the efficiency and accuracy of…
37

Bahn, Daniela, Dilara Deniz Türk, Nikol Tsenkova, Gudrun Schwarzer, Melissa Le-Hoa Võ, and Christina Kauschke. "Processing of Scene-Grammar Inconsistencies in Children with Developmental Language Disorder—Insights from Implicit and Explicit Measures." Brain Sciences 15, no. 2 (2025): 139. https://doi.org/10.3390/brainsci15020139.

Full text
Abstract:
Background/Objectives: Developmental language disorders (DLD) are often associated with co-occurring neurodevelopmental difficulties, including attentional or social–emotional problems. Another nonverbal domain, i.e., visual cognition and its relationship to DLD, is virtually unexplored. However, learning visuospatial regularities, a scene-grammar, is crucial for navigating our daily environment. These regularities show certain similarities to the structure of language, and there is preliminary evidence for a relationship between scene processing and language competence in preschoolers with a
APA, Harvard, Vancouver, ISO, and other styles
38

Gabryel, Marcin. "The Bag-of-Words Method with Different Types of Image Features and Dictionary Analysis." JUCS - Journal of Universal Computer Science 24, no. 4 (2018): 357–71. https://doi.org/10.3217/jucs-024-04-0357.

Full text
Abstract:
Algorithms from the field of computer vision are widely applied in various fields including security, monitoring, and automation elements, but also in multimodal human-computer interactions, where they are used for face detection, body tracking, and object recognition. Designing algorithms that reliably perform these tasks with limited computing resources, despite the presence of nearby people and objects in the background, changes in illumination, and camera pose, is a huge challenge for the field. Many of these problems are addressed with different classification methods. One of many image classific
APA, Harvard, Vancouver, ISO, and other styles
39

Rahman, Md Mahfuzur, Sunzida Siddique, Marufa Kamal, Rakib Hossain Rifat, and Kishor Datta Gupta. "UAV (Unmanned Aerial Vehicle): Diverse Applications of UAV Datasets in Segmentation, Classification, Detection, and Tracking." Algorithms 17, no. 12 (2024): 594. https://doi.org/10.3390/a17120594.

Full text
Abstract:
Unmanned Aerial Vehicles (UAVs) have transformed the process of data collection and analysis in a variety of research disciplines, delivering unparalleled adaptability and efficacy. This paper presents a thorough examination of UAV datasets, emphasizing their wide range of applications and progress. UAV datasets consist of various types of data, such as satellite imagery, images captured by drones, and videos. These datasets can be categorized as either unimodal or multimodal, offering a wide range of detailed and comprehensive information. These datasets play a crucial role in disaster damage
APA, Harvard, Vancouver, ISO, and other styles
40

Zhang, Meng, Lingxi Zhang, and Tao Liu. "Aircraft Behavior Recognition on Trajectory Data with a Multimodal Approach." Electronics 13, no. 2 (2024): 367. http://dx.doi.org/10.3390/electronics13020367.

Full text
Abstract:
Moving traces are essential data for target detection and associated behavior recognition. Previous studies have used time–location sequences, route maps, or tracking videos to establish mathematical recognition models for behavior recognition. The multimodal approach has seldom been considered because of the limited modality of sensing data. With the rapid development of natural language processing and computer vision, the multimodal model has become a possible choice to process multisource data. In this study, we have proposed a mathematical model for aircraft behavior recognition with joint
APA, Harvard, Vancouver, ISO, and other styles
41

Smirnova, Yana K. "EYE TRACKING RESEARCH ON THE USE OF DIFFERENT FORMS OF INSTRUCTION IN TEACHING CHILDREN WITH HEARING IMPAIRMENT." Moscow University Psychology Bulletin, no. 2 (2022): 192–222. http://dx.doi.org/10.11621/vsp.2022.02.09.

Full text
Abstract:
Background. It is analyzed how the method of eye movement registration can be used to study the learning processes of children with hearing impairment. On the basis of oculomotor activity data, difficulties are identified that impede the learning of children with hearing impairment, which contributes to the discovery of effective ways of learning. A separate research issue is the search for the effective use of different forms of instruction. Objective. Eye-tracking study of learning difficulties with different forms of instruction (as different forms of multimodal means of establishin
APA, Harvard, Vancouver, ISO, and other styles
42

Popp, Constantin, and Damian T. Murphy. "Creating Audio Object-Focused Acoustic Environments for Room-Scale Virtual Reality." Applied Sciences 12, no. 14 (2022): 7306. http://dx.doi.org/10.3390/app12147306.

Full text
Abstract:
Room-scale virtual reality (VR) affordances in movement and interactivity create new challenges in creating virtual acoustic environments for VR experiences. Such environments are typically constructed from virtual interactive objects that are accompanied by an Ambisonic bed and an off-screen (“invisible”) music soundtrack, with the Ambisonic bed, music, and virtual acoustics describing the aural features of an area. This methodology can become problematic in room-scale VR as the player cannot approach or interact with such background sounds, contradicting the player’s motion aurally and limiti
APA, Harvard, Vancouver, ISO, and other styles
43

Birchfield, David, and Mina Johnson-Glenberg. "A Next Gen Interface for Embodied Learning." International Journal of Gaming and Computer-Mediated Simulations 2, no. 1 (2010): 49–58. http://dx.doi.org/10.4018/jgcms.2010010105.

Full text
Abstract:
Emerging research from the learning sciences and human-computer interaction supports the premise that learning is effective when it is embodied, collaborative, and multimodal. In response, we have developed a mixed-reality environment called the Situated Multimedia Arts Learning Laboratory (SMALLab). SMALLab enables multiple students to interact with one another and digitally mediated elements via 3D movements and gestures in real physical space. It uses 3D object tracking, real time graphics, and surround-sound to enhance learning. We present two studies from the earth science domain that add
APA, Harvard, Vancouver, ISO, and other styles
44

Li, Jujia, Kaiwen Man, and Joni M. Lakin. "Enhancing Spatial Ability Assessment: Integrating Problem-Solving Strategies in Object Assembly Tasks Using Multimodal Joint-Hierarchical Cognitive Diagnosis Modeling." Journal of Intelligence 13, no. 3 (2025): 30. https://doi.org/10.3390/jintelligence13030030.

Full text
Abstract:
We proposed a novel approach to investigate how problem-solving strategies, identified using response time and eye-tracking data, can impact individuals’ performance on the Object Assembly (OA) task. To conduct an integrated assessment of spatial reasoning ability and problem-solving strategy, we applied the Multimodal Joint-Hierarchical Cognitive Diagnosis Model (MJ-DINA) to analyze the performance of young students (aged 6 to 14) on 17 OA items. The MJ-DINA model consists of three sub-models: a Deterministic Inputs, Noisy “and” Gate (DINA) model for estimating spatial ability, a lognormal RT
APA, Harvard, Vancouver, ISO, and other styles
45

Becerra, Victor, Francisco J. Perales, Miquel Roca, José M. Buades, and Margaret Miró-Julià. "A Wireless Hand Grip Device for Motion and Force Analysis." Applied Sciences 11, no. 13 (2021): 6036. http://dx.doi.org/10.3390/app11136036.

Full text
Abstract:
A prototype portable device that allows for simultaneous hand and finger motion and precise force measurements has been developed. Wireless microelectromechanical systems based on inertial and force sensors are suitable for tracking bodily measurements. In particular, they can be used for hand interaction with computer applications. Our interest is to design a multimodal wireless hand grip device that measures and evaluates this activity for ludic or medical rehabilitation purposes. The accuracy and reliability of the proposed device have been evaluated against two different commercial dynamometers (Tak
APA, Harvard, Vancouver, ISO, and other styles
46

Westin, Thomas, José Neves, Peter Mozelius, Carla Sousa, and Lara Mantovan. "Inclusive AR-games for Education of Deaf Children: Challenges and Opportunities." European Conference on Games Based Learning 16, no. 1 (2022): 597–604. http://dx.doi.org/10.34190/ecgbl.16.1.588.

Full text
Abstract:
Game-based learning has had a rapid development in the 21st century, attracting an increasing audience. However, inclusion of all is still not a reality in society, with accessibility for deaf and hard of hearing children as a remaining challenge. To be excluded from learning due to communication barriers can have severe consequences for further studies and work. Based on previous research Augmented Reality (AR) games can be joyful learning tools that include activities with different sign languages, but AR based learning games for deaf and hard of hearing lack research. This paper aims to pre
APA, Harvard, Vancouver, ISO, and other styles
47

Azar, Zeynep, and Aslı Özyürek. "Discourse management." Dutch Journal of Applied Linguistics 4, no. 2 (2015): 222–40. http://dx.doi.org/10.1075/dujal.4.2.06aza.

Full text
Abstract:
Speakers achieve coherence in discourse by alternating between differential lexical forms e.g. noun phrase, pronoun, and null form in accordance with the accessibility of the entities they refer to, i.e. whether they introduce an entity into discourse for the first time or continue referring to an entity they already mentioned before. Moreover, tracking of entities in discourse is a multimodal phenomenon. Studies show that speakers are sensitive to the informational structure of discourse and use fuller forms (e.g. full noun phrases) in speech and gesture more when re-introducing an entity whi
APA, Harvard, Vancouver, ISO, and other styles
48

Tung, Tony, and Takashi Matsuyama. "Visual Tracking Using Multimodal Particle Filter." International Journal of Natural Computing Research 4, no. 3 (2014): 69–84. http://dx.doi.org/10.4018/ijncr.2014070104.

Full text
Abstract:
Visual tracking of humans or objects in motion is a challenging problem when observed data undergo appearance changes (e.g., due to illumination variations, occlusion, cluttered background, etc.). Moreover, tracking systems are usually initialized with predefined target templates, or trained beforehand using known datasets. Hence, they are not always efficient to detect and track objects whose appearance changes over time. In this paper, we propose a multimodal framework based on particle filtering for visual tracking of objects under challenging conditions (e.g., tracking various human body p
APA, Harvard, Vancouver, ISO, and other styles
49

Joshi, Rakhi Madhukararao. "Enhancing Vehicle Tracking and Recognition Across Multiple Cameras with Multimodal Contrastive Domain Sharing GAN and Topological Embeddings." Panamerican Mathematical Journal 34, no. 1 (2024): 114–27. http://dx.doi.org/10.52783/pmj.v34.i1.910.

Full text
Abstract:
Using Multimodal Contrastive Domain Sharing Generative Adversarial Networks (GAN) and topological embeddings, this study shows a new way to improve car tracking and classification across multiple camera feeds. Different camera angles and lighting conditions can make it hard for current car tracking systems to work correctly. This study tries to solve these problems. Common Objects in Context (COCO) and ImageNet are two datasets that are used in this method for training. Multimodal Contrastive Domain Sharing GAN is used for detection and tracking. It makes cross-modal learning easier by letting
APA, Harvard, Vancouver, ISO, and other styles
50

Waisberg, Ethan, Joshua Ong, Nasif Zaman, Sharif Amit Kamran, Andrew G. Lee, and Alireza Tavakkoli. "Head-Mounted Dynamic Visual Acuity for G-Transition Effects During Interplanetary Spaceflight: Technology Development and Results from an Early Validation Study." Aerospace Medicine and Human Performance 93, no. 11 (2022): 800–805. http://dx.doi.org/10.3357/amhp.6092.2022.

Full text
Abstract:
INTRODUCTION: Dynamic visual acuity (DVA) refers to the ability of the eye to discern detail in a moving object and plays an important role whenever rapid physical responses to environmental changes are required, such as while performing tasks onboard a space shuttle. A significant decrease in DVA has previously been noted after astronauts returned from long-duration spaceflight (0.75 eye chart lines, 24 h after returning from space). As part of a NASA-funded, head-mounted multimodal visual assessment system for monitoring vision changes in spaceflight, we elaborate upon the technical developm
APA, Harvard, Vancouver, ISO, and other styles