
Journal articles on the topic 'Multimodal dataset'


Consult the top 50 journal articles for your research on the topic 'Multimodal dataset.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse journal articles across a wide variety of disciplines and organise your bibliography correctly.

1

Sivagnanam, Lingeswari. "Enhanced Ensemble Machine Learning Technique to detect Bipolar Disorder." Journal of Information Systems Engineering and Management 10, no. 38s (2025): 610–19. https://doi.org/10.52783/jisem.v10i38s.6932.

Abstract:
The study introduced a unique Enhanced Ensemble Machine Learning (EEML) method for the detection of bipolar disorder, a serious diagnostic problem that requires an immediate and timely diagnosis for effective and earlier treatment. The EEML approach combines several machine learning models with patient data drawn from different perspectives, such as real, signal-based, textual, and behavioural data, and more. In order to train individual classifiers for identifying mental diseases, pertinent characteristics are taken from each data source using a thorough feature selection …
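As an aside for readers who want to see the general shape of such an ensemble, the sketch below is illustrative only: it is not the paper's EEML code, and the random feature matrix, label vector, and choice of base classifiers are placeholder assumptions.

import numpy as np
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 64))    # stand-in for concatenated multimodal features
y = rng.integers(0, 2, size=200)  # stand-in for bipolar / non-bipolar labels

ensemble = make_pipeline(
    SelectKBest(f_classif, k=16),  # keep only the most relevant features
    VotingClassifier(
        estimators=[
            ("lr", LogisticRegression(max_iter=1000)),
            ("rf", RandomForestClassifier(n_estimators=100)),
            ("svm", SVC(probability=True)),
        ],
        voting="soft",  # average predicted class probabilities
    ),
)
ensemble.fit(X, y)
print(ensemble.predict(X[:5]))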
2

Zhong, Baisheng. "Multimodal Emotion Cognition Method Based on Multi-Channel Graphic Interaction." International Journal of Cognitive Informatics and Natural Intelligence 18, no. 1 (2024): 1–17. http://dx.doi.org/10.4018/ijcini.349969.

Abstract:
The relationship between the emotional components associated with images and text is a crucial aspect of multimodal emotion analysis. However, most present multimodal affective cognitive models simply associate the features of images and texts without thoroughly investigating their interactions, resulting in poor recognition. Therefore, a multimodal emotion cognition method based on multi-channel graphic interaction is proposed. Text context features are extracted, scene and image information is encoded, and useful features are obtained. Based on these results, the modal alignment module …
3

Cheng, Jingming, Wenjun Xie, Ziqi Shen, Lin Li, and Xiaoping Liu. "Multimodal Human Motion Synchronization Dataset." Journal of Computer-Aided Design & Computer Graphics 34, no. 11 (2022): 1713–22. http://dx.doi.org/10.3724/sp.j.1089.2022.19194.

4

Guo, Hao, Zihan Ma, Zhi Zeng, et al. "Each Fake News Is Fake in Its Own Way: An Attribution Multi-Granularity Benchmark for Multimodal Fake News Detection." Proceedings of the AAAI Conference on Artificial Intelligence 39, no. 1 (2025): 228–36. https://doi.org/10.1609/aaai.v39i1.31999.

Abstract:
Social platforms, while facilitating access to information, have also become saturated with a plethora of fake news, resulting in negative consequences. Automatic multimodal fake news detection is a worthwhile pursuit. Existing multimodal fake news datasets only provide binary labels of real or fake. However, real news is alike, while each fake news is fake in its own way. These datasets fail to reflect the mixed nature of various types of multimodal fake news. To bridge the gap, we construct an attributing multi-granularity multimodal fake news detection dataset, AMG, revealing the inherent …
5

Zhang, Ziqi, Zhaohong Deng, Wei Zhang, and Lingchao Bu. "MMTD: A Multilingual and Multimodal Spam Detection Model Combining Text and Document Images." Applied Sciences 13, no. 21 (2023): 11783. http://dx.doi.org/10.3390/app132111783.

Abstract:
Spam detection has been a topic of extensive research; however, there has been limited focus on multimodal spam detection. In this study, we introduce a novel approach for multilingual multimodal spam detection, presenting the Multilingual and Multimodal Spam Detection Model combining Text and Document Images (MMTD). Unlike previous methods, our proposed model incorporates a document image encoder to extract image features from the entire email, providing a holistic understanding of both textual and visual content through a single image. Additionally, we employ a multilingual text encoder to …
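The two-encoder design described above can be pictured with a minimal PyTorch sketch; this is a generic stand-in, not the MMTD architecture, and the toy encoders and dimensions are assumptions.

import torch
import torch.nn as nn

class TwoStreamSpamClassifier(nn.Module):
    def __init__(self, vocab_size=30000, text_dim=128, img_dim=128):
        super().__init__()
        # Placeholder encoders; a real system would use a multilingual
        # transformer and a document-image backbone here.
        self.text_encoder = nn.Sequential(
            nn.EmbeddingBag(vocab_size, text_dim), nn.ReLU()
        )
        self.image_encoder = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, img_dim),
        )
        self.head = nn.Linear(text_dim + img_dim, 2)  # spam / ham

    def forward(self, token_ids, image):
        # Classify on the concatenation of both modality embeddings.
        fused = torch.cat([self.text_encoder(token_ids),
                           self.image_encoder(image)], dim=-1)
        return self.head(fused)

model = TwoStreamSpamClassifier()
logits = model(torch.randint(0, 30000, (4, 50)), torch.randn(4, 3, 64, 64))
print(logits.shape)  # torch.Size([4, 2])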
6

Arioz, Umut, Urška Smrke, Nejc Plohl, and Izidor Mlakar. "Scoping Review on the Multimodal Classification of Depression and Experimental Study on Existing Multimodal Models." Diagnostics 12, no. 11 (2022): 2683. http://dx.doi.org/10.3390/diagnostics12112683.

Abstract:
Depression is a prevalent comorbidity in patients with severe physical disorders, such as cancer, stroke, and coronary diseases. Although it can significantly impact the course of the primary disease, the signs of depression are often underestimated and overlooked. The aim of this paper was to review algorithms for the automatic, uniform, and multimodal classification of signs of depression from human conversations and to evaluate their accuracy. For the scoping review, the PRISMA guidelines for scoping reviews were followed. In the scoping review, the search yielded 1095 papers, out of which …
7

Ma, Shukui, Pengyuan Ma, Shuaichao Feng, Fei Ma, and Guangping Zhuo. "Multimodal Data-Based Text Generation Depression Classification Model." International Journal of Computer Science and Information Technology 5, no. 1 (2025): 175–93. https://doi.org/10.62051/ijcsit.v5n1.16.

Abstract:
Depression classification often relies on multimodal features, but existing models struggle to capture the similarity between multimodal features. Moreover, the social stigma surrounding depression leads to limited availability of datasets, which constrains model accuracy. This study aims to improve multimodal depression recognition methods by proposing a Multimodal Generation-Text Depression Classification Model. The model introduces a Multimodal-Deep-Extract-Feature Net to capture both long- and short-term sequential features. A Dual Text Contrastive Learning Module is employed to generate …
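Contrastive learning of the kind the abstract mentions is commonly implemented with an InfoNCE-style loss; the sketch below shows that generic loss, not the paper's Dual Text Contrastive Learning Module, and the embedding sizes are assumptions.

import torch
import torch.nn.functional as F

def info_nce(z_a, z_b, temperature=0.07):
    """z_a[i] and z_b[i] are embeddings of the same sample in two views."""
    z_a = F.normalize(z_a, dim=-1)
    z_b = F.normalize(z_b, dim=-1)
    logits = z_a @ z_b.t() / temperature  # pairwise cosine similarities
    targets = torch.arange(z_a.size(0))   # matching pairs lie on the diagonal
    return F.cross_entropy(logits, targets)

loss = info_nce(torch.randn(8, 128), torch.randn(8, 128))
print(float(loss))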
8

Liu, K., A. Wu, X. Wan, and S. Li. "MRSSC: A BENCHMARK DATASET FOR MULTIMODAL REMOTE SENSING SCENE CLASSIFICATION." International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences XLIII-B2-2021 (June 28, 2021): 785–92. http://dx.doi.org/10.5194/isprs-archives-xliii-b2-2021-785-2021.

Abstract:
Scene classification based on multi-source remote sensing images is important for image interpretation and has many applications, such as change detection, visual navigation, and image retrieval. Deep learning has become a research hotspot in the field of remote sensing scene classification, and datasets are an important driving force promoting its development. Most remote sensing scene classification datasets are optical images, and multimodal datasets are relatively rare. Existing datasets that contain both optical and SAR data, such as SARptical and WHU-SEN-City, which mainly …
9

Gao, Yuzhou, and Guoquan Ma. "Human Motion Recognition Based on Multimodal Characteristics of Learning Quality in Football Scene." Mathematical Problems in Engineering 2021 (August 30, 2021): 1–8. http://dx.doi.org/10.1155/2021/7963616.

Abstract:
The task of video-based human motion recognition has attracted wide attention, and its research results have been widely used in intelligent human-computer interaction, virtual reality, intelligent monitoring, security, multimedia content analysis, etc. The purpose of this study is to explore human action recognition in football scenes combined with learning-quality-related multimodal features. The method used in this study is to select BN-Inception as the underlying feature-extraction network, use the UCF101 and HMDB51 datasets captured in uncontrolled, real-world environments, and pretrain …
10

Nguyen, Ngoc-Hoang, Tran-Dac-Thinh Phan, Guee-Sang Lee, Soo-Hyung Kim, and Hyung-Jeong Yang. "Gesture Recognition Based on 3D Human Pose Estimation and Body Part Segmentation for RGB Data Input." Applied Sciences 10, no. 18 (2020): 6188. http://dx.doi.org/10.3390/app10186188.

Abstract:
This paper presents a novel approach for dynamic gesture recognition using multi-features extracted from RGB data input. Most of the challenges in gesture recognition revolve around the presence of multiple actors in the scene, occlusions, and viewpoint variations. In this paper, we develop a gesture recognition approach by hybrid deep learning where RGB frames, 3D skeleton joint information, and body part segmentation are used to overcome such problems. Extracted from the RGB images are the multimodal input observations, which are combined by multi-modal stream networks suited to …
11

Martínez-Villaseñor, Lourdes, Hiram Ponce, Jorge Brieva, Ernesto Moya-Albor, José Núñez-Martínez, and Carlos Peñafort-Asturiano. "UP-Fall Detection Dataset: A Multimodal Approach." Sensors 19, no. 9 (2019): 1988. http://dx.doi.org/10.3390/s19091988.

Abstract:
Falls, especially in elderly persons, are an important health problem worldwide. Reliable fall detection systems can mitigate the negative consequences of falls. Among the important challenges and issues reported in the literature is the difficulty of fair comparison between fall detection systems and machine learning techniques for detection. In this paper, we present the UP-Fall Detection Dataset. The dataset comprises raw and feature sets retrieved from 17 healthy young individuals without any impairment who performed 11 activities and falls, with three attempts each. The dataset also summarizes more …
12

Chen, Jing, Kejun Wang, Cong Zhao, Chaoqun Yin, and Ziqiang Huang. "MED: multimodal emotion dataset in the wild." Journal of Image and Graphics 25, no. 11 (2020): 2349–60. http://dx.doi.org/10.11834/jig.200215.

13

Wu, Te-Lin, Shikhar Singh, Sayan Paul, Gully Burns, and Nanyun Peng. "MELINDA: A Multimodal Dataset for Biomedical Experiment Method Classification." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 16 (2021): 14076–84. http://dx.doi.org/10.1609/aaai.v35i16.17657.

Abstract:
We introduce a new dataset, MELINDA, for Multimodal biomEdicaL experImeNt methoD clAssification. The dataset is collected in a fully automated distant supervision manner, where the labels are obtained from an existing curated database, and the actual contents are extracted from papers associated with each of the records in the database. We benchmark various state-of-the-art NLP and computer vision models, including unimodal models which only take either caption texts or images as inputs, and multimodal models. Extensive experiments and analysis show that multimodal models, despite outperforming …
14

Guo, Xiaobo, and Soroush Vosoughi. "A Large-Scale Longitudinal Multimodal Dataset of State-Backed Information Operations on Twitter." Proceedings of the International AAAI Conference on Web and Social Media 16 (May 31, 2022): 1245–50. http://dx.doi.org/10.1609/icwsm.v16i1.19375.

Abstract:
This paper proposes a large-scale and comprehensive dataset of 28 sub-datasets of state-backed tweets and accounts affiliated with 14 different countries, spanning more than 3 years, and a corresponding "negative" dataset of background tweets from the same time period and on similar topics. To our knowledge, this is the first dataset that contains both state-sponsored propaganda tweets and carefully collected corresponding negative tweet datasets for so many countries spanning such a long period of time.
15

Filali, Hajar. "Comparative Study of Unimodal and Multimodal Systems Based on MNN." Journal of Information Systems Engineering and Management 10, no. 18s (2025): 354–62. https://doi.org/10.52783/jisem.v10i18s.2922.

Abstract:
Emotion recognition has emerged as a pivotal area in the development of emotionally intelligent systems, with research traditionally focusing on unimodal approaches. However, recent advancements have highlighted the advantages of multimodal systems, which leverage complementary inputs such as text, speech, and visual cues. This study conducts a comparative analysis of unimodal and multimodal emotion recognition systems based on the Meaningful Neural Network (MNN) architecture. Our approach integrates advanced feature extraction techniques, including a Graph Convolutional Network for acoustic …
16

Steinert-Threlkeld, Zachary, and Jungseock Joo. "MMCHIVED: Multimodal Chile and Venezuela Protest Event Data." Proceedings of the International AAAI Conference on Web and Social Media 16 (May 31, 2022): 1332–41. http://dx.doi.org/10.1609/icwsm.v16i1.19385.

Abstract:
This paper introduces the Multimodal Chile & Venezuela Protest Event Dataset (MMCHIVED). MMCHIVED contains city-day event data using a new source of data: text and images shared on social media. These data enable the improved measurement of theoretically important variables such as protest size, protester and state violence, protester demographics, and emotions. In Venezuela, MMCHIVED records many more protests than existing datasets. In Chile, it records slightly more events than the Armed Conflict Location and Events Dataset (ACLED). These extra events are from small cities far from …
17

Suganya, R., M. Narmatha, and S. Vengatesh Kumar. "An Emotionally Intelligent System for Multimodal Sentiment Classification." Indian Journal Of Science And Technology 17, no. 42 (2024): 4386–94. http://dx.doi.org/10.17485/ijst/v17i42.2349.

Abstract:
Objectives: To develop a multimodal sentiment classification model by analyzing the impact of biological signals and examining the concatenation of various modalities in a marketing scenario. Methods: This paper proposes a new emotionally intelligent system for multimodal sentiment classification. Initially, a multimodal database is prepared by collecting text, speech, facial expression, posture, and biological signals for each individual in the user-machine interaction scenario. This database is preprocessed to remove unwanted noise or missing values. After preprocessing, the dataset is split …
18

Juluri, Samatha, and Madhavi G. "SecureSense: Enhancing Person Verification through Multimodal Biometrics for Robust Authentication." Scalable Computing: Practice and Experience 25, no. 2 (2024): 1040–54. http://dx.doi.org/10.12694/scpe.v25i2.2524.

Abstract:
Biometrics provide enhanced security and convenience compared to conventional methods of individual authentication. A more robust and effective method of individual authentication has emerged due to recent advancements in multimodal biometrics. Unimodal systems offer lower security and lack the robustness found in multimodal biometric systems. The research paper introduces a novel approach, employing multiple biometric modalities, including face, fingerprint, and iris, to authenticate users in a multimodal biometric system. The paper proposes the "SecureSense" framework, which combines multiple …
19

Cañadas-Aránega, Fernando, Jose Luis Blanco-Claraco, Jose Carlos Moreno, and Francisco Rodriguez-Diaz. "Multimodal Mobile Robotic Dataset for a Typical Mediterranean Greenhouse: The GREENBOT Dataset." Sensors 24, no. 6 (2024): 1874. http://dx.doi.org/10.3390/s24061874.

Abstract:
This paper presents an innovative dataset designed explicitly for challenging agricultural environments, such as greenhouses, where precise location is crucial but GNSS accuracy may be compromised by construction elements and the crop. The dataset was collected using a mobile platform equipped with a set of sensors typically used in mobile robots as it was moved through all the corridors of a typical Mediterranean greenhouse featuring tomato crops. This dataset presents a unique opportunity for constructing detailed 3D models of plants in such indoor-like spaces, with potential applications …
20

Pratibha, Amandeep Kaur, Meenu Khurana, and Robertas Damaševičius. "Multimodal Hinglish Tweet Dataset for Deep Pragmatic Analysis." Data 9, no. 2 (2024): 38. http://dx.doi.org/10.3390/data9020038.

Abstract:
Wars, conflicts, and peace efforts have become inherent characteristics of regions, and understanding the prevailing sentiments related to these issues is crucial for finding long-lasting solutions. Twitter/'X', with its vast user base and real-time nature, provides a valuable source to assess the raw emotions and opinions of people regarding war, conflict, and peace. This paper focuses on collecting and curating Hinglish tweets specifically related to wars, conflicts, and associated taxonomy. The creation of said dataset addresses the existing gap in contemporary literature, which lacks comprehensive …
21

Mitchell, John C., Abbas A. Dehghani-Sanij, Sheng Q. Xie, and Rory J. O’Connor. "Analysis of Multimodal Sensor Systems for Identifying Basic Walking Activities." Technologies 13, no. 4 (2025): 152. https://doi.org/10.3390/technologies13040152.

Abstract:
Falls are a major health issue in societies globally and the second leading cause of unintentional death worldwide. To address this issue, many studies aim to remotely monitor gait to prevent falls. However, the activity data collected in these studies must be labelled with the appropriate environmental context through Human Activity Recognition (HAR). Multimodal HAR datasets often achieve high accuracies at the cost of cumbersome sensor systems, creating a need for these datasets to be analysed to identify the sensor types and locations that enable high-accuracy HAR. This paper analyses four datasets …
22

Rai, Hari Mohan, Joon Yoo, Saurabh Agarwal, and Neha Agarwal. "LightweightUNet: Multimodal Deep Learning with GAN-Augmented Imaging Data for Efficient Breast Cancer Detection." Bioengineering 12, no. 1 (2025): 73. https://doi.org/10.3390/bioengineering12010073.

Abstract:
Breast cancer ranks as the second most prevalent cancer globally and is the most frequently diagnosed cancer among women; therefore, early, automated, and precise detection is essential. Most AI-based techniques for breast cancer detection are complex and have high computational costs. Hence, to overcome this challenge, we have presented the innovative LightweightUNet hybrid deep learning (DL) classifier for the accurate classification of breast cancer. The proposed model boasts a low computational cost due to the smaller number of layers in its architecture, and its adaptive nature stems from …
23

Das, Avishek, Moumita Sen Sarma, Mohammed Moshiul Hoque, Nazmul Siddique, and M. Ali Akber Dewan. "AVaTER: Fusing Audio, Visual, and Textual Modalities Using Cross-Modal Attention for Emotion Recognition." Sensors 24, no. 18 (2024): 5862. http://dx.doi.org/10.3390/s24185862.

Abstract:
Multimodal emotion classification (MEC) involves analyzing and identifying human emotions by integrating data from multiple sources, such as audio, video, and text. This approach leverages the complementary strengths of each modality to enhance the accuracy and robustness of emotion recognition systems. However, one significant challenge is effectively integrating these diverse data sources, each with unique characteristics and levels of noise. Additionally, the scarcity of large, annotated multimodal datasets in Bangla limits the training and evaluation of models. In this work, we unveiled a …
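Cross-modal attention, the mechanism named in the title, can be illustrated with PyTorch's built-in multi-head attention; this generic sketch, with assumed feature shapes, is not the AVaTER implementation.

import torch
import torch.nn as nn

attn = nn.MultiheadAttention(embed_dim=64, num_heads=4, batch_first=True)
text_feats = torch.randn(2, 20, 64)   # (batch, text steps, dim)
audio_feats = torch.randn(2, 50, 64)  # (batch, audio frames, dim)

# Queries come from one modality, keys/values from the other.
fused, weights = attn(query=text_feats, key=audio_feats, value=audio_feats)
print(fused.shape)    # torch.Size([2, 20, 64])
print(weights.shape)  # torch.Size([2, 20, 50])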
24

Ha, Jongwoo, Joonhyuck Ryu, and Joonghoon Ko. "Multi-Modality Tensor Fusion Based Human Fatigue Detection." Electronics 12, no. 15 (2023): 3344. http://dx.doi.org/10.3390/electronics12153344.

Abstract:
Multimodal learning is an expanding research area that aims to pursue a better understanding of given data by considering different modalities. Multimodal approaches for qualitative data are used for the quantitative proofing of ground-truth datasets and for discovering unexpected phenomena. In this paper, we investigate the effect of multimodal learning schemes on quantitative data to assess its qualitative state. We try to interpret human fatigue levels by analyzing video, thermal image, and voice data together. The experiment showed that the multimodal approach using three types of data was more effective …
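Tensor fusion is often realized as an outer product across modality embeddings, as popularized by Tensor Fusion Networks; the sketch below shows that generic idea under assumed dimensions, not this paper's exact model.

import torch

def tensor_fusion(a, b):
    # Append a constant 1 to each vector so unimodal terms survive the
    # outer product alongside the bimodal interaction terms.
    ones = torch.ones(a.size(0), 1)
    a = torch.cat([a, ones], dim=1)           # (batch, Da + 1)
    b = torch.cat([b, ones], dim=1)           # (batch, Db + 1)
    fused = torch.einsum("bi,bj->bij", a, b)  # all pairwise interactions
    return fused.flatten(1)                   # (batch, (Da+1) * (Db+1))

video = torch.randn(4, 16)
voice = torch.randn(4, 8)
print(tensor_fusion(video, voice).shape)  # torch.Size([4, 153])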
25

You, Guibing, Kelei Guo, Jie Gao, Hanjie Feng, and Wei Zou. "F-TransR: A sports event revenue prediction model integrating multi-modal and time-series data." PLOS One 20, no. 7 (2025): e0327459. https://doi.org/10.1371/journal.pone.0327459.

Abstract:
Sports event revenue prediction is a complex, multimodal task that requires effective integration of diverse data sources. Traditional models struggle to combine real-time data streams with historical time-series data, resulting in limited prediction accuracy. To address this challenge, we propose F-TransR, a Transformer-based multimodal revenue prediction model. F-TransR introduces key innovations, including a real-time data stream processing module, a historical time-series modeling module, a novel multimodal fusion mechanism, and a cross-modal interaction modeling module. These modules enable …
26

Ren, Yi, Tianyi Zhang, Zhixiong Han, et al. "A Novel Adaptive Fine-Tuning Algorithm for Multimodal Models: Self-Optimizing Classification and Selection of High-Quality Datasets in Remote Sensing." Remote Sensing 17, no. 10 (2025): 1748. https://doi.org/10.3390/rs17101748.

Abstract:
The latest research indicates that Large Vision-Language Models (VLMs) have a wide range of applications in the field of remote sensing. However, the vast amount of image data in this field presents a challenge in selecting high-quality multimodal data, which are essential for saving computational resources and time. Therefore, we propose an adaptive fine-tuning algorithm for multimodal large models. The core steps of this algorithm involve two stages of data truncation. First, the vast dataset is projected into a semantic vector space, where the MiniBatchKMeans algorithm is used for automated …
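The clustering stage described above can be pictured with scikit-learn's MiniBatchKMeans; the embedding shapes, cluster count, and per-cluster quota below are placeholder assumptions, not the paper's settings.

import numpy as np
from sklearn.cluster import MiniBatchKMeans

# Stand-in for samples already projected into a semantic vector space.
embeddings = np.random.default_rng(0).normal(size=(10000, 256))

kmeans = MiniBatchKMeans(n_clusters=50, batch_size=1024, n_init="auto")
labels = kmeans.fit_predict(embeddings)

# Keep up to 20 samples per cluster to obtain a small, diverse subset.
selected = np.concatenate(
    [np.flatnonzero(labels == c)[:20] for c in range(50)]
)
print(selected.shape)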
27

Macfadyen, Craig, Ajay Duraiswamy, and David Harris-Birtill. "Classification of hyper-scale multimodal imaging datasets." PLOS Digital Health 2, no. 12 (2023): e0000191. http://dx.doi.org/10.1371/journal.pdig.0000191.

Abstract:
Algorithms that classify hyper-scale multi-modal datasets, comprising millions of images, into constituent modality types can help researchers quickly retrieve and classify diagnostic imaging data, accelerating clinical outcomes. This research aims to demonstrate that a deep neural network trained on a hyper-scale dataset (4.5 million images) composed of heterogeneous multi-modal data can be used to obtain significant modality classification accuracy (96%). By combining 102 medical imaging datasets, a dataset of 4.5 million images was created. A ResNet-50, ResNet-18, and VGG16 were …
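Training a ResNet to predict the imaging modality amounts to swapping its classification head; a minimal torchvision sketch follows, with a hypothetical class count and random stand-in images.

import torch
import torch.nn as nn
from torchvision.models import resnet50

num_modalities = 10  # hypothetical number of modality classes
model = resnet50(weights=None)
model.fc = nn.Linear(model.fc.in_features, num_modalities)  # replace head

images = torch.randn(2, 3, 224, 224)
print(model(images).shape)  # torch.Size([2, 10])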
28

Ehab, Engy, Nahla Belal, and Yasser Omar. "Tri-FND: Multimodal Fake News Detection Using Triplet Transformer Models." Journal of Advanced Research in Applied Sciences and Engineering Technology 63, no. 1 (2025): 255–70. https://doi.org/10.37934/araset.63.1.255270.

Abstract:
The prevalence of fake news accompanied by multimedia content on the internet presents a significant challenge for users attempting to discern its authenticity. Automatically identifying and classifying fake news is a crucial means of combating misinformation and maintaining the integrity of information dissemination. This paper proposes a fake news detection approach that exploits the potential of multimodality and integrates textual and visual data to improve the fake news classification system. The novel multimodal learning approach to fake news detection, termed Tri-FND, uses triplet …
29

Dashenkov, Dmytro, and Kirill Smelyakov. "Extending the ImageNET dataset for multimodal text and image learning." INNOVATIVE TECHNOLOGIES AND SCIENTIFIC SOLUTIONS FOR INDUSTRIES, no. 1(31) (March 31, 2025): 20–31. https://doi.org/10.30837/2522-9818.2025.1.020.

Abstract:
Subject matter: image processing methods for classification and other computer vision tasks using multimodal data, including text descriptions of classes and images. Goal: development of a multimodal dataset for image classification using textual meta-information analysis. The resulting dataset should consist of image data, image classes (namely the 1000 classes of objects depicted in photos from the ImageNet set), textual descriptions of individual images, and textual descriptions of image classes as a whole. Tasks: 1) based on the images of the ImageNet dataset, compile a dataset for training …
30

Yoon, Jeewoo, Chaewon Kang, Seungbae Kim, and Jinyoung Han. "D-vlog: Multimodal Vlog Dataset for Depression Detection." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 11 (2022): 12226–34. http://dx.doi.org/10.1609/aaai.v36i11.21483.

Abstract:
Detecting depression based on non-verbal behaviors has received great attention. However, most prior work on detecting depression has focused on detecting depressed individuals in laboratory settings, which are difficult to generalize in practice. In addition, little attention has been paid to analyzing the non-verbal behaviors of depressed individuals in the wild. Therefore, in this paper, we present a multimodal depression dataset, D-Vlog, which consists of 961 vlogs (i.e., around 160 hours) collected from YouTube, which can be utilized in developing depression detection models based …
31

Lopes Silva, Pedro, Eduardo Luz, Gladston Moreira, Lauro Moraes, and David Menotti. "Chimerical Dataset Creation Protocol Based on Doddington Zoo: A Biometric Application with Face, Eye, and ECG." Sensors 19, no. 13 (2019): 2968. http://dx.doi.org/10.3390/s19132968.

Abstract:
Multimodal systems are a workaround to enhance the robustness and effectiveness of biometric systems. A proper multimodal dataset is of the utmost importance to build such systems. The literature presents some multimodal datasets, although, to the best of our knowledge, there are no previous studies combining face, iris/eye, and vital signals such as the Electrocardiogram (ECG). Moreover, there is no methodology to guide the construction and evaluation of a chimeric dataset. Taking that fact into account, we propose to create a chimeric dataset from three modalities in this work: ECG, eye, and face …
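The chimeric construction can be pictured as pairing identities across single-modality datasets; the NumPy sketch below uses a naive random pairing as a placeholder, not the paper's Doddington Zoo-based protocol.

import numpy as np

rng = np.random.default_rng(42)
n_subjects = 100
face_ids = rng.permutation(n_subjects)  # subject indices in a face dataset
eye_ids = rng.permutation(n_subjects)   # subject indices in an eye dataset
ecg_ids = rng.permutation(n_subjects)   # subject indices in an ECG dataset

# Each row defines one chimeric "subject" combining three real identities.
chimeric_subjects = np.stack([face_ids, eye_ids, ecg_ids], axis=1)
print(chimeric_subjects[:3])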
32

Deng, Wan-Yu, Dan Liu, and Ying-Ying Dong. "Feature Selection and Classification for High-Dimensional Incomplete Multimodal Data." Mathematical Problems in Engineering 2018 (August 12, 2018): 1–9. http://dx.doi.org/10.1155/2018/1583969.

Abstract:
Due to missing values, incomplete datasets are ubiquitous in multimodal scenarios. Complete data is a prerequisite of most existing multimodal data fusion methods. For incomplete multimodal high-dimensional data, we propose a feature selection and classification method. Our method mainly focuses on extracting the most relevant features from the high-dimensional features and then improving the classification accuracy. The experimental results show that our method produces considerably better performance on incomplete multimodal data, such as the ADNI dataset and the Office dataset, compared to the case …
33

Salekin, Md Sirajus, Ghada Zamzmi, Jacqueline Hausmann, et al. "Multimodal neonatal procedural and postoperative pain assessment dataset." Data in Brief 35 (April 2021): 106796. http://dx.doi.org/10.1016/j.dib.2021.106796.

34

Narkhede, Parag, Rahee Walambe, Pulkit Chandel, Shruti Mandaokar, and Ketan Kotecha. "MultimodalGasData: Multimodal Dataset for Gas Detection and Classification." Data 7, no. 8 (2022): 112. http://dx.doi.org/10.3390/data7080112.

Abstract:
The detection of gas leakages is a crucial aspect to be considered in the chemical industries, coal mines, home applications, etc. Early detection and identification of the type of gas are required to avoid damage to human lives and the environment. The MultimodalGasData presented in this paper is a novel collection of simultaneous data samples taken using seven different gas-detecting sensors and a thermal imaging camera. The low-cost sensors are generally less sensitive and less reliable; hence, they are unable to detect gases from a longer distance. A thermal camera that can sense the temperature …
35

Cao, Houwei, David G. Cooper, Michael K. Keutmann, Ruben C. Gur, Ani Nenkova, and Ragini Verma. "CREMA-D: Crowd-Sourced Emotional Multimodal Actors Dataset." IEEE Transactions on Affective Computing 5, no. 4 (2014): 377–90. http://dx.doi.org/10.1109/taffc.2014.2336244.

36

Oppelt, Maximilian P., Andreas Foltyn, Jessica Deuschel, et al. "ADABase: A Multimodal Dataset for Cognitive Load Estimation." Sensors 23, no. 1 (2022): 340. http://dx.doi.org/10.3390/s23010340.

Abstract:
Driver monitoring systems play an important role in lower- to mid-level autonomous vehicles. Our work focuses on the detection of cognitive load as a component of driver-state estimation to improve traffic safety. By inducing single- and dual-task workloads of increasing intensity on 51 subjects, while continuously measuring signals from multiple modalities, based on physiological measurements such as ECG, EDA, EMG, PPG, respiration rate, skin temperature and eye tracker data, as well as behavioral measurements such as action units extracted from facial videos, performance metrics like reaction …
37

Monteiro Rocha Lima, Bruno, Venkata Naga Sai Siddhartha Danyamraju, Thiago Eustaquio Alves de Oliveira, and Vinicius Prado da Fonseca. "A multimodal tactile dataset for dynamic texture classification." Data in Brief 50 (October 2023): 109590. http://dx.doi.org/10.1016/j.dib.2023.109590.

38

Taramasco, Carla, Miguel Pineiro, Pablo Ormeño-Arriagada, Diego Robles, and David Araya. "Multimodal dataset for sensor fusion in fall detection." PeerJ 13 (April 1, 2025): e19004. https://doi.org/10.7717/peerj.19004.

Abstract:
The necessity for effective automatic fall detection mechanisms in older adults is driven by the growing demographic of elderly individuals who are at substantial health risk from falls, particularly when residing alone. Despite the existence of numerous fall detection systems (FDSs) that utilize machine learning and predictive modeling, accurately distinguishing between everyday activities and genuine falls continues to pose significant challenges, exacerbated by the varied nature of residential settings. Adaptable solutions are essential to cater to the diverse conditions under which falls occur …
39

Weiß, Christof, Frank Zalkow, Vlora Arifi-Müller, et al. "Schubert Winterreise Dataset." Journal on Computing and Cultural Heritage 14, no. 2 (2021): 1–18. http://dx.doi.org/10.1145/3429743.

Abstract:
This article presents a multimodal dataset comprising various representations and annotations of Franz Schubert's song cycle Winterreise. Schubert's seminal work constitutes an outstanding example of the Romantic song cycle, a central genre within Western classical music. Our dataset unifies several public sources and annotations carefully created by music experts, compiled in a comprehensive and consistent way. The multimodal representations comprise the singer's lyrics, sheet music in different machine-readable formats, and audio recordings of nine performances, two of which are freely accessible …
40

Liu, Kang, Jian Yang, and Shengyang Li. "Remote-Sensing Cross-Domain Scene Classification: A Dataset and Benchmark." Remote Sensing 14, no. 18 (2022): 4635. http://dx.doi.org/10.3390/rs14184635.

Abstract:
Domain adaptation for classification has achieved significant progress in natural images but not in remote-sensing images, due to huge differences in data-imaging mechanisms between different modalities and inconsistencies in class labels among existing datasets. More importantly, the lack of cross-domain benchmark datasets has become a major obstacle to the development of scene classification in multimodal remote-sensing images. In this paper, we present a cross-domain dataset for multimodal remote-sensing scene classification (MRSSC). The proposed MRSSC dataset contains 26,710 images of 7 typical …
41

Liu, Kang, and Xin Gao. "Multiscale Efficient Channel Attention for Fusion Lane Line Segmentation." Complexity 2021 (December 7, 2021): 1–12. http://dx.doi.org/10.1155/2021/6791882.

Abstract:
The use of multimodal sensors for lane line segmentation has become a growing trend. To achieve robust multimodal fusion, we introduced a new multimodal fusion method and proved its effectiveness in an improved fusion network. Specifically, a multiscale fusion module is proposed to extract effective features from data of different modalities, and a channel attention module is used to adaptively calculate the contribution of the fused feature channels. We verified the effect of multimodal fusion on the KITTI benchmark dataset and the A2D2 dataset and proved the effectiveness of the proposed method …
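Channel attention of the squeeze-and-excitation family, which the abstract's module resembles, can be sketched in a few lines of PyTorch; this generic version, with assumed channel counts, is not the paper's multiscale module.

import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )

    def forward(self, x):                # x: (batch, C, H, W)
        w = self.fc(x.mean(dim=(2, 3)))  # squeeze: global average pool -> (batch, C)
        return x * w[:, :, None, None]   # excite: rescale each channel

fused = torch.randn(2, 32, 64, 64)  # e.g., fused camera + LiDAR feature maps
print(ChannelAttention(32)(fused).shape)  # torch.Size([2, 32, 64, 64])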
42

Sharma, Ashutosh, Rajeev Kumar, Isha Kansal, et al. "Fire Detection in Urban Areas Using Multimodal Data and Federated Learning." Fire 7, no. 4 (2024): 104. http://dx.doi.org/10.3390/fire7040104.

Abstract:
Fire chemical sensing for indoor fire detection plays an essential role because it can detect chemical volatiles before smoke particles, providing a faster and more reliable method for early fire detection. A thermal imaging camera and seven distinct fire-detecting sensors were used simultaneously to acquire the multimodal fire data that is the subject of this paper. The low-cost sensors typically have lower sensitivity and reliability, making it impossible for them to detect fire at greater distances. To go beyond the limitation of using sensors alone for identifying fire, the multimodal …
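Federated learning pipelines typically aggregate client models with federated averaging (FedAvg); the sketch below shows that standard step in PyTorch and is not the paper's training code. The tiny linear models stand in for local client models.

import copy
import torch
import torch.nn as nn

def fed_avg(client_models):
    """Average the parameters of locally trained client models."""
    global_model = copy.deepcopy(client_models[0])
    avg_state = global_model.state_dict()
    for key in avg_state:
        # Element-wise mean of each parameter tensor across clients.
        avg_state[key] = torch.stack(
            [m.state_dict()[key].float() for m in client_models]
        ).mean(dim=0)
    global_model.load_state_dict(avg_state)
    return global_model

clients = [nn.Linear(8, 2) for _ in range(3)]  # stand-ins for client models
print(fed_avg(clients).weight.shape)           # torch.Size([2, 8])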
43

Guo, Xiaoxu, Han Cao, Yachao Cui, and Haiyan Zhao. "Evaluation and Explanation of Post Quality Based on a Multimodal, Multilevel, and Multi-Scope Focused Fusion Mechanism." Electronics 14, no. 4 (2025): 656. https://doi.org/10.3390/electronics14040656.

Abstract:
Assessing the quality of multimodal posts is a challenging task that involves using multimodal data to evaluate the quality of posts' responses to discussion topics. Providing evaluations and explanations plays a crucial role in promoting students' individualized development. However, existing research on post quality faces the following challenges: (1) most evaluation methods are classification tasks that lack explanations and guidance; (2) there is a lack of a fusion mechanism that focuses on each modality's information, is multidimensional, and operates at multiple levels. Based on these challenges …
44

Wei, Haoran, Pranav Chopada, and Nasser Kehtarnavaz. "C-MHAD: Continuous Multimodal Human Action Dataset of Simultaneous Video and Inertial Sensing." Sensors 20, no. 10 (2020): 2905. http://dx.doi.org/10.3390/s20102905.

Abstract:
Existing public domain multi-modal datasets for human action recognition only include actions of interest that have already been segmented from action streams. These datasets cannot be used to study a more realistic action recognition scenario where actions of interest occur randomly and continuously among actions of non-interest or no actions. It is more challenging to recognize actions of interest in continuous action streams since the starts and ends of these actions are not known and need to be determined in an on-the-fly manner. Furthermore, there exists no public domain multi-modal dataset …
45

Zhang, Jingyu, Xinyi Yan, Yi Xiang, Yingyi Zhang, and Chengzhi Zhang. "Building a Multimodal Dataset of Academic Paper for Keyword Extraction." Proceedings of the Association for Information Science and Technology 61, no. 1 (2024): 435–46. http://dx.doi.org/10.1002/pra2.1040.

Abstract:
Up to this point, the keyword extraction task has typically relied solely on textual data. Neglecting visual details and audio features from image and audio modalities leads to deficiencies in information richness and overlooks potential correlations, thereby constraining the model's ability to learn representations of the data and the accuracy of model predictions. Furthermore, the currently available multimodal datasets for the keyword extraction task are particularly scarce, further hindering the progress of research on multimodal keyword extraction. Therefore, this study constructs a multimodal …
46

Zhang, Liang, Anwen Hu, Jing Zhang, Shuo Hu, and Qin Jin. "MPMQA: Multimodal Question Answering on Product Manuals." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 11 (2023): 13958–66. http://dx.doi.org/10.1609/aaai.v37i11.26634.

Abstract:
Visual contents, such as illustrations and images, play a big role in product manual understanding. Existing Product Manual Question Answering (PMQA) datasets tend to ignore visual contents and only retain textual parts. In this work, to emphasize the importance of multimodal contents, we propose a Multimodal Product Manual Question Answering (MPMQA) task. For each question, MPMQA requires the model not only to process multimodal contents but also to provide multimodal answers. To support MPMQA, a large-scale dataset PM209 is constructed with human annotations, which contains 209 product manuals …
47

Peng, Jiao, Yue He, Yongjuan Chang, et al. "A Social Media Dataset and H-GNN-Based Contrastive Learning Scheme for Multimodal Sentiment Analysis." Applied Sciences 15, no. 2 (2025): 636. https://doi.org/10.3390/app15020636.

Abstract:
Multimodal sentiment analysis faces a number of challenges, including missing modalities, the modality heterogeneity gap, incomplete datasets, etc. Previous studies usually adopt schemes like meta-learning or multi-layer structures. Nevertheless, these methods lack interpretability for the interaction between modalities. In this paper, we constructed a new dataset, SM-MSD, for sentiment analysis in social media (SAS) that differs significantly from conventional corpora, comprising 10K instances of diverse data from Twitter, encompassing text, emoticons, emojis, and text embedded in images. This dataset …
48

Tang, Yunlong, Daiki Shimada, Jing Bi, Mingqian Feng, Hang Hua, and Chenliang Xu. "Empowering LLMs with Pseudo-Untrimmed Videos for Audio-Visual Temporal Understanding." Proceedings of the AAAI Conference on Artificial Intelligence 39, no. 7 (2025): 7293–301. https://doi.org/10.1609/aaai.v39i7.32784.

Abstract:
Large language models (LLMs) have demonstrated remarkable capabilities in natural language and multimodal domains. By fine-tuning multimodal LLMs with temporal annotations from well-annotated datasets, e.g., dense video captioning datasets, their temporal understanding capacity in video-language tasks can be obtained. However, there is a notable lack of untrimmed audio-visual video datasets with precise temporal annotations for events. This deficiency hinders LLMs from learning the alignment between time, audio-visual events, and text tokens, thus impairing their ability to localize audio-visual …
49

da Silva, Daniel Queirós, Filipe Neves dos Santos, Armando Jorge Sousa, Vítor Filipe, and José Boaventura-Cunha. "Unimodal and Multimodal Perception for Forest Management: Review and Dataset." Computation 9, no. 12 (2021): 127. http://dx.doi.org/10.3390/computation9120127.

Abstract:
Robotics navigation and perception for forest management are challenging due to the existence of many obstacles to detect and avoid and the sharp illumination changes. Advanced perception systems are needed because they can enable the development of robotic and machinery solutions to accomplish a smarter, more precise, and sustainable forestry. This article presents a state-of-the-art review of unimodal and multimodal perception in forests, detailing the current work on perception using a single type of sensor (unimodal) and combining data from different kinds of sensors (multimodal) …
50

Peng, Haotian, Jiawei Liu, Jinsong Du, Jie Gao, and Wei Wang. "BearLLM: A Prior Knowledge-Enhanced Bearing Health Management Framework with Unified Vibration Signal Representation." Proceedings of the AAAI Conference on Artificial Intelligence 39, no. 19 (2025): 19866–74. https://doi.org/10.1609/aaai.v39i19.34188.

Abstract:
We propose a bearing health management framework leveraging large language models (BearLLM), a novel multimodal model that unifies multiple bearing-related tasks by processing user prompts and vibration signals. Specifically, we introduce a prior knowledge-enhanced unified vibration signal representation to handle various working conditions across multiple datasets. This involves adaptively sampling the vibration signals based on the sampling rate of the sensor, incorporating the frequency domain to unify input dimensions, and using a fault-free reference signal as an auxiliary input. To extract …
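One ingredient the abstract describes, unifying input dimensions via the frequency domain, can be pictured with a small NumPy sketch; the bin count, frequency range, and signal lengths below are assumptions, not the paper's parameters.

import numpy as np

def unified_spectrum(signal, sample_rate, n_bins=256, f_max=5000.0):
    """Magnitude spectrum resampled onto a fixed frequency grid."""
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    grid = np.linspace(0.0, f_max, n_bins)
    return np.interp(grid, freqs, spectrum)  # fixed-length representation

# Segments recorded at different sampling rates map to the same shape.
a = unified_spectrum(np.random.randn(12000), sample_rate=12000)
b = unified_spectrum(np.random.randn(48000), sample_rate=48000)
print(a.shape, b.shape)  # (256,) (256,)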