Academic literature on the topic 'Multi-modal dataset'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Multi-modal dataset.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever these are available in the metadata.

Journal articles on the topic "Multi-modal dataset"

1

Jeong, Changhoon, Sung-Eun Jang, Sanghyuck Na, and Juntae Kim. "Korean Tourist Spot Multi-Modal Dataset for Deep Learning Applications." Data 4, no. 4 (2019): 139. https://doi.org/10.3390/data4040139.

Abstract:
Recently, deep learning-based methods for solving multi-modal tasks such as image captioning, multi-modal classification, and cross-modal retrieval have attracted much attention. To apply deep learning for such tasks, large amounts of data are needed for training. However, although there are several Korean single-modal datasets, there are not enough Korean multi-modal datasets. In this paper, we introduce a KTS (Korean tourist spot) dataset for Korean multi-modal deep-learning research. The KTS dataset has four modalities (image, text, hashtags, and likes) and consists of 10 classes related to…
2

Wang, Fang, Shenglin Yin, Xiaoying Bai, Minghao Hu, Tianwei Yan, and Yi Liang. "M^3EL: A Multi-task Multi-topic Dataset for Multi-modal Entity Linking." Proceedings of the AAAI Conference on Artificial Intelligence 39, no. 12 (2025): 12712–20. https://doi.org/10.1609/aaai.v39i12.33386.

Abstract:
Multi-modal Entity Linking (MEL) is a fundamental component for various downstream tasks. However, existing MEL datasets suffer from small scale, scarcity of topic types and limited coverage of tasks, making them incapable of effectively enhancing the entity linking capabilities of multi-modal models. To address these obstacles, we propose a dataset construction pipeline and publish M^3EL, a large-scale dataset for MEL. M^3EL includes 79,625 instances, covering 9 diverse multi-modal tasks, and 5 different topics. In addition, to further improve the model's adaptability to multi-modal tasks, we…
3

Ma’sum, Muhammad Anwar. "Intelligent Clustering and Dynamic Incremental Learning to Generate Multi-Codebook Fuzzy Neural Network for Multi-Modal Data Classification." Symmetry 12, no. 4 (2020): 679. https://doi.org/10.3390/sym12040679.

Abstract:
Classification in multi-modal data is one of the challenges in the machine learning field. The multi-modal data need special treatment as its features are distributed in several areas. This study proposes multi-codebook fuzzy neural networks by using intelligent clustering and dynamic incremental learning for multi-modal data classification. In this study, we utilized intelligent K-means clustering based on anomalous patterns and intelligent K-means clustering based on histogram information. In this study, clustering is used to generate codebook candidates before the training process, while in…
4

Chen, Delong, Jianfeng Liu, Wenliang Dai, and Baoyuan Wang. "Visual Instruction Tuning with Polite Flamingo." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 16 (2024): 17745–53. https://doi.org/10.1609/aaai.v38i16.29727.

Abstract:
Recent research has demonstrated that the multi-task fine-tuning of multi-modal Large Language Models (LLMs) using an assortment of annotated downstream vision-language datasets significantly enhances their performance. Yet, during this process, a side effect, which we termed as the "multi-modal alignment tax", surfaces. This side effect negatively impacts the model's ability to format responses appropriately - for instance, its "politeness" - due to the overly succinct and unformatted nature of raw annotations, resulting in reduced human preference. In this paper, we introduce Polite Flamingo…
5

Dai, Yin, Yumeng Song, Weibin Liu, et al. "Multi-Focus Image Fusion Based on Convolution Neural Network for Parkinson’s Disease Image Classification." Diagnostics 11, no. 12 (2021): 2379. https://doi.org/10.3390/diagnostics11122379.

Abstract:
Parkinson’s disease (PD) is a common neurodegenerative disease that has a significant impact on people’s lives. Early diagnosis is imperative since proper treatment stops the disease’s progression. With the rapid development of CAD techniques, there have been numerous applications of computer-aided diagnostic (CAD) techniques in the diagnosis of PD. In recent years, image fusion has been applied in various fields and is valuable in medical diagnosis. This paper mainly adopts a multi-focus image fusion method primarily based on deep convolutional neural networks to fuse magnetic resonance image…
6

Ma’sum, Muhammad Anwar, Hadaiq Rolis Sanabila, Petrus Mursanto, and Wisnu Jatmiko. "Clustering versus Incremental Learning Multi-Codebook Fuzzy Neural Network for Multi-Modal Data Classification." Computation 8, no. 1 (2020): 6. https://doi.org/10.3390/computation8010006.

Abstract:
One of the challenges in machine learning is a classification in multi-modal data. The problem needs a customized method as the data has a feature that spreads in several areas. This study proposed a multi-codebook fuzzy neural network classifiers using clustering and incremental learning approaches to deal with multi-modal data classification. The clustering methods used are K-Means and GMM clustering. Experiment result, on a synthetic dataset, the proposed method achieved the highest performance with 84.76% accuracy. Whereas on the benchmark dataset, the proposed method has the highest perfo…
7

Suryani, Dewi, Valentino Ekaputra, and Andry Chowanda. "Multi-modal Asian Conversation Mobile Video Dataset for Recognition Task." International Journal of Electrical and Computer Engineering (IJECE) 8, no. 5 (2018): 4042–46. https://doi.org/10.11591/ijece.v8i5.pp4042-4046.

Abstract:
Images, audio, and videos have been used by researchers for a long time to develop several tasks regarding human facial recognition and emotion detection. Most of the available datasets usually focus on either static expression, a short video of changing emotion from neutral to peak emotion, or difference in sounds to detect the current emotion of a person. Moreover, the common datasets were collected and processed in the United States (US) or Europe, and only several datasets were originated from Asia. In this paper, we present our effort to create a unique dataset that can fill in the gap by…
8

Guan, Wenhao, Yishuang Li, Tao Li, et al. "MM-TTS: Multi-Modal Prompt Based Style Transfer for Expressive Text-to-Speech Synthesis." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 16 (2024): 18117–25. https://doi.org/10.1609/aaai.v38i16.29769.

Abstract:
The style transfer task in Text-to-Speech (TTS) refers to the process of transferring style information into text content to generate corresponding speech with a specific style. However, most existing style transfer approaches are either based on fixed emotional labels or reference speech clips, which cannot achieve flexible style transfer. Recently, some methods have adopted text descriptions to guide style transfer. In this paper, we propose a more flexible multi-modal and style controllable TTS framework named MM-TTS. It can utilize any modality as the prompt in unified multi-modal prompt s…
9

Wang, Bingbing, Yiming Du, Bin Liang, et al. "A New Formula for Sticker Retrieval: Reply with Stickers in Multi-Modal and Multi-Session Conversation." Proceedings of the AAAI Conference on Artificial Intelligence 39, no. 24 (2025): 25327–35. https://doi.org/10.1609/aaai.v39i24.34720.

Abstract:
Stickers are widely used in online chatting, which can vividly express someone's intention, emotion, or attitude. Existing conversation research typically retrieves stickers based on a single session or the previous textual information, which cannot adapt to the multi-modal and multi-session nature of the real-world conversation. To this end, we introduce MultiChat, a new dataset for sticker retrieval facing the multi-modal and multi-session conversation, comprising 1,542 sessions, featuring 50,192 utterances and 2,182 stickers. Based on the created dataset, we propose a novel Intent-Guided S…

Book chapters on the topic "Multi-modal dataset"

1

Abdulla, Salwa, Suadad Muammar, and Khaled Shaalan. "Pre-training on Multi-modal for Improved Persona Detection Using Multi Datasets." In BUiD Doctoral Research Conference 2023. Springer Nature Switzerland, 2024. https://doi.org/10.1007/978-3-031-56121-4_22.

Abstract:
Persona identification helps AI-based communication systems provide personalized and situationally informed interactions. This paper introduces pre-training on CNN, BERT, and GPT models to improve persona detection on PMPC and ROCStories datasets. Two speakers with different personalities have dialogues in the PMPC dataset. The challenge is to match each speaker to their persona. The ROCStories dataset contains fictional character traits and activities. Our study uses transformer-based design to improve persona detection using ROCStories dataset external context. We compare our method…
2

Yamamoto, Shuhei, and Noriko Kando. "Temporal Closeness for Enhanced Cross-Modal Retrieval of Sensor and Image Data." In Lecture Notes in Computer Science. Springer Nature Singapore, 2025. https://doi.org/10.1007/978-981-96-2071-5_13.

Abstract:
This paper presents a new approach to dense retrieval across multiple modalities, emphasizing the integration of images and sensor data. Traditional cross-modal retrieval techniques face significant challenges, particularly in processing non-linguistic modalities and creating effective training datasets. To address these issues, we propose a method that uses a shared vector space, optimized with contrastive loss, to enable efficient and accurate retrieval across diverse modalities. A key innovation of our approach is the introduction of a temporal closeness metric, which evaluates the…
3

Wei, Haolin, David S. Monaghan, Noel E. O’Connor, and Patricia Scanlon. "A New Multi-modal Dataset for Human Affect Analysis." In Human Behavior Understanding. Springer International Publishing, 2014. https://doi.org/10.1007/978-3-319-11839-0_4.

4

Zhou, Chuhao, Yuanrong Xu, Fanglin Chen, and Guangming Lu. "Multi-modal Finger Feature Fusion Algorithms on Large-Scale Dataset." In Pattern Recognition and Computer Vision. Springer Nature Switzerland, 2022. https://doi.org/10.1007/978-3-031-18910-4_42.

5

Benchekroun, Mouna, Dan Istrate, Vincent Zalc, and Dominique Lenne. "A Multi-Modal Dataset (MMSD) for Acute Stress Bio-Markers." In Biomedical Engineering Systems and Technologies. Springer Nature Switzerland, 2023. https://doi.org/10.1007/978-3-031-38854-5_19.

6

Ruosch, Florian, Rosni Vasu, Ruijie Wang, Luca Rossetto, and Abraham Bernstein. "Single-Label Multi-modal Field of Research Classification." In Lecture Notes in Computer Science. Springer Nature Switzerland, 2024. https://doi.org/10.1007/978-3-031-65794-8_15.

Abstract:
The automated field of research classification for scientific papers is still challenging, even with modern tools such as large language models. As part of a shared task tackling this problem, this paper presents our contribution SLAMFORC, an approach to single-label classification using multi-modal data. We combined the metadata of papers with their full text and, where available, images into a pipeline to predict their field of research with an ensemble voting on traditional classifiers and large language models. We evaluated our approach on the shared task dataset and scored the hig…
7

Cai, Zhongang, Daxuan Ren, Ailing Zeng, et al. "HuMMan: Multi-modal 4D Human Dataset for Versatile Sensing and Modeling." In Lecture Notes in Computer Science. Springer Nature Switzerland, 2022. https://doi.org/10.1007/978-3-031-20071-7_33.

8

Zhang, Xiong, Minghui Wang, Ming Zeng, Wenxiong Kang, and Feiqi Deng. "HuMoMM: A Multi-Modal Dataset and Benchmark for Human Motion Analysis." In Lecture Notes in Computer Science. Springer Nature Switzerland, 2023. https://doi.org/10.1007/978-3-031-46305-1_17.

9

Kalanadhabhatta, Manasa, Chulhong Min, Alessandro Montanari, and Fahim Kawsar. "FatigueSet: A Multi-modal Dataset for Modeling Mental Fatigue and Fatigability." In Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering. Springer International Publishing, 2022. https://doi.org/10.1007/978-3-030-99194-4_14.

10

Liu, Zelong, Peyton Smith, Alexander Lautin, et al. "RadImageGAN – A Multi-modal Dataset-Scale Generative AI for Medical Imaging." In Lecture Notes in Computer Science. Springer Nature Switzerland, 2025. https://doi.org/10.1007/978-3-031-82007-6_17.


Conference papers on the topic "Multi-modal dataset"

1

Chen, Hao, Yuqi Hou, Chenyuan Qu, Irene Testini, Xiaohan Hong, and Jianbo Jiao. "360+x: A Panoptic Multi-modal Scene Understanding Dataset." In 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2024. https://doi.org/10.1109/cvpr52733.2024.01833.

2

Schön, Markus, Jona Ruof, Thomas Wodtko, Michael Buchholz, and Klaus Dietmayer. "The ADUULM-360 Dataset - A Multi-Modal Dataset for Depth Estimation in Adverse Weather." In 2024 IEEE 27th International Conference on Intelligent Transportation Systems (ITSC). IEEE, 2024. https://doi.org/10.1109/itsc58415.2024.10920201.

3

Quesada, Jorge, Mohammad Alotaibi, Mohit Prabhushankar, and Ghassan AlRegib. "PointPrompt: A Multi-modal Prompting Dataset for Segment Anything Model." In 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). IEEE, 2024. https://doi.org/10.1109/cvprw63382.2024.00167.

4

Xiao, Mingrui, Zijian Zeng, Yue Zheng, Shu Yang, Yali Li, and Shengjin Wang. "A Dataset with Multi-Modal Information and Multi-Granularity Descriptions for Video Captioning." In 2024 IEEE International Conference on Multimedia and Expo (ICME). IEEE, 2024. https://doi.org/10.1109/icme57554.2024.10688196.

5

Hu, Bo, Wei Wang, Chunyi Li, Lihuo He, Leida Li, and Xinbo Gao. "A Multi-annotated and Multi-modal Dataset for Wide-angle Video Quality Assessment." In ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2025. https://doi.org/10.1109/icassp49660.2025.10888534.

6

Schneider, Florian, and Chris Biemann. "WISMIR3: A Multi-Modal Dataset to Challenge Text-Image Retrieval Approaches." In Proceedings of the 3rd Workshop on Advances in Language and Vision Research (ALVR). Association for Computational Linguistics, 2024. https://doi.org/10.18653/v1/2024.alvr-1.1.

7

Ho, Gia-Bao, Chang Tan, Zahra Darban, Mahsa Salehi, Reza Haf, and Wray Buntine. "MTP: A Dataset for Multi-Modal Turning Points in Casual Conversations." In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). Association for Computational Linguistics, 2024. https://doi.org/10.18653/v1/2024.acl-short.30.

8

Gong, Chen, DeXin Kong, Suxian Zhao, Xingyu Li, and Guohong Fu. "MODDP: A Multi-modal Open-domain Chinese Dataset for Dialogue Discourse Parsing." In Findings of the Association for Computational Linguistics ACL 2024. Association for Computational Linguistics, 2024. https://doi.org/10.18653/v1/2024.findings-acl.628.

9

Wu, Hao, Ke Lu, Yuqiu Li, Junhao Huang, and Jian Xue. "MISTA: A Large-Scale Dataset for Multi-Modal Instruction Tuning on Aerial Images." In 2024 IEEE International Conference on Multimedia and Expo (ICME). IEEE, 2024. https://doi.org/10.1109/icme57554.2024.10687493.

10

Yuan, Shenghai, Yizhuo Yang, Thien Hoang Nguyen, et al. "MMAUD: A Comprehensive Multi-Modal Anti-UAV Dataset for Modern Miniature Drone Threats." In 2024 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2024. https://doi.org/10.1109/icra57147.2024.10610957.


Reports on the topic "Multi-modal dataset"

1

Ehiabhi, Jolly, and Haifeng Wang. A Systematic Review of Machine Learning Models in Mental Health Analysis Based on Multi-Channel Multi-Modal Biometric Signals. INPLASY - International Platform of Registered Systematic Review and Meta-analysis Protocols, 2023. https://doi.org/10.37766/inplasy2023.2.0003.

Abstract:
Review question / Objective: A systematic review of Mental health diagnosis/prognoses of mental disorders using Machine Learning techniques with information from biometric signals. A review of the trend and status of these ML techniques in mental health diagnosis and an investigation of how these signals are used to help increase the efficiency of mental health disease diagnosis. Using Machine learning techniques to classify mental health diseases as against using only expert knowledge for diagnosis. Feature Extraction from signal gotten from biometric signals that help classify sleep disorder…