To view other types of publications on this topic, follow the link: Multimodal Embeddings.

Journal articles on the topic "Multimodal Embeddings"

Format your source in APA, MLA, Chicago, Harvard, and other citation styles

Consult the top 50 journal articles for your research on the topic "Multimodal Embeddings".

Next to each source in the list of references there is an "Add to bibliography" button. Press it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the scholarly publication as a .pdf file and read its abstract online, provided that these details are included in the metadata.

Browse journal articles from many disciplines and organise your bibliography correctly.

1

Tyshchuk, Kirill, Polina Karpikova, Andrew Spiridonov, Anastasiia Prutianova, Anton Razzhigaev, and Alexander Panchenko. "On Isotropy of Multimodal Embeddings." Information 14, no. 7 (2023): 392. http://dx.doi.org/10.3390/info14070392.

Full text of the source
Abstract:
Embeddings, i.e., vector representations of objects, such as texts, images, or graphs, play a key role in deep learning methodologies nowadays. Prior research has shown the importance of analyzing the isotropy of textual embeddings for transformer-based text encoders, such as the BERT model. Anisotropic word embeddings do not use the entire space, instead concentrating on a narrow cone in such a pretrained vector space, negatively affecting the performance of applications, such as textual semantic similarity. Transforming a vector space to optimize isotropy has been shown to be beneficial for
APA, Harvard, Vancouver, ISO, and other citation styles
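As a quick orientation for this entry's topic: anisotropy is commonly estimated as the average pairwise cosine similarity over a sample of embedding vectors, with values near 1 indicating the narrow-cone geometry described above. The snippet below is a minimal, generic sketch of that diagnostic on random placeholder vectors; it is not the analysis pipeline of the cited paper.

```python
import numpy as np

def mean_pairwise_cosine(embeddings: np.ndarray) -> float:
    """Average pairwise cosine similarity -- a common anisotropy proxy.

    Values close to 1.0 suggest the vectors occupy a narrow cone;
    values close to 0.0 suggest a more isotropic spread.
    """
    # L2-normalize each row, guarding against zero vectors.
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.clip(norms, 1e-12, None)
    sims = unit @ unit.T                      # all pairwise cosines
    n = sims.shape[0]
    off_diag = sims[~np.eye(n, dtype=bool)]   # drop self-similarities
    return float(off_diag.mean())

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    isotropic = rng.normal(size=(500, 256))    # roughly isotropic cloud
    anisotropic = isotropic + 5.0              # shifted into a narrow cone
    print(f"isotropic:   {mean_pairwise_cosine(isotropic):.3f}")
    print(f"anisotropic: {mean_pairwise_cosine(anisotropic):.3f}")
```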
2

Guo, Zhiqiang, Jianjun Li, Guohui Li, Chaoyang Wang, Si Shi, and Bin Ruan. "LGMRec: Local and Global Graph Learning for Multimodal Recommendation." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 8 (2024): 8454–62. http://dx.doi.org/10.1609/aaai.v38i8.28688.

Full text of the source
Abstract:
The multimodal recommendation has gradually become the infrastructure of online media platforms, enabling them to provide personalized service to users through a joint modeling of user historical behaviors (e.g., purchases, clicks) and item various modalities (e.g., visual and textual). The majority of existing studies typically focus on utilizing modal features or modal-related graph structure to learn user local interests. Nevertheless, these approaches encounter two limitations: (1) Shared updates of user ID embeddings result in the consequential coupling between collaboration and multimoda
APA, Harvard, Vancouver, ISO, and other citation styles
3

Shang, Bin, Yinliang Zhao, Jun Liu, and Di Wang. "LAFA: Multimodal Knowledge Graph Completion with Link Aware Fusion and Aggregation." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 8 (2024): 8957–65. http://dx.doi.org/10.1609/aaai.v38i8.28744.

Full text of the source
Abstract:
Recently, an enormous amount of research has emerged on multimodal knowledge graph completion (MKGC), which seeks to extract knowledge from multimodal data and predict the most plausible missing facts to complete a given multimodal knowledge graph (MKG). However, existing MKGC approaches largely ignore that visual information may introduce noise and lead to uncertainty when adding them to the traditional KG embeddings due to the contribution of each associated image to entity is different in diverse link scenarios. Moreover, treating each triple independently when learning entity embeddings le
APA, Harvard, Vancouver, ISO, and other citation styles
4

Sun, Zhongkai, Prathusha Sarma, William Sethares, and Yingyu Liang. "Learning Relationships between Text, Audio, and Video via Deep Canonical Correlation for Multimodal Language Analysis." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 05 (2020): 8992–99. http://dx.doi.org/10.1609/aaai.v34i05.6431.

Full text of the source
Abstract:
Multimodal language analysis often considers relationships between features based on text and those based on acoustical and visual properties. Text features typically outperform non-text features in sentiment analysis or emotion recognition tasks in part because the text features are derived from advanced language models or word embeddings trained on massive data sources while audio and video features are human-engineered and comparatively underdeveloped. Given that the text, audio, and video are describing the same utterance in different ways, we hypothesize that the multimodal sentiment anal
APA, Harvard, Vancouver, ISO, and other citation styles
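For readers unfamiliar with canonical correlation analysis (CCA): the classical form finds paired linear projections of two feature matrices that are maximally correlated, and the cited work builds on a deep, nonlinear variant of this idea. The sketch below shows only classical CCA on synthetic paired features using scikit-learn; the sample counts and feature dimensions are illustrative assumptions, not values from the paper.

```python
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)

# Synthetic paired features for the same 500 utterances:
# a shared latent signal plus modality-specific noise (illustrative only).
latent = rng.normal(size=(500, 8))
text_feats = latent @ rng.normal(size=(8, 60)) + 0.1 * rng.normal(size=(500, 60))
audio_feats = latent @ rng.normal(size=(8, 40)) + 0.1 * rng.normal(size=(500, 40))

# Project both views into a shared 8-dimensional, maximally correlated space.
cca = CCA(n_components=8)
text_c, audio_c = cca.fit_transform(text_feats, audio_feats)

# Per-component canonical correlations between the projected views.
corrs = [np.corrcoef(text_c[:, k], audio_c[:, k])[0, 1] for k in range(8)]
print(np.round(corrs, 3))
```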
5

Merkx, Danny, and Stefan L. Frank. "Learning semantic sentence representations from visually grounded language without lexical knowledge." Natural Language Engineering 25, no. 4 (2019): 451–66. http://dx.doi.org/10.1017/s1351324919000196.

Full text of the source
Abstract:
Current approaches to learning semantic representations of sentences often use prior word-level knowledge. The current study aims to leverage visual information in order to capture sentence level semantics without the need for word embeddings. We use a multimodal sentence encoder trained on a corpus of images with matching text captions to produce visually grounded sentence embeddings. Deep Neural Networks are trained to map the two modalities to a common embedding space such that for an image the corresponding caption can be retrieved and vice versa. We show that our model achieves re
APA, Harvard, Vancouver, ISO, and other citation styles
6

Mateev, Mihail. "Comparative Analysis on Implementing Embeddings for Image Analysis." Journal of Information Systems Engineering and Management 10, no. 17s (2025): 89–102. https://doi.org/10.52783/jisem.v10i17s.2710.

Full text of the source
Abstract:
This research explores how artificial intelligence enhances construction maintenance and diagnostics, achieving 95% accuracy on a dataset of 10,000 cases. The findings highlight AI's potential to revolutionize predictive maintenance in the industry. The growing adoption of image embeddings has transformed visual data processing across AI applications. This study evaluates embedding implementations in major platforms, including Azure AI, OpenAI's GPT-4 Vision, and frameworks like Hugging Face, Replicate, and Eden AI. It assesses their scalability, accuracy, cost-effectiveness, and integration f
APA, Harvard, Vancouver, ISO, and other citation styles
7

Tang, Zhenchao, Jiehui Huang, Guanxing Chen, and Calvin Yu-Chian Chen. "Comprehensive View Embedding Learning for Single-Cell Multimodal Integration." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 14 (2024): 15292–300. http://dx.doi.org/10.1609/aaai.v38i14.29453.

Full text of the source
Abstract:
Motivation: Advances in single-cell measurement techniques provide rich multimodal data, which helps us to explore the life state of cells more deeply. However, multimodal integration, or, learning joint embeddings from multimodal data remains a current challenge. The difficulty in integrating unpaired single-cell multimodal data is that different modalities have different feature spaces, which easily leads to information loss in joint embedding. And few existing methods have fully exploited and fused the information in single-cell multimodal data. Result: In this study, we propose CoVEL, a de
APA, Harvard, Vancouver, ISO, and other citation styles
8

Zhang, Linhai, Deyu Zhou, Yulan He, and Zeng Yang. "MERL: Multimodal Event Representation Learning in Heterogeneous Embedding Spaces." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 16 (2021): 14420–27. http://dx.doi.org/10.1609/aaai.v35i16.17695.

Full text of the source
Abstract:
Previous work has shown the effectiveness of using event representations for tasks such as script event prediction and stock market prediction. It is however still challenging to learn the subtle semantic differences between events based solely on textual descriptions of events often represented as (subject, predicate, object) triples. As an alternative, images offer a more intuitive way of understanding event semantics. We observe that event described in text and in images show different abstraction levels and therefore should be projected onto heterogeneous embedding spaces, as opposed to wh
APA, Harvard, Vancouver, ISO, and other citation styles
9

Sah, Shagan, Sabarish Gopalakishnan, and Raymond Ptucha. "Aligned attention for common multimodal embeddings." Journal of Electronic Imaging 29, no. 02 (2020): 1. http://dx.doi.org/10.1117/1.jei.29.2.023013.

Full text of the source
APA, Harvard, Vancouver, ISO, and other citation styles
10

Alkaabi, Hussein, Ali Kadhim Jasim, and Ali Darroudi. "From Static to Contextual: A Survey of Embedding Advances in NLP." PERFECT: Journal of Smart Algorithms 2, no. 2 (2025): 57–66. https://doi.org/10.62671/perfect.v2i2.77.

Full text of the source
Abstract:
Embedding techniques have been a cornerstone of Natural Language Processing (NLP), enabling machines to represent textual data in a form that captures semantic and syntactic relationships. Over the years, the field has witnessed a significant evolution—from static word embeddings, such as Word2Vec and GloVe, which represent words as fixed vectors, to dynamic, contextualized embeddings like BERT and GPT, which generate word representations based on their surrounding context. This survey provides a comprehensive overview of embedding techniques, tracing their development from early methods to st
APA, Harvard, Vancouver, ISO, and other citation styles
11

Zhang, Rongchao, Yiwei Lou, Dexuan Xu, Yongzhi Cao, Hanpin Wang, and Yu Huang. "A Learnable Discrete-Prior Fusion Autoencoder with Contrastive Learning for Tabular Data Synthesis." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 15 (2024): 16803–11. http://dx.doi.org/10.1609/aaai.v38i15.29621.

Full text of the source
Abstract:
The actual collection of tabular data for sharing involves confidentiality and privacy constraints, leaving the potential risks of machine learning for interventional data analysis unsafely averted. Synthetic data has emerged recently as a privacy-protecting solution to address this challenge. However, existing approaches regard discrete and continuous modal features as separate entities, thus falling short in properly capturing their inherent correlations. In this paper, we propose a novel contrastive learning guided Gaussian Transformer autoencoder, termed GTCoder, to synthesize photo-realis
APA, Harvard, Vancouver, ISO, and other citation styles
12

Lin, Kaiyi, Xing Xu, Lianli Gao, Zheng Wang, and Heng Tao Shen. "Learning Cross-Aligned Latent Embeddings for Zero-Shot Cross-Modal Retrieval." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 07 (2020): 11515–22. http://dx.doi.org/10.1609/aaai.v34i07.6817.

Full text of the source
Abstract:
Zero-Shot Cross-Modal Retrieval (ZS-CMR) is an emerging research hotspot that aims to retrieve data of new classes across different modality data. It is challenging for not only the heterogeneous distributions across different modalities, but also the inconsistent semantics across seen and unseen classes. A handful of recently proposed methods typically borrow the idea from zero-shot learning, i.e., exploiting word embeddings of class labels (i.e., class-embeddings) as common semantic space, and using generative adversarial network (GAN) to capture the underlying multimodal data structures, as
APA, Harvard, Vancouver, ISO, and other citation styles
13

Khalifa, Omar Yasser Ibrahim, and Muhammad Zafran Muhammad Zaly Shah. "MultiPhishNet: A Multimodal Approach of QR Code Phishing Detection using Multi-Head Attention and Multilingual Embeddings." International Journal of Innovative Computing 15, no. 1 (2025): 53–61. https://doi.org/10.11113/ijic.v15n1.512.

Full text of the source
Abstract:
Phishing attacks leveraging QR codes have become a significant threat due to their increasing use in contactless services. These attacks are challenging to detect since QR codes typically encode URLs leading to phishing websites designed to steal sensitive information. Existing detection methods often rely on blacklists or handcrafted features, which are inadequate for handling obfuscated URLs and multilingual content. This paper proposes MultiPhishNet, a multimodal phishing detection model that integrates advanced embedding techniques, Convolutional Neural Networks (CNNs), and multi-head atte
APA, Harvard, Vancouver, ISO, and other citation styles
14

Waqas, Asim, Aakash Tripathi, Mia Naeini, Paul A. Stewart, Matthew B. Schabath, and Ghulam Rasool. "Abstract 991: PARADIGM: an embeddings-based multimodal learning framework with foundation models and graph neural networks." Cancer Research 85, no. 8_Supplement_1 (2025): 991. https://doi.org/10.1158/1538-7445.am2025-991.

Full text of the source
Abstract:
Introduction: Cancer research faces significant challenges in integrating heterogeneous data across varying spatial and temporal scales, limiting the ability to gain a comprehensive understanding of the disease. PARADIGM (Pan-Cancer Embeddings Representation using Advanced Multimodal Learning with Graph-based Modeling) addresses this challenge by providing a framework leveraging foundation models (FMs) and Graph Neural Networks (GNN). PARADIGM framework generates embeddings from multi-resolution datasets using modality-specific FMs, aggregates sample embeddings, fuses them into a unifi
APA, Harvard, Vancouver, ISO, and other citation styles
15

Li, Xiaolong, Yang Dong, Yunfei Yi, Zhixun Liang, and Shuqi Yan. "Hypergraph Neural Network for Multimodal Depression Recognition." Electronics 13, no. 22 (2024): 4544. http://dx.doi.org/10.3390/electronics13224544.

Full text of the source
Abstract:
Deep learning-based approaches for automatic depression recognition offer advantages of low cost and high efficiency. However, depression symptoms are challenging to detect and vary significantly between individuals. Traditional deep learning methods often struggle to capture and model these nuanced features effectively, leading to lower recognition accuracy. This paper introduces a novel multimodal depression recognition method, HYNMDR, which utilizes hypergraphs to represent the complex, high-order relationships among patients with depression. HYNMDR comprises two primary components: a tempo
APA, Harvard, Vancouver, ISO, and other citation styles
16

Zhu, Chaoyu, Zhihao Yang, Xiaoqiong Xia, Nan Li, Fan Zhong, and Lei Liu. "Multimodal reasoning based on knowledge graph embedding for specific diseases." Bioinformatics 38, no. 8 (2022): 2235–45. http://dx.doi.org/10.1093/bioinformatics/btac085.

Full text of the source
Abstract:
Motivation: Knowledge Graph (KG) is becoming increasingly important in the biomedical field. Deriving new and reliable knowledge from existing knowledge by KG embedding technology is a cutting-edge method. Some add a variety of additional information to aid reasoning, namely multimodal reasoning. However, few works based on the existing biomedical KGs are focused on specific diseases. Results: This work develops a construction and multimodal reasoning process of Specific Disease Knowledge Graphs (SDKGs). We construct SDKG-11, a SDKG set including five cancers, six non-cancer diseases, a
APA, Harvard, Vancouver, ISO, and other citation styles
17

Tripathi, Aakash Gireesh, Asim Waqas, Yasin Yilmaz, Matthew B. Schabath, and Ghulam Rasool. "Abstract 3641: Predicting treatment outcomes using cross-modality correlations in multimodal oncology data." Cancer Research 85, no. 8_Supplement_1 (2025): 3641. https://doi.org/10.1158/1538-7445.am2025-3641.

Full text of the source
Abstract:
Accurate prediction of treatment outcomes in oncology requires modeling the intricate relationships across diverse data modalities. This study investigates cross-modality correlations by leveraging imaging and clinical data curated through the Multimodal Integration of Oncology Data System (MINDS) and HoneyBee frameworks to uncover actionable patterns for personalized treatment strategies. Using data from over 10,000 cancer patients, we developed a machine learning pipeline that employs advanced embedding techniques to capture associations between radiological imaging phenotypes and
APA, Harvard, Vancouver, ISO, and other citation styles
18

Tripathi, Aakash, Asim Waqas, Yasin Yilmaz, and Ghulam Rasool. "Abstract 4905: Multimodal transformer model improves survival prediction in lung cancer compared to unimodal approaches." Cancer Research 84, no. 6_Supplement (2024): 4905. http://dx.doi.org/10.1158/1538-7445.am2024-4905.

Full text of the source
Abstract:
Integrating multimodal lung data including clinical notes, medical images, and molecular data is critical for predictive modeling tasks like survival prediction, yet effectively aligning these disparate data types remains challenging. We present a novel method to integrate heterogeneous lung modalities by first thoroughly analyzing various domain-specific models and selecting the optimal model for embedding feature extraction per data type based on performance on representative pretrained tasks. For clinical notes, the GatorTron models showed the lowest regression loss on an initial e
APA, Harvard, Vancouver, ISO, and other citation styles
19

Xu, Jinfeng, Zheyu Chen, Shuo Yang, Jinze Li, Hewei Wang, and Edith C. H. Ngai. "MENTOR: Multi-level Self-supervised Learning for Multimodal Recommendation." Proceedings of the AAAI Conference on Artificial Intelligence 39, no. 12 (2025): 12908–17. https://doi.org/10.1609/aaai.v39i12.33408.

Full text of the source
Abstract:
As multimedia information proliferates, multimodal recommendation systems have garnered significant attention. These systems leverage multimodal information to alleviate the data sparsity issue inherent in recommendation systems, thereby enhancing the accuracy of recommendations. Due to the natural semantic disparities among multimodal features, recent research has primarily focused on cross-modal alignment using self-supervised learning to bridge these gaps. However, aligning different modal features might result in the loss of valuable interaction information, distancing them from ID embeddi
APA, Harvard, Vancouver, ISO, and other citation styles
20

Ota, Kosuke, Keiichiro Shirai, Hidetoshi Miyao, and Minoru Maruyama. "Multimodal Analogy-Based Image Retrieval by Improving Semantic Embeddings." Journal of Advanced Computational Intelligence and Intelligent Informatics 26, no. 6 (2022): 995–1003. http://dx.doi.org/10.20965/jaciii.2022.p0995.

Full text of the source
Abstract:
In this work, we study the application of multimodal analogical reasoning to image retrieval. Multimodal analogy questions are given in a form of tuples of words and images, e.g., “cat”:“dog”::[an image of a cat sitting on a bench]:?, to search for an image of a dog sitting on a bench. Retrieving desired images given these tuples can be seen as a task of finding images whose relation between the query image is close to that of query words. One way to achieve the task is building a common vector space that exhibits analogical regularities. To learn such an embedding, we propose a quadruple neur
APA, Harvard, Vancouver, ISO, and other citation styles
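In its simplest form, the analogy mechanism described in this abstract reduces to vector arithmetic in a shared text-image space, query ≈ image(c) + text(b) − text(a), followed by nearest-neighbour search over an image index. The snippet below illustrates only that generic arithmetic with placeholder vectors; the quadruple network proposed in the paper is not reproduced here.

```python
import numpy as np

def normalize(v):
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

def analogy_query(text_a, text_b, image_c, image_index):
    """Return indices of candidate images ranked for an a:b :: c:? query.

    All inputs are assumed to live in one shared multimodal space
    (placeholder vectors here, not embeddings from the cited model).
    """
    query = normalize(image_c + (text_b - text_a))
    scores = normalize(image_index) @ query      # cosine similarity to every image
    return np.argsort(-scores)                   # best match first

rng = np.random.default_rng(0)
dim = 128
cat, dog = rng.normal(size=dim), rng.normal(size=dim)    # word vectors "cat", "dog"
cat_on_bench = cat + rng.normal(scale=0.1, size=dim)     # image of a cat on a bench
gallery = rng.normal(size=(1000, dim))
gallery[42] = dog + (cat_on_bench - cat)                  # plant the expected answer
ranking = analogy_query(cat, dog, cat_on_bench, gallery)
print("top hit:", ranking[0])                             # 42, if the arithmetic works
```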
21

Yi, Moung-Ho, Keun-Chang Kwak, and Ju-Hyun Shin. "KoHMT: A Multimodal Emotion Recognition Model Integrating KoELECTRA, HuBERT with Multimodal Transformer." Electronics 13, no. 23 (2024): 4674. http://dx.doi.org/10.3390/electronics13234674.

Full text of the source
Abstract:
With the advancement of human-computer interaction, the role of emotion recognition has become increasingly significant. Emotion recognition technology provides practical benefits across various industries, including user experience enhancement, education, and organizational productivity. For instance, in educational settings, it enables real-time understanding of students’ emotional states, facilitating tailored feedback. In workplaces, monitoring employees’ emotions can contribute to improved job performance and satisfaction. Recently, emotion recognition has also gained attention in media a
APA, Harvard, Vancouver, ISO, and other citation styles
22

Mai, Sijie, Haifeng Hu, and Songlong Xing. "Modality to Modality Translation: An Adversarial Representation Learning and Graph Fusion Network for Multimodal Fusion." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 01 (2020): 164–72. http://dx.doi.org/10.1609/aaai.v34i01.5347.

Full text of the source
Abstract:
Learning joint embedding space for various modalities is of vital importance for multimodal fusion. Mainstream modality fusion approaches fail to achieve this goal, leaving a modality gap which heavily affects cross-modal fusion. In this paper, we propose a novel adversarial encoder-decoder-classifier framework to learn a modality-invariant embedding space. Since the distributions of various modalities vary in nature, to reduce the modality gap, we translate the distributions of source modalities into that of target modality via their respective encoders using adversarial training. Furthermore
APA, Harvard, Vancouver, ISO, and other citation styles
23

Wagh, Kapil Adhar. "A Review: Word Embedding Models with Machine Learning Based Context Depend and Context Independent Techniques." Advances in Nonlinear Variational Inequalities 28, no. 3s (2024): 251–58. https://doi.org/10.52783/anvi.v28.2928.

Full text of the source
Abstract:
Natural language processing (NLP) has been transformed by word embedding models, which convert text into meaningful numerical representations. These models fall into two general categories: context-dependent methods like ELMo, BERT, and GPT, and context-independent methods like Word2Vec, GloVe, and FastText. Although static word representations are provided by context-independent models, polysemy and contextual subtleties are difficult for them to capture. These issues are addressed by context-dependent approaches that make use of sophisticated deep learning architectures to produce dynamic em
APA, Harvard, Vancouver, ISO, and other citation styles
24

Kim, Donghyun, Kuniaki Saito, Kate Saenko, Stan Sclaroff, and Bryan Plummer. "MULE: Multimodal Universal Language Embedding." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 07 (2020): 11254–61. http://dx.doi.org/10.1609/aaai.v34i07.6785.

Full text of the source
Abstract:
Existing vision-language methods typically support two languages at a time at most. In this paper, we present a modular approach which can easily be incorporated into existing vision-language methods in order to support many languages. We accomplish this by learning a single shared Multimodal Universal Language Embedding (MULE) which has been visually-semantically aligned across all languages. Then we learn to relate MULE to visual data as if it were a single language. Our method is not architecture specific, unlike prior work which typically learned separate branches for each language, enabli
APA, Harvard, Vancouver, ISO, and other citation styles
25

Singh, Vijay Vaibhav. "Vector Embeddings: The Mathematical Foundation of Modern AI Systems." International Journal of Scientific Research in Computer Science, Engineering and Information Technology 11, no. 1 (2025): 2408–17. https://doi.org/10.32628/cseit251112257.

Full text of the source
Abstract:
This comprehensive article examines vector embeddings as a fundamental component of modern artificial intelligence systems, detailing their mathematical foundations, key properties, implementation techniques, and practical applications. The article traces the evolution from basic word embeddings to sophisticated transformer-based architectures, highlighting how these representations enable machines to capture and process semantic relationships in human language and visual data. The article encompasses both theoretical frameworks and practical implementations, from the groundbreaking Word2Vec a
APA, Harvard, Vancouver, ISO, and other citation styles
26

Wehrmann, Jônatas, Anderson Mattjie, and Rodrigo C. Barros. "Order embeddings and character-level convolutions for multimodal alignment." Pattern Recognition Letters 102 (January 2018): 15–22. http://dx.doi.org/10.1016/j.patrec.2017.11.020.

Full text of the source
APA, Harvard, Vancouver, ISO, and other citation styles
27

Mithun, Niluthpol C., Juncheng Li, Florian Metze, and Amit K. Roy-Chowdhury. "Joint embeddings with multimodal cues for video-text retrieval." International Journal of Multimedia Information Retrieval 8, no. 1 (2019): 3–18. http://dx.doi.org/10.1007/s13735-018-00166-3.

Full text of the source
APA, Harvard, Vancouver, ISO, and other citation styles
28

Fodor, Ádám, András Lőrincz, and Rachid R. Saboundji. "Enhancing apparent personality trait analysis with cross-modal embeddings." Annales Universitatis Scientiarum Budapestinensis de Rolando Eötvös Nominatae. Sectio computatorica 57 (2024): 167–85. https://doi.org/10.71352/ac.57.167.

Full text of the source
Abstract:
Automatic personality trait assessment is essential for high-quality human-machine interactions. Systems capable of human behavior analysis could be used for self-driving cars, medical research, and surveillance, among many others. We present a multimodal deep neural network with a distance learning network extension for apparent personality trait prediction trained on short video recordings and exploiting modality invariant embeddings. Acoustic, visual, and textual information are utilized to reach high-performance solutions in this task. Due to the highly centralized target distribution of th
APA, Harvard, Vancouver, ISO, and other citation styles
29

Nayak, Roshan, B. S. Ullas Kannantha, S. Kruthi, and C. Gururaj. "Multimodal Offensive Meme Classification Using Transformers and BiLSTM." International Journal of Engineering and Advanced Technology (IJEAT) 11, no. 3 (2022): 96–102. https://doi.org/10.35940/ijeat.C3392.0211322.

Full text of the source
Abstract:
Nowadays memes have become a way in which people express their ideas on social media. These memes can convey various views including offensive ones. Memes can be intended for a personal attack, homophobic abuse, racial abuse, attack on minority etc. The memes are implicit and multi-modal in nature. Here we analyze the meme by categorizing them as offensive or not offensive and this becomes a binary classification problem. We propose a novel offensive meme classification using the transformer-based image encoder, BiLSTM for text with mean pooling as text encoder and a
APA, Harvard, Vancouver, ISO, and other citation styles
30

Nayak, Roshan, B. S. Ullas Kannantha, Kruthi S, and C. Gururaj. "Multimodal Offensive Meme Classification Using Transformers and BiLSTM." International Journal of Engineering and Advanced Technology 11, no. 3 (2022): 96–102. http://dx.doi.org/10.35940/ijeat.c3392.0211322.

Full text of the source
Abstract:
Nowadays memes have become a way in which people express their ideas on social media. These memes can convey various views including offensive ones. Memes can be intended for a personal attack, homophobic abuse, racial abuse, attack on minority etc. The memes are implicit and multi-modal in nature. Here we analyze the meme by categorizing them as offensive or not offensive and this becomes a binary classification problem. We propose a novel offensive meme classification using the transformer-based image encoder, BiLSTM for text with mean pooling as text encoder and a Feed-Forward Network as a
APA, Harvard, Vancouver, ISO, and other citation styles
31

Chen, Weijia, Zhijun Lu, Lijue You, Lingling Zhou, Jie Xu, and Ken Chen. "Artificial Intelligence–Based Multimodal Risk Assessment Model for Surgical Site Infection (AMRAMS): Development and Validation Study." JMIR Medical Informatics 8, no. 6 (2020): e18186. http://dx.doi.org/10.2196/18186.

Full text of the source
Abstract:
Background Surgical site infection (SSI) is one of the most common types of health care–associated infections. It increases mortality, prolongs hospital length of stay, and raises health care costs. Many institutions developed risk assessment models for SSI to help surgeons preoperatively identify high-risk patients and guide clinical intervention. However, most of these models had low accuracies. Objective We aimed to provide a solution in the form of an Artificial intelligence–based Multimodal Risk Assessment Model for Surgical site infection (AMRAMS) for inpatients undergoing operations, us
APA, Harvard, Vancouver, ISO, and other citation styles
32

Smelik, N. D. "Multimodal topic model for texts and images utilizing their embeddings." Machine Learning and Data Analysis 2, no. 4 (2016): 421–41. http://dx.doi.org/10.21469/22233792.2.4.05.

Full text of the source
APA, Harvard, Vancouver, ISO, and other citation styles
33

Abdou, Ahmed, Ekta Sood, Philipp Müller, and Andreas Bulling. "Gaze-enhanced Crossmodal Embeddings for Emotion Recognition." Proceedings of the ACM on Human-Computer Interaction 6, ETRA (2022): 1–18. http://dx.doi.org/10.1145/3530879.

Full text of the source
Abstract:
Emotional expressions are inherently multimodal -- integrating facial behavior, speech, and gaze -- but their automatic recognition is often limited to a single modality, e.g. speech during a phone call. While previous work proposed crossmodal emotion embeddings to improve monomodal recognition performance, despite its importance, an explicit representation of gaze was not included. We propose a new approach to emotion recognition that incorporates an explicit representation of gaze in a crossmodal emotion embedding framework. We show that our method outperforms the previous state of the art f
APA, Harvard, Vancouver, ISO, and other citation styles
34

Hu, Wenbo, Yifan Xu, Yi Li, Weiyue Li, Zeyuan Chen, and Zhuowen Tu. "BLIVA: A Simple Multimodal LLM for Better Handling of Text-Rich Visual Questions." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 3 (2024): 2256–64. http://dx.doi.org/10.1609/aaai.v38i3.27999.

Full text of the source
Abstract:
Vision Language Models (VLMs), which extend Large Language Models (LLM) by incorporating visual understanding capability, have demonstrated significant advancements in addressing open-ended visual question-answering (VQA) tasks. However, these models cannot accurately interpret images infused with text, a common occurrence in real-world scenarios. Standard procedures for extracting information from images often involve learning a fixed set of query embeddings. These embeddings are designed to encapsulate image contexts and are later used as soft prompt inputs in LLMs. Yet, this process is limi
APA, Harvard, Vancouver, ISO, and other citation styles
35

Chen, Qihua, Xuejin Chen, Chenxuan Wang, Yixiong Liu, Zhiwei Xiong, and Feng Wu. "Learning Multimodal Volumetric Features for Large-Scale Neuron Tracing." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 2 (2024): 1174–82. http://dx.doi.org/10.1609/aaai.v38i2.27879.

Full text of the source
Abstract:
The current neuron reconstruction pipeline for electron microscopy (EM) data usually includes automatic image segmentation followed by extensive human expert proofreading. In this work, we aim to reduce human workload by predicting connectivity between over-segmented neuron pieces, taking both microscopy image and 3D morphology features into account, similar to human proofreading workflow. To this end, we first construct a dataset, named FlyTracing, that contains millions of pairwise connections of segments expanding the whole fly brain, which is three orders of magnitude larger than existing
APA, Harvard, Vancouver, ISO, and other citation styles
36

Shen, Aili, Bahar Salehi, Jianzhong Qi, and Timothy Baldwin. "A General Approach to Multimodal Document Quality Assessment." Journal of Artificial Intelligence Research 68 (July 22, 2020): 607–32. http://dx.doi.org/10.1613/jair.1.11647.

Full text of the source
Abstract:
The perceived quality of a document is affected by various factors, including grammaticality, readability, stylistics, and expertise depth, making the task of document quality assessment a complex one. In this paper, we explore this task in the context of assessing the quality of Wikipedia articles and academic papers. Observing that the visual rendering of a document can capture implicit quality indicators that are not present in the document text — such as images, font choices, and visual layout — we propose a joint model that combines the text content with a visual re
APA, Harvard, Vancouver, ISO, and other citation styles
37

Sata, Ikumi, Motoki Amagasaki, and Masato Kiyama. "Multimodal Retrieval Method for Images and Diagnostic Reports Using Cross-Attention." AI 6, no. 2 (2025): 38. https://doi.org/10.3390/ai6020038.

Full text of the source
Abstract:
Background: Conventional medical image retrieval methods treat images and text as independent embeddings, limiting their ability to fully utilize the complementary information from both modalities. This separation often results in suboptimal retrieval performance, as the intricate relationships between images and text remain underexplored. Methods: To address this limitation, we propose a novel retrieval method that integrates medical image and text embeddings using a cross-attention mechanism. Our approach creates a unified representation by directly modeling the interactions between the two
APA, Harvard, Vancouver, ISO, and other citation styles
38

Chitturi, Kiran. "Demystifying Multimodal AI: A Technical Deep Dive." International Journal of Scientific Research in Computer Science, Engineering and Information Technology 10, no. 6 (2024): 2011–17. https://doi.org/10.32628/cseit2410612394.

Full text of the source
Abstract:
This article explores the transformative impact of multimodal AI systems in bridging diverse data types and processing capabilities. It examines how these systems have revolutionized various domains through their ability to handle multiple modalities simultaneously, from visual-linguistic understanding to complex search operations. The article delves into the technical foundations of multimodal embeddings, analyzes leading models like CLIP and MUM, and investigates their real-world applications across different sectors. Through a detailed examination of current implementations, challenges, and
APA, Harvard, Vancouver, ISO, and other citation styles
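As background to the CLIP discussion in this entry: CLIP-style models are trained with a symmetric contrastive (InfoNCE) objective over a batch of paired image and text embeddings, pulling matched pairs together and pushing mismatched pairs apart. The sketch below is a generic NumPy rendering of that objective on random placeholder embeddings, not the implementation of any particular model surveyed in the article.

```python
import numpy as np

def clip_style_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE over a batch of paired image/text embeddings."""
    img = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    txt = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature            # (batch, batch) similarity matrix
    labels = np.arange(logits.shape[0])           # i-th image matches i-th text

    def cross_entropy(l, y):
        l = l - l.max(axis=1, keepdims=True)      # numerical stability
        log_prob = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_prob[np.arange(len(y)), y].mean()

    # Average of the image-to-text and text-to-image directions.
    return 0.5 * (cross_entropy(logits, labels) + cross_entropy(logits.T, labels))

rng = np.random.default_rng(0)
img_emb = rng.normal(size=(32, 512))
txt_emb = img_emb + 0.05 * rng.normal(size=(32, 512))   # nearly aligned pairs -> low loss
print(round(clip_style_loss(img_emb, txt_emb), 4))
```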
39

Tokar, Tomas, and Scott Sanner. "ICE-T: Interactions-aware Cross-column Contrastive Embedding for Heterogeneous Tabular Datasets." Proceedings of the AAAI Conference on Artificial Intelligence 39, no. 20 (2025): 20904–11. https://doi.org/10.1609/aaai.v39i20.35385.

Full text of the source
Abstract:
Finding high-quality representations of heterogeneous tabular datasets is crucial for their effective use in downstream machine learning tasks. Contrastive representation learning (CRL) methods have been previously shown to provide a straightforward way to learn such representations across various data domains. Current tabular CRL methods learn joint embeddings of data instances (tabular rows) by minimizing a contrastive loss between the original instance and its perturbations. Unlike existing tabular CRL methods, we propose leveraging frameworks established in multimodal representation learni
APA, Harvard, Vancouver, ISO, and other citation styles
40

Ma, Shukui, Pengyuan Ma, Shuaichao Feng, Fei Ma, and Guangping Zhuo. "Multimodal Data-Based Text Generation Depression Classification Model." International Journal of Computer Science and Information Technology 5, no. 1 (2025): 175–93. https://doi.org/10.62051/ijcsit.v5n1.16.

Full text of the source
Abstract:
Depression classification often relies on multimodal features, but existing models struggle to capture the similarity between multimodal features. Moreover, the social stigma surrounding depression leads to limited availability of datasets, which constrains model accuracy. This study aims to improve multimodal depression recognition methods by proposing a Multimodal Generation-Text Depression Classification Model. The model introduces a Multimodal-Deep-Extract-Feature Net to capture both long- and short-term sequential features. A Dual Text Contrastive Learning Module is employed to generate e
APA, Harvard, Vancouver, ISO, and other citation styles
41

Zhang, Jianqiang, Renyao Chen, Shengwen Li, Tailong Li, and Hong Yao. "MGKGR: Multimodal Semantic Fusion for Geographic Knowledge Graph Representation." Algorithms 17, no. 12 (2024): 593. https://doi.org/10.3390/a17120593.

Full text of the source
Abstract:
Geographic knowledge graph representation learning embeds entities and relationships in geographic knowledge graphs into a low-dimensional continuous vector space, which serves as a basic method that bridges geographic knowledge graphs and geographic applications. Previous geographic knowledge graph representation methods primarily learn the vectors of entities and their relationships from their spatial attributes and relationships, which ignores various semantics of entities, resulting in poor embeddings on geographic knowledge graphs. This study proposes a two-stage multimodal geographic kno
APA, Harvard, Vancouver, ISO, and other citation styles
42

Arora, Jyoti, Priyal Khapekar, and Rakhi Pal. "Multimodal Sentiment Analysis using LSTM and RoBerta." Advanced Innovations in Computer Programming Languages 5, no. 2 (2023): 24–35. https://doi.org/10.5281/zenodo.8130701.

Full text of the source
Abstract:
Social media is a valuable data source for understanding people's thoughts and feelings. Sentiment analysis and affective computing help analyze sentiment and emotions in social media posts. Our research paper proposes a model for tweet emotions analysis using LSTM, GloVe embeddings, and RoBERTa. This model captures sequential dependencies in tweets, leverages semantic representations, and enhances contextual understanding. We evaluate the model on a tweet emotions dataset, demonstrating its effectiveness in accurately classifying emotions in tweets. Through evaluation on a tweet emotio
APA, Harvard, Vancouver, ISO, and other citation styles
43

Tseng, Shao-Yen, Shrikanth Narayanan, and Panayiotis Georgiou. "Multimodal Embeddings From Language Models for Emotion Recognition in the Wild." IEEE Signal Processing Letters 28 (2021): 608–12. http://dx.doi.org/10.1109/lsp.2021.3065598.

Full text of the source
APA, Harvard, Vancouver, ISO, and other citation styles
44

Jing, Xuebin, Liang He, Zhida Song, and Shaolei Wang. "Audio–Visual Fusion Based on Interactive Attention for Person Verification." Sensors 23, no. 24 (2023): 9845. http://dx.doi.org/10.3390/s23249845.

Full text of the source
Abstract:
With the rapid development of multimedia technology, personnel verification systems have become increasingly important in the security field and identity verification. However, unimodal verification systems have performance bottlenecks in complex scenarios, thus triggering the need for multimodal feature fusion methods. The main problem with audio–visual multimodal feature fusion is how to effectively integrate information from different modalities to improve the accuracy and robustness of the system for individual identity. In this paper, we focus on how to improve multimodal person verificat
APA, Harvard, Vancouver, ISO, and other citation styles
45

Azeroual, Saadia, Zakaria Hamane, Rajaa Sebihi, and Fatima-Ezzahraa Ben-Bouazza. "Toward Improved Glioma Mortality Prediction: A Multimodal Framework Combining Radiomic and Clinical Features." International Journal of Online and Biomedical Engineering (iJOE) 21, no. 05 (2025): 31–46. https://doi.org/10.3991/ijoe.v21i05.52691.

Full text of the source
Abstract:
Gliomas, especially diffuse gliomas, remain a major challenge in neuro-oncology due to their highly heterogeneous nature and poor prognosis. Accurately predicting patient mortality is essential for improving treatment strategies and outcomes, yet current models often fail to fully utilize the wealth of available multimodal data. To address this, we developed a novel multimodal predictive model that integrates diverse magnetic resonance imaging (MRI) sequences—T1, T2, FLAIR, DWI, SWI, and advanced diffusion metrics such as high angular resolution diffusion imaging (HARDI)—with detailed clinical
APA, Harvard, Vancouver, ISO, and other citation styles
46

Salin, Emmanuelle, Badreddine Farah, Stéphane Ayache, and Benoit Favre. "Are Vision-Language Transformers Learning Multimodal Representations? A Probing Perspective." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 10 (2022): 11248–57. http://dx.doi.org/10.1609/aaai.v36i10.21375.

Full text of the source
Abstract:
In recent years, joint text-image embeddings have significantly improved thanks to the development of transformer-based Vision-Language models. Despite these advances, we still need to better understand the representations produced by those models. In this paper, we compare pre-trained and fine-tuned representations at a vision, language and multimodal level. To that end, we use a set of probing tasks to evaluate the performance of state-of-the-art Vision-Language models and introduce new datasets specifically for multimodal probing. These datasets are carefully designed to address a range of
APA, Harvard, Vancouver, ISO, and other citation styles
47

Peruka, Bikshapathy. "Sentemonet: A Comprehensive Framework for Multimodal Sentiment Analysis from Text and Emotions." Journal of Information Systems Engineering and Management 10, no. 34s (2025): 569–87. https://doi.org/10.52783/jisem.v10i34s.5852.

Full text of the source
Abstract:
Sentiment analysis, a crucial aspect of Natural Language Processing (NLP), plays a pivotal role in understanding public opinion, customer feedback, and user sentiments in various domains. In this study, we present a comprehensive approach to sentiment analysis that incorporates both textual and emoji data, leveraging diverse datasets from sources such as social media, customer reviews, and surveys. Our methodology consists of several key steps, including data collection, pre-processing, feature extraction, feature fusion, and feature selection. For data pre-processing, we apply techniques such
APA, Harvard, Vancouver, ISO, and other citation styles
48

Skantze, Gabriel, and Bram Willemsen. "CoLLIE: Continual Learning of Language Grounding from Language-Image Embeddings." Journal of Artificial Intelligence Research 74 (July 9, 2022): 1201–23. http://dx.doi.org/10.1613/jair.1.13689.

Full text of the source
Abstract:
This paper presents CoLLIE: a simple, yet effective model for continual learning of how language is grounded in vision. Given a pre-trained multimodal embedding model, where language and images are projected in the same semantic space (in this case CLIP by OpenAI), CoLLIE learns a transformation function that adjusts the language embeddings when needed to accommodate new language use. This is done by predicting the difference vector that needs to be applied, as well as a scaling factor for this vector, so that the adjustment is only applied when needed. Unlike traditional few-shot learning, th
APA, Harvard, Vancouver, ISO, and other citation styles
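The adjustment scheme summarised in this abstract can be pictured as adjusted = e + gate(e) · delta(e), where delta predicts a difference vector and gate a scaling factor that stays near zero when no correction is needed. The toy sketch below paraphrases that idea with random, untrained weights; it illustrates the mechanism as described in the abstract and is not the authors' implementation.

```python
import numpy as np

class EmbeddingAdjuster:
    """Toy gated residual update: adjusted = e + gate(e) * delta(e).

    Mirrors the idea described in the CoLLIE abstract (a difference vector
    plus a scaling factor); weights here are random stand-ins, not trained.
    """

    def __init__(self, dim, rng):
        self.W_delta = 0.01 * rng.normal(size=(dim, dim))   # predicts the difference vector
        self.w_gate = 0.01 * rng.normal(size=dim)           # predicts the scaling factor

    def __call__(self, e):
        delta = e @ self.W_delta
        gate = 1.0 / (1.0 + np.exp(-(e @ self.w_gate)))     # sigmoid in (0, 1)
        return e + gate[:, None] * delta                    # e stays nearly unchanged when gate ~ 0

rng = np.random.default_rng(0)
language_emb = rng.normal(size=(4, 512))                    # e.g. text embeddings (placeholder values)
adjusted = EmbeddingAdjuster(512, rng)(language_emb)
print(np.linalg.norm(adjusted - language_emb, axis=1))      # size of the applied correction
```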
49

Li, Wenxiang, Longyuan Ding, Yuliang Zhang, and Ziyuan Pu. "Understanding multimodal travel patterns based on semantic embeddings of human mobility trajectories." Journal of Transport Geography 124 (April 2025): 104169. https://doi.org/10.1016/j.jtrangeo.2025.104169.

Full text of the source
APA, Harvard, Vancouver, ISO, and other citation styles
50

Wang, Jenq-Haur, Mehdi Norouzi, and Shu Ming Tsai. "Augmenting Multimodal Content Representation with Transformers for Misinformation Detection." Big Data and Cognitive Computing 8, no. 10 (2024): 134. http://dx.doi.org/10.3390/bdcc8100134.

Full text of the source
Abstract:
Information sharing on social media has become a common practice for people around the world. Since it is difficult to check user-generated content on social media, huge amounts of rumors and misinformation are being spread with authentic information. On the one hand, most of the social platforms identify rumors through manual fact-checking, which is very inefficient. On the other hand, with an emerging form of misinformation that contains inconsistent image–text pairs, it would be beneficial if we could compare the meaning of multimodal content within the same post for detecting image–text in
APA, Harvard, Vancouver, ISO, and other citation styles