
Journal articles on the topic "Multimodal Embeddings"


Browse the top 50 journal articles on the topic "Multimodal Embeddings".


1

Tyshchuk, Kirill, Polina Karpikova, Andrew Spiridonov, Anastasiia Prutianova, Anton Razzhigaev, and Alexander Panchenko. "On Isotropy of Multimodal Embeddings." Information 14, no. 7 (2023): 392. http://dx.doi.org/10.3390/info14070392.

Abstract:
Embeddings, i.e., vector representations of objects, such as texts, images, or graphs, play a key role in deep learning methodologies nowadays. Prior research has shown the importance of analyzing the isotropy of textual embeddings for transformer-based text encoders, such as the BERT model. Anisotropic word embeddings do not use the entire space, instead concentrating on a narrow cone in such a pretrained vector space, negatively affecting the performance of applications, such as textual semantic similarity. Transforming a vector space to optimize isotropy has been shown to be beneficial for
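The narrow-cone effect this abstract describes is easy to observe numerically. The sketch below is not from the paper; the data, dimensions, and offset are invented for illustration. It compares the mean pairwise cosine similarity of synthetic isotropic embeddings against embeddings pushed into a cone by a shared offset:

```python
import numpy as np

rng = np.random.default_rng(0)

def mean_pairwise_cosine(emb: np.ndarray) -> float:
    """Average cosine similarity over all distinct pairs; near 0 for
    isotropic embeddings, near 1 when vectors crowd into a narrow cone."""
    unit = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    sims = unit @ unit.T
    n = len(emb)
    # Exclude the diagonal (each vector's self-similarity of 1.0).
    return (sims.sum() - n) / (n * (n - 1))

# Isotropic baseline: directions drawn uniformly over the sphere.
isotropic = rng.normal(size=(500, 64))

# Anisotropic case: a large shared offset pushes all vectors into one cone,
# mimicking the geometry reported for pretrained encoders.
anisotropic = rng.normal(size=(500, 64)) + 5.0

print(mean_pairwise_cosine(isotropic))    # close to 0
print(mean_pairwise_cosine(anisotropic))  # close to 1
```

Isotropy-restoring transforms (e.g., removing dominant directions) aim to move the second number back toward the first.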
2

Guo, Zhiqiang, Jianjun Li, Guohui Li, Chaoyang Wang, Si Shi, and Bin Ruan. "LGMRec: Local and Global Graph Learning for Multimodal Recommendation." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 8 (2024): 8454–62. http://dx.doi.org/10.1609/aaai.v38i8.28688.

Abstract:
The multimodal recommendation has gradually become the infrastructure of online media platforms, enabling them to provide personalized service to users through a joint modeling of user historical behaviors (e.g., purchases, clicks) and item various modalities (e.g., visual and textual). The majority of existing studies typically focus on utilizing modal features or modal-related graph structure to learn user local interests. Nevertheless, these approaches encounter two limitations: (1) Shared updates of user ID embeddings result in the consequential coupling between collaboration and multimoda
3

Shang, Bin, Yinliang Zhao, Jun Liu, and Di Wang. "LAFA: Multimodal Knowledge Graph Completion with Link Aware Fusion and Aggregation." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 8 (2024): 8957–65. http://dx.doi.org/10.1609/aaai.v38i8.28744.

Abstract:
Recently, an enormous amount of research has emerged on multimodal knowledge graph completion (MKGC), which seeks to extract knowledge from multimodal data and predict the most plausible missing facts to complete a given multimodal knowledge graph (MKG). However, existing MKGC approaches largely ignore that visual information may introduce noise and lead to uncertainty when adding them to the traditional KG embeddings due to the contribution of each associated image to entity is different in diverse link scenarios. Moreover, treating each triple independently when learning entity embeddings le
4

Sun, Zhongkai, Prathusha Sarma, William Sethares, and Yingyu Liang. "Learning Relationships between Text, Audio, and Video via Deep Canonical Correlation for Multimodal Language Analysis." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 05 (2020): 8992–99. http://dx.doi.org/10.1609/aaai.v34i05.6431.

Abstract:
Multimodal language analysis often considers relationships between features based on text and those based on acoustical and visual properties. Text features typically outperform non-text features in sentiment analysis or emotion recognition tasks in part because the text features are derived from advanced language models or word embeddings trained on massive data sources while audio and video features are human-engineered and comparatively underdeveloped. Given that the text, audio, and video are describing the same utterance in different ways, we hypothesize that the multimodal sentiment anal
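The canonical-correlation idea behind this paper can be illustrated with plain linear CCA (the paper uses its deep variant; the "text" and "audio" views below are synthetic and only meant to show what a canonical correlation measures):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1000
# A shared latent signal z drives the first feature of both views.
z = rng.normal(size=(n, 1))
text = np.hstack([z + 0.1 * rng.normal(size=(n, 1)), rng.normal(size=(n, 3))])
audio = np.hstack([-z + 0.1 * rng.normal(size=(n, 1)), rng.normal(size=(n, 2))])

def cca_top_correlation(X: np.ndarray, Y: np.ndarray) -> float:
    """Top canonical correlation via whitening + SVD (classical linear CCA)."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    def orthobasis(A):
        # Orthonormal basis of A's column space (left singular vectors).
        U, _, _ = np.linalg.svd(A, full_matrices=False)
        return U
    s = np.linalg.svd(orthobasis(X).T @ orthobasis(Y), compute_uv=False)
    return s[0]  # singular values of the cross-basis product, all <= 1

print(cca_top_correlation(text, audio))  # near 1: the views share z
```

Deep CCA replaces the linear projections with neural networks but optimizes the same correlation objective.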
5

Merkx, Danny, and Stefan L. Frank. "Learning semantic sentence representations from visually grounded language without lexical knowledge." Natural Language Engineering 25, no. 4 (2019): 451–66. http://dx.doi.org/10.1017/s1351324919000196.

Abstract:
Current approaches to learning semantic representations of sentences often use prior word-level knowledge. The current study aims to leverage visual information in order to capture sentence level semantics without the need for word embeddings. We use a multimodal sentence encoder trained on a corpus of images with matching text captions to produce visually grounded sentence embeddings. Deep Neural Networks are trained to map the two modalities to a common embedding space such that for an image the corresponding caption can be retrieved and vice versa. We show that our model achieves re
6

Mateev, Mihail. "Comparative Analysis on Implementing Embeddings for Image Analysis." Journal of Information Systems Engineering and Management 10, no. 17s (2025): 89–102. https://doi.org/10.52783/jisem.v10i17s.2710.

Abstract:
This research explores how artificial intelligence enhances construction maintenance and diagnostics, achieving 95% accuracy on a dataset of 10,000 cases. The findings highlight AI's potential to revolutionize predictive maintenance in the industry. The growing adoption of image embeddings has transformed visual data processing across AI applications. This study evaluates embedding implementations in major platforms, including Azure AI, OpenAI's GPT-4 Vision, and frameworks like Hugging Face, Replicate, and Eden AI. It assesses their scalability, accuracy, cost-effectiveness, and integration f
7

Tang, Zhenchao, Jiehui Huang, Guanxing Chen, and Calvin Yu-Chian Chen. "Comprehensive View Embedding Learning for Single-Cell Multimodal Integration." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 14 (2024): 15292–300. http://dx.doi.org/10.1609/aaai.v38i14.29453.

Abstract:
Motivation: Advances in single-cell measurement techniques provide rich multimodal data, which helps us to explore the life state of cells more deeply. However, multimodal integration, or, learning joint embeddings from multimodal data remains a current challenge. The difficulty in integrating unpaired single-cell multimodal data is that different modalities have different feature spaces, which easily leads to information loss in joint embedding. And few existing methods have fully exploited and fused the information in single-cell multimodal data. Result: In this study, we propose CoVEL, a de
8

Zhang, Linhai, Deyu Zhou, Yulan He, and Zeng Yang. "MERL: Multimodal Event Representation Learning in Heterogeneous Embedding Spaces." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 16 (2021): 14420–27. http://dx.doi.org/10.1609/aaai.v35i16.17695.

Abstract:
Previous work has shown the effectiveness of using event representations for tasks such as script event prediction and stock market prediction. It is however still challenging to learn the subtle semantic differences between events based solely on textual descriptions of events often represented as (subject, predicate, object) triples. As an alternative, images offer a more intuitive way of understanding event semantics. We observe that event described in text and in images show different abstraction levels and therefore should be projected onto heterogeneous embedding spaces, as opposed to wh
9

Sah, Shagan, Sabarish Gopalakishnan, and Raymond Ptucha. "Aligned attention for common multimodal embeddings." Journal of Electronic Imaging 29, no. 02 (2020): 1. http://dx.doi.org/10.1117/1.jei.29.2.023013.

10

Alkaabi, Hussein, Ali Kadhim Jasim, and Ali Darroudi. "From Static to Contextual: A Survey of Embedding Advances in NLP." PERFECT: Journal of Smart Algorithms 2, no. 2 (2025): 57–66. https://doi.org/10.62671/perfect.v2i2.77.

Abstract:
Embedding techniques have been a cornerstone of Natural Language Processing (NLP), enabling machines to represent textual data in a form that captures semantic and syntactic relationships. Over the years, the field has witnessed a significant evolution—from static word embeddings, such as Word2Vec and GloVe, which represent words as fixed vectors, to dynamic, contextualized embeddings like BERT and GPT, which generate word representations based on their surrounding context. This survey provides a comprehensive overview of embedding techniques, tracing their development from early methods to st
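The static-versus-contextual distinction the survey traces can be mimicked in a few lines. In this toy sketch (the vectors and the context-mixing scheme are invented for illustration; this is not how BERT computes representations), a static lookup returns one fixed vector for "bank" regardless of context, while a context-mixing encoder does not:

```python
import numpy as np

rng = np.random.default_rng(3)
# Hypothetical static table: one fixed vector per word type.
vocab = ["bank", "river", "shore", "money", "deposit"]
static = {w: rng.normal(size=16) for w in vocab}

def static_embed(word, context):
    return static[word]  # context is ignored, so polysemy is lost

def contextual_embed(word, context, alpha=0.5):
    """Toy 'contextual' encoder: mix the word vector with the mean of its
    context vectors, so the same word gets different representations."""
    ctx = np.mean([static[w] for w in context], axis=0)
    return (1 - alpha) * static[word] + alpha * ctx

cos = lambda a, b: a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

s1 = static_embed("bank", ["river", "shore"])
s2 = static_embed("bank", ["money", "deposit"])
c1 = contextual_embed("bank", ["river", "shore"])
c2 = contextual_embed("bank", ["money", "deposit"])

print(cos(s1, s2))  # exactly 1.0: static vectors cannot separate the senses
print(cos(c1, c2))  # below 1.0: context shifts the representation
```

Real contextual models (ELMo, BERT, GPT) achieve the same separation through deep attention over the whole sentence rather than simple averaging.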
11

Zhang, Rongchao, Yiwei Lou, Dexuan Xu, Yongzhi Cao, Hanpin Wang, and Yu Huang. "A Learnable Discrete-Prior Fusion Autoencoder with Contrastive Learning for Tabular Data Synthesis." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 15 (2024): 16803–11. http://dx.doi.org/10.1609/aaai.v38i15.29621.

Abstract:
The actual collection of tabular data for sharing involves confidentiality and privacy constraints, leaving the potential risks of machine learning for interventional data analysis unsafely averted. Synthetic data has emerged recently as a privacy-protecting solution to address this challenge. However, existing approaches regard discrete and continuous modal features as separate entities, thus falling short in properly capturing their inherent correlations. In this paper, we propose a novel contrastive learning guided Gaussian Transformer autoencoder, termed GTCoder, to synthesize photo-realis
12

Lin, Kaiyi, Xing Xu, Lianli Gao, Zheng Wang, and Heng Tao Shen. "Learning Cross-Aligned Latent Embeddings for Zero-Shot Cross-Modal Retrieval." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 07 (2020): 11515–22. http://dx.doi.org/10.1609/aaai.v34i07.6817.

Abstract:
Zero-Shot Cross-Modal Retrieval (ZS-CMR) is an emerging research hotspot that aims to retrieve data of new classes across different modality data. It is challenging for not only the heterogeneous distributions across different modalities, but also the inconsistent semantics across seen and unseen classes. A handful of recently proposed methods typically borrow the idea from zero-shot learning, i.e., exploiting word embeddings of class labels (i.e., class-embeddings) as common semantic space, and using generative adversarial network (GAN) to capture the underlying multimodal data structures, as
13

Khalifa, Omar Yasser Ibrahim, and Muhammad Zafran Muhammad Zaly Shah. "MultiPhishNet: A Multimodal Approach of QR Code Phishing Detection using Multi-Head Attention and Multilingual Embeddings." International Journal of Innovative Computing 15, no. 1 (2025): 53–61. https://doi.org/10.11113/ijic.v15n1.512.

Abstract:
Phishing attacks leveraging QR codes have become a significant threat due to their increasing use in contactless services. These attacks are challenging to detect since QR codes typically encode URLs leading to phishing websites designed to steal sensitive information. Existing detection methods often rely on blacklists or handcrafted features, which are inadequate for handling obfuscated URLs and multilingual content. This paper proposes MultiPhishNet, a multimodal phishing detection model that integrates advanced embedding techniques, Convolutional Neural Networks (CNNs), and multi-head atte
14

Waqas, Asim, Aakash Tripathi, Mia Naeini, Paul A. Stewart, Matthew B. Schabath, and Ghulam Rasool. "Abstract 991: PARADIGM: an embeddings-based multimodal learning framework with foundation models and graph neural networks." Cancer Research 85, no. 8_Supplement_1 (2025): 991. https://doi.org/10.1158/1538-7445.am2025-991.

Abstract:
Introduction: Cancer research faces significant challenges in integrating heterogeneous data across varying spatial and temporal scales, limiting the ability to gain a comprehensive understanding of the disease. PARADIGM (Pan-Cancer Embeddings Representation using Advanced Multimodal Learning with Graph-based Modeling) addresses this challenge by providing a framework leveraging foundation models (FMs) and Graph Neural Networks (GNN). The PARADIGM framework generates embeddings from multi-resolution datasets using modality-specific FMs, aggregates sample embeddings, fuses them into a unifi
15

Li, Xiaolong, Yang Dong, Yunfei Yi, Zhixun Liang, and Shuqi Yan. "Hypergraph Neural Network for Multimodal Depression Recognition." Electronics 13, no. 22 (2024): 4544. http://dx.doi.org/10.3390/electronics13224544.

Abstract:
Deep learning-based approaches for automatic depression recognition offer advantages of low cost and high efficiency. However, depression symptoms are challenging to detect and vary significantly between individuals. Traditional deep learning methods often struggle to capture and model these nuanced features effectively, leading to lower recognition accuracy. This paper introduces a novel multimodal depression recognition method, HYNMDR, which utilizes hypergraphs to represent the complex, high-order relationships among patients with depression. HYNMDR comprises two primary components: a tempo
16

Zhu, Chaoyu, Zhihao Yang, Xiaoqiong Xia, Nan Li, Fan Zhong, and Lei Liu. "Multimodal reasoning based on knowledge graph embedding for specific diseases." Bioinformatics 38, no. 8 (2022): 2235–45. http://dx.doi.org/10.1093/bioinformatics/btac085.

Abstract:
Motivation: Knowledge Graph (KG) is becoming increasingly important in the biomedical field. Deriving new and reliable knowledge from existing knowledge by KG embedding technology is a cutting-edge method. Some add a variety of additional information to aid reasoning, namely multimodal reasoning. However, few works based on the existing biomedical KGs are focused on specific diseases. Results: This work develops a construction and multimodal reasoning process of Specific Disease Knowledge Graphs (SDKGs). We construct SDKG-11, a SDKG set including five cancers, six non-cancer diseases, a
17

Tripathi, Aakash Gireesh, Asim Waqas, Yasin Yilmaz, Matthew B. Schabath, and Ghulam Rasool. "Abstract 3641: Predicting treatment outcomes using cross-modality correlations in multimodal oncology data." Cancer Research 85, no. 8_Supplement_1 (2025): 3641. https://doi.org/10.1158/1538-7445.am2025-3641.

Abstract:
Accurate prediction of treatment outcomes in oncology requires modeling the intricate relationships across diverse data modalities. This study investigates cross-modality correlations by leveraging imaging and clinical data curated through the Multimodal Integration of Oncology Data System (MINDS) and HoneyBee frameworks to uncover actionable patterns for personalized treatment strategies. Using data from over 10,000 cancer patients, we developed a machine learning pipeline that employs advanced embedding techniques to capture associations between radiological imaging phenotypes and
18

Tripathi, Aakash, Asim Waqas, Yasin Yilmaz, and Ghulam Rasool. "Abstract 4905: Multimodal transformer model improves survival prediction in lung cancer compared to unimodal approaches." Cancer Research 84, no. 6_Supplement (2024): 4905. http://dx.doi.org/10.1158/1538-7445.am2024-4905.

Abstract:
Integrating multimodal lung data including clinical notes, medical images, and molecular data is critical for predictive modeling tasks like survival prediction, yet effectively aligning these disparate data types remains challenging. We present a novel method to integrate heterogeneous lung modalities by first thoroughly analyzing various domain-specific models and selecting the optimal model for embedding feature extraction per data type based on performance on representative pretrained tasks. For clinical notes, the GatorTron models showed the lowest regression loss on an initial e
19

Xu, Jinfeng, Zheyu Chen, Shuo Yang, Jinze Li, Hewei Wang, and Edith C. H. Ngai. "MENTOR: Multi-level Self-supervised Learning for Multimodal Recommendation." Proceedings of the AAAI Conference on Artificial Intelligence 39, no. 12 (2025): 12908–17. https://doi.org/10.1609/aaai.v39i12.33408.

Abstract:
As multimedia information proliferates, multimodal recommendation systems have garnered significant attention. These systems leverage multimodal information to alleviate the data sparsity issue inherent in recommendation systems, thereby enhancing the accuracy of recommendations. Due to the natural semantic disparities among multimodal features, recent research has primarily focused on cross-modal alignment using self-supervised learning to bridge these gaps. However, aligning different modal features might result in the loss of valuable interaction information, distancing them from ID embeddi
20

Ota, Kosuke, Keiichiro Shirai, Hidetoshi Miyao, and Minoru Maruyama. "Multimodal Analogy-Based Image Retrieval by Improving Semantic Embeddings." Journal of Advanced Computational Intelligence and Intelligent Informatics 26, no. 6 (2022): 995–1003. http://dx.doi.org/10.20965/jaciii.2022.p0995.

Abstract:
In this work, we study the application of multimodal analogical reasoning to image retrieval. Multimodal analogy questions are given in a form of tuples of words and images, e.g., “cat”:“dog”::[an image of a cat sitting on a bench]:?, to search for an image of a dog sitting on a bench. Retrieving desired images given these tuples can be seen as a task of finding images whose relation between the query image is close to that of query words. One way to achieve the task is building a common vector space that exhibits analogical regularities. To learn such an embedding, we propose a quadruple neur
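The vector-offset formulation of the analogy task described here ("cat":"dog"::image:?) can be sketched directly. The embeddings below are random placeholders (not a trained joint space), with the correct answer planted at a known gallery index to make the retrieval step concrete:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical shared 8-dim embedding space for words and images.
cat, dog = rng.normal(size=8), rng.normal(size=8)
gallery = rng.normal(size=(5, 8))      # image embeddings to search over
cat_on_bench = gallery[0].copy()
# Plant the correct answer: "dog on a bench" sits at the analogy offset.
gallery[3] = cat_on_bench + (dog - cat)

def answer_analogy(a, b, img, gallery):
    """a : b :: img : ?  ->  index of the gallery image whose embedding is
    nearest (by cosine) to img + (b - a)."""
    query = img + (b - a)
    unit = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    return int(np.argmax(unit @ (query / np.linalg.norm(query))))

print(answer_analogy(cat, dog, cat_on_bench, gallery))  # 3
```

The hard part the paper addresses is learning an embedding space in which such offsets are actually meaningful; the retrieval arithmetic itself is this simple.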
21

Yi, Moung-Ho, Keun-Chang Kwak, and Ju-Hyun Shin. "KoHMT: A Multimodal Emotion Recognition Model Integrating KoELECTRA, HuBERT with Multimodal Transformer." Electronics 13, no. 23 (2024): 4674. http://dx.doi.org/10.3390/electronics13234674.

Abstract:
With the advancement of human-computer interaction, the role of emotion recognition has become increasingly significant. Emotion recognition technology provides practical benefits across various industries, including user experience enhancement, education, and organizational productivity. For instance, in educational settings, it enables real-time understanding of students’ emotional states, facilitating tailored feedback. In workplaces, monitoring employees’ emotions can contribute to improved job performance and satisfaction. Recently, emotion recognition has also gained attention in media a
22

Mai, Sijie, Haifeng Hu, and Songlong Xing. "Modality to Modality Translation: An Adversarial Representation Learning and Graph Fusion Network for Multimodal Fusion." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 01 (2020): 164–72. http://dx.doi.org/10.1609/aaai.v34i01.5347.

Abstract:
Learning joint embedding space for various modalities is of vital importance for multimodal fusion. Mainstream modality fusion approaches fail to achieve this goal, leaving a modality gap which heavily affects cross-modal fusion. In this paper, we propose a novel adversarial encoder-decoder-classifier framework to learn a modality-invariant embedding space. Since the distributions of various modalities vary in nature, to reduce the modality gap, we translate the distributions of source modalities into that of target modality via their respective encoders using adversarial training. Furthermore
23

Wagh, Kapil Adhar. "A Review: Word Embedding Models with Machine Learning Based Context Depend and Context Independent Techniques." Advances in Nonlinear Variational Inequalities 28, no. 3s (2024): 251–58. https://doi.org/10.52783/anvi.v28.2928.

Abstract:
Natural language processing (NLP) has been transformed by word embedding models, which convert text into meaningful numerical representations. These models fall into two general categories: context-dependent methods like ELMo, BERT, and GPT, and context-independent methods like Word2Vec, GloVe, and FastText. Although static word representations are provided by context-independent models, polysemy and contextual subtleties are difficult for them to capture. These issues are addressed by context-dependent approaches that make use of sophisticated deep learning architectures to produce dynamic em
24

Kim, Donghyun, Kuniaki Saito, Kate Saenko, Stan Sclaroff, and Bryan Plummer. "MULE: Multimodal Universal Language Embedding." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 07 (2020): 11254–61. http://dx.doi.org/10.1609/aaai.v34i07.6785.

Abstract:
Existing vision-language methods typically support two languages at a time at most. In this paper, we present a modular approach which can easily be incorporated into existing vision-language methods in order to support many languages. We accomplish this by learning a single shared Multimodal Universal Language Embedding (MULE) which has been visually-semantically aligned across all languages. Then we learn to relate MULE to visual data as if it were a single language. Our method is not architecture specific, unlike prior work which typically learned separate branches for each language, enabli
25

Singh, Vijay Vaibhav. "Vector Embeddings: The Mathematical Foundation of Modern AI Systems." International Journal of Scientific Research in Computer Science, Engineering and Information Technology 11, no. 1 (2025): 2408–17. https://doi.org/10.32628/cseit251112257.

Abstract:
This comprehensive article examines vector embeddings as a fundamental component of modern artificial intelligence systems, detailing their mathematical foundations, key properties, implementation techniques, and practical applications. The article traces the evolution from basic word embeddings to sophisticated transformer-based architectures, highlighting how these representations enable machines to capture and process semantic relationships in human language and visual data. The article encompasses both theoretical frameworks and practical implementations, from the groundbreaking Word2Vec a
26

Wehrmann, Jônatas, Anderson Mattjie, and Rodrigo C. Barros. "Order embeddings and character-level convolutions for multimodal alignment." Pattern Recognition Letters 102 (January 2018): 15–22. http://dx.doi.org/10.1016/j.patrec.2017.11.020.

27

Mithun, Niluthpol C., Juncheng Li, Florian Metze, and Amit K. Roy-Chowdhury. "Joint embeddings with multimodal cues for video-text retrieval." International Journal of Multimedia Information Retrieval 8, no. 1 (2019): 3–18. http://dx.doi.org/10.1007/s13735-018-00166-3.

28

Fodor, Ádám, András Lőrincz, and Rachid R. Saboundji. "Enhancing apparent personality trait analysis with cross-modal embeddings." Annales Universitatis Scientiarum Budapestinensis de Rolando Eötvös Nominatae. Sectio computatorica 57 (2024): 167–85. https://doi.org/10.71352/ac.57.167.

Abstract:
Automatic personality trait assessment is essential for high-quality human-machine interactions. Systems capable of human behavior analysis could be used for self-driving cars, medical research, and surveillance, among many others. We present a multimodal deep neural network with a distance learning network extension for apparent personality trait prediction trained on short video recordings and exploiting modality invariant embeddings. Acoustic, visual, and textual information are utilized to reach high-performance solutions in this task. Due to the highly centralized target distribution of th
29

Nayak, Roshan, B. S. Ullas Kannantha, S. Kruthi, and C. Gururaj. "Multimodal Offensive Meme Classification Using Transformers and BiLSTM." International Journal of Engineering and Advanced Technology (IJEAT) 11, no. 3 (2022): 96–102. https://doi.org/10.35940/ijeat.C3392.0211322.

Abstract:
Nowadays memes have become a way in which people express their ideas on social media. These memes can convey various views including offensive ones. Memes can be intended for a personal attack, homophobic abuse, racial abuse, attack on minority etc. The memes are implicit and multi-modal in nature. Here we analyze the meme by categorizing them as offensive or not offensive and this becomes a binary classification problem. We propose a novel offensive meme classification using the transformer-based image encoder, BiLSTM for text with mean pooling as text encoder and a
30

Nayak, Roshan, B. S. Ullas Kannantha, Kruthi S, and C. Gururaj. "Multimodal Offensive Meme Classification using Transformers and BiLSTM." International Journal of Engineering and Advanced Technology 11, no. 3 (2022): 96–102. http://dx.doi.org/10.35940/ijeat.c3392.0211322.

Abstract:
Nowadays memes have become a way in which people express their ideas on social media. These memes can convey various views including offensive ones. Memes can be intended for a personal attack, homophobic abuse, racial abuse, attack on minority etc. The memes are implicit and multi-modal in nature. Here we analyze the meme by categorizing them as offensive or not offensive and this becomes a binary classification problem. We propose a novel offensive meme classification using the transformer-based image encoder, BiLSTM for text with mean pooling as text encoder and a Feed-Forward Network as a
31

Chen, Weijia, Zhijun Lu, Lijue You, Lingling Zhou, Jie Xu, and Ken Chen. "Artificial Intelligence–Based Multimodal Risk Assessment Model for Surgical Site Infection (AMRAMS): Development and Validation Study." JMIR Medical Informatics 8, no. 6 (2020): e18186. http://dx.doi.org/10.2196/18186.

Abstract:
Background: Surgical site infection (SSI) is one of the most common types of health care–associated infections. It increases mortality, prolongs hospital length of stay, and raises health care costs. Many institutions developed risk assessment models for SSI to help surgeons preoperatively identify high-risk patients and guide clinical intervention. However, most of these models had low accuracies. Objective: We aimed to provide a solution in the form of an Artificial intelligence–based Multimodal Risk Assessment Model for Surgical site infection (AMRAMS) for inpatients undergoing operations, us
32

Smelik, N. D. "Multimodal topic model for texts and images utilizing their embeddings." Machine Learning and Data Analysis 2, no. 4 (2016): 421–41. http://dx.doi.org/10.21469/22233792.2.4.05.

33

Abdou, Ahmed, Ekta Sood, Philipp Müller, and Andreas Bulling. "Gaze-enhanced Crossmodal Embeddings for Emotion Recognition." Proceedings of the ACM on Human-Computer Interaction 6, ETRA (2022): 1–18. http://dx.doi.org/10.1145/3530879.

Abstract:
Emotional expressions are inherently multimodal -- integrating facial behavior, speech, and gaze -- but their automatic recognition is often limited to a single modality, e.g. speech during a phone call. While previous work proposed crossmodal emotion embeddings to improve monomodal recognition performance, despite its importance, an explicit representation of gaze was not included. We propose a new approach to emotion recognition that incorporates an explicit representation of gaze in a crossmodal emotion embedding framework. We show that our method outperforms the previous state of the art f
34

Hu, Wenbo, Yifan Xu, Yi Li, Weiyue Li, Zeyuan Chen, and Zhuowen Tu. "BLIVA: A Simple Multimodal LLM for Better Handling of Text-Rich Visual Questions." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 3 (2024): 2256–64. http://dx.doi.org/10.1609/aaai.v38i3.27999.

Abstract:
Vision Language Models (VLMs), which extend Large Language Models (LLM) by incorporating visual understanding capability, have demonstrated significant advancements in addressing open-ended visual question-answering (VQA) tasks. However, these models cannot accurately interpret images infused with text, a common occurrence in real-world scenarios. Standard procedures for extracting information from images often involve learning a fixed set of query embeddings. These embeddings are designed to encapsulate image contexts and are later used as soft prompt inputs in LLMs. Yet, this process is limi
35

Chen, Qihua, Xuejin Chen, Chenxuan Wang, Yixiong Liu, Zhiwei Xiong, and Feng Wu. "Learning Multimodal Volumetric Features for Large-Scale Neuron Tracing." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 2 (2024): 1174–82. http://dx.doi.org/10.1609/aaai.v38i2.27879.

Abstract:
The current neuron reconstruction pipeline for electron microscopy (EM) data usually includes automatic image segmentation followed by extensive human expert proofreading. In this work, we aim to reduce human workload by predicting connectivity between over-segmented neuron pieces, taking both microscopy image and 3D morphology features into account, similar to human proofreading workflow. To this end, we first construct a dataset, named FlyTracing, that contains millions of pairwise connections of segments expanding the whole fly brain, which is three orders of magnitude larger than existing
36

Shen, Aili, Bahar Salehi, Jianzhong Qi, and Timothy Baldwin. "A General Approach to Multimodal Document Quality Assessment." Journal of Artificial Intelligence Research 68 (July 22, 2020): 607–32. http://dx.doi.org/10.1613/jair.1.11647.

Abstract:
The perceived quality of a document is affected by various factors, including grammaticality, readability, stylistics, and expertise depth, making the task of document quality assessment a complex one. In this paper, we explore this task in the context of assessing the quality of Wikipedia articles and academic papers. Observing that the visual rendering of a document can capture implicit quality indicators that are not present in the document text — such as images, font choices, and visual layout — we propose a joint model that combines the text content with a visual re
37

Sata, Ikumi, Motoki Amagasaki, and Masato Kiyama. "Multimodal Retrieval Method for Images and Diagnostic Reports Using Cross-Attention." AI 6, no. 2 (2025): 38. https://doi.org/10.3390/ai6020038.

Full text
Abstract:
Background: Conventional medical image retrieval methods treat images and text as independent embeddings, limiting their ability to fully utilize the complementary information from both modalities. This separation often results in suboptimal retrieval performance, as the intricate relationships between images and text remain underexplored. Methods: To address this limitation, we propose a novel retrieval method that integrates medical image and text embeddings using a cross-attention mechanism. Our approach creates a unified representation by directly modeling the interactions between the two
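The cross-attention integration this abstract describes can be sketched generically in NumPy. The single-head formulation, dimensions, and random weights below are illustrative assumptions for exposition, not the paper's actual configuration.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(text_emb, image_emb, Wq, Wk, Wv):
    """Text tokens attend over image patches (single head, toy sketch).

    text_emb:  (n_text, d) text token embeddings (queries)
    image_emb: (n_img, d)  image patch embeddings (keys/values)
    Returns a (n_text, d_k) fused representation.
    """
    q = text_emb @ Wq                         # (n_text, d_k)
    k = image_emb @ Wk                        # (n_img, d_k)
    v = image_emb @ Wv                        # (n_img, d_k)
    scores = q @ k.T / np.sqrt(q.shape[-1])   # scaled dot products
    attn = softmax(scores, axis=-1)           # per text token: weights over patches
    return attn @ v                           # image content mixed into text tokens

rng = np.random.default_rng(0)
d, d_k = 32, 16
text = rng.normal(size=(5, d))    # 5 text tokens (hypothetical sizes)
image = rng.normal(size=(7, d))   # 7 image patches
Wq, Wk, Wv = (rng.normal(size=(d, d_k)) for _ in range(3))
fused = cross_attention(text, image, Wq, Wk, Wv)
print(fused.shape)  # (5, 16)
```

In a trained retrieval model the projection matrices would be learned and the fused representation fed to a similarity head; here they are random placeholders.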
38

Kiran Chitturi. "Demystifying Multimodal AI: A Technical Deep Dive." International Journal of Scientific Research in Computer Science, Engineering and Information Technology 10, no. 6 (2024): 2011–17. https://doi.org/10.32628/cseit2410612394.

Full text
Abstract:
This article explores the transformative impact of multimodal AI systems in bridging diverse data types and processing capabilities. It examines how these systems have revolutionized various domains through their ability to handle multiple modalities simultaneously, from visual-linguistic understanding to complex search operations. The article delves into the technical foundations of multimodal embeddings, analyzes leading models like CLIP and MUM, and investigates their real-world applications across different sectors. Through a detailed examination of current implementations, challenges, and
39

Tokar, Tomas, and Scott Sanner. "ICE-T: Interactions-aware Cross-column Contrastive Embedding for Heterogeneous Tabular Datasets." Proceedings of the AAAI Conference on Artificial Intelligence 39, no. 20 (2025): 20904–11. https://doi.org/10.1609/aaai.v39i20.35385.

Full text
Abstract:
Finding high-quality representations of heterogeneous tabular datasets is crucial for their effective use in downstream machine learning tasks. Contrastive representation learning (CRL) methods have been previously shown to provide a straightforward way to learn such representations across various data domains. Current tabular CRL methods learn joint embeddings of data instances (tabular rows) by minimizing a contrastive loss between the original instance and its perturbations. Unlike existing tabular CRL methods, we propose leveraging frameworks established in multimodal representation learni
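As background for the contrastive objective the abstract mentions, the standard setup — minimizing a contrastive loss between an instance and its perturbation — can be sketched with a generic InfoNCE-style loss. This is a common CRL formulation, not the ICE-T objective itself.

```python
import numpy as np

def info_nce(z_a, z_b, temperature=0.1):
    """Toy InfoNCE loss: row i of z_a should match row i of z_b
    (e.g. an instance and its perturbed view) against all other rows."""
    # L2-normalize so dot products are cosine similarities
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = z_a @ z_b.T / temperature            # (n, n) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))           # positives on the diagonal

rng = np.random.default_rng(2)
z = rng.normal(size=(4, 6))                  # 4 instances, illustrative dims
noisy = z + 0.01 * rng.normal(size=(4, 6))   # perturbed views
loss_matched = info_nce(z, noisy)
loss_shuffled = info_nce(z, noisy[::-1])     # mismatched pairs score worse
print(loss_matched, loss_shuffled)
```

Correctly paired views yield a lower loss than shuffled pairs, which is what drives the representation toward perturbation-invariant embeddings.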
40

Ma, Shukui, Pengyuan Ma, Shuaichao Feng, Fei Ma, and Guangping Zhuo. "Multimodal Data-Based Text Generation Depression Classification Model." International Journal of Computer Science and Information Technology 5, no. 1 (2025): 175–93. https://doi.org/10.62051/ijcsit.v5n1.16.

Full text
Abstract:
Depression classification often relies on multimodal features, but existing models struggle to capture the similarity between multimodal features. Moreover, the social stigma surrounding depression leads to limited availability of datasets, which constrains model accuracy. This study aims to improve multimodal depression recognition methods by proposing a Multimodal Generation-Text Depression Classification Model. The model introduces a Multimodal-Deep-Extract-Feature Net to capture both long- and short-term sequential features. A Dual Text Contrastive Learning Module is employed to generate e
41

Zhang, Jianqiang, Renyao Chen, Shengwen Li, Tailong Li, and Hong Yao. "MGKGR: Multimodal Semantic Fusion for Geographic Knowledge Graph Representation." Algorithms 17, no. 12 (2024): 593. https://doi.org/10.3390/a17120593.

Full text
Abstract:
Geographic knowledge graph representation learning embeds entities and relationships in geographic knowledge graphs into a low-dimensional continuous vector space, which serves as a basic method that bridges geographic knowledge graphs and geographic applications. Previous geographic knowledge graph representation methods primarily learn the vectors of entities and their relationships from their spatial attributes and relationships, which ignores various semantics of entities, resulting in poor embeddings on geographic knowledge graphs. This study proposes a two-stage multimodal geographic kno
42

Jyoti, Arora, Khapekar Priyal, and Pal Rakhi. "Multimodal Sentiment Analysis using LSTM and RoBerta." Advanced Innovations in Computer Programming Languages 5, no. 2 (2023): 24–35. https://doi.org/10.5281/zenodo.8130701.

Full text
Abstract:
Social media is a valuable data source for understanding people's thoughts and feelings. Sentiment analysis and affective computing help analyze sentiment and emotions in social media posts. Our research paper proposes a model for tweet emotions analysis using LSTM, GloVe embeddings, and RoBERTa. This model captures sequential dependencies in tweets, leverages semantic representations, and enhances contextual understanding. We evaluate the model on a tweet emotions dataset, demonstrating its effectiveness in accurately classifying emotions in tweets. Through evaluation on a tweet emotio
43

Tseng, Shao-Yen, Shrikanth Narayanan, and Panayiotis Georgiou. "Multimodal Embeddings From Language Models for Emotion Recognition in the Wild." IEEE Signal Processing Letters 28 (2021): 608–12. http://dx.doi.org/10.1109/lsp.2021.3065598.

Full text
44

Jing, Xuebin, Liang He, Zhida Song, and Shaolei Wang. "Audio–Visual Fusion Based on Interactive Attention for Person Verification." Sensors 23, no. 24 (2023): 9845. http://dx.doi.org/10.3390/s23249845.

Full text
Abstract:
With the rapid development of multimedia technology, personnel verification systems have become increasingly important in the security field and identity verification. However, unimodal verification systems have performance bottlenecks in complex scenarios, thus triggering the need for multimodal feature fusion methods. The main problem with audio–visual multimodal feature fusion is how to effectively integrate information from different modalities to improve the accuracy and robustness of the system for individual identity. In this paper, we focus on how to improve multimodal person verificat
45

Azeroual, Saadia, Zakaria Hamane, Rajaa Sebihi, and Fatima-Ezzahraa Ben-Bouazza. "Toward Improved Glioma Mortality Prediction: A Multimodal Framework Combining Radiomic and Clinical Features." International Journal of Online and Biomedical Engineering (iJOE) 21, no. 05 (2025): 31–46. https://doi.org/10.3991/ijoe.v21i05.52691.

Full text
Abstract:
Gliomas, especially diffuse gliomas, remain a major challenge in neuro-oncology due to their highly heterogeneous nature and poor prognosis. Accurately predicting patient mortality is essential for improving treatment strategies and outcomes, yet current models often fail to fully utilize the wealth of available multimodal data. To address this, we developed a novel multimodal predictive model that integrates diverse magnetic resonance imaging (MRI) sequences—T1, T2, FLAIR, DWI, SWI, and advanced diffusion metrics such as high angular resolution diffusion imaging (HARDI)—with detailed clinical
46

Salin, Emmanuelle, Badreddine Farah, Stéphane Ayache, and Benoit Favre. "Are Vision-Language Transformers Learning Multimodal Representations? A Probing Perspective." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 10 (2022): 11248–57. http://dx.doi.org/10.1609/aaai.v36i10.21375.

Full text
Abstract:
In recent years, joint text-image embeddings have significantly improved thanks to the development of transformer-based Vision-Language models. Despite these advances, we still need to better understand the representations produced by those models. In this paper, we compare pre-trained and fine-tuned representations at a vision, language and multimodal level. To that end, we use a set of probing tasks to evaluate the performance of state-of-the-art Vision-Language models and introduce new datasets specifically for multimodal probing. These datasets are carefully designed to address a range of
47

Bikshapathy Peruka. "Sentemonet: A Comprehensive Framework for Multimodal Sentiment Analysis from Text and Emotions." Journal of Information Systems Engineering and Management 10, no. 34s (2025): 569–87. https://doi.org/10.52783/jisem.v10i34s.5852.

Full text
Abstract:
Sentiment analysis, a crucial aspect of Natural Language Processing (NLP), plays a pivotal role in understanding public opinion, customer feedback, and user sentiments in various domains. In this study, we present a comprehensive approach to sentiment analysis that incorporates both textual and emoji data, leveraging diverse datasets from sources such as social media, customer reviews, and surveys. Our methodology consists of several key steps, including data collection, pre-processing, feature extraction, feature fusion, and feature selection. For data pre-processing, we apply techniques such
48

Skantze, Gabriel, and Bram Willemsen. "CoLLIE: Continual Learning of Language Grounding from Language-Image Embeddings." Journal of Artificial Intelligence Research 74 (July 9, 2022): 1201–23. http://dx.doi.org/10.1613/jair.1.13689.

Full text
Abstract:
This paper presents CoLLIE: a simple, yet effective model for continual learning of how language is grounded in vision. Given a pre-trained multimodal embedding model, where language and images are projected in the same semantic space (in this case CLIP by OpenAI), CoLLIE learns a transformation function that adjusts the language embeddings when needed to accommodate new language use. This is done by predicting the difference vector that needs to be applied, as well as a scaling factor for this vector, so that the adjustment is only applied when needed. Unlike traditional few-shot learning, th
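The difference-vector-plus-scaling idea summarized above can be illustrated with a toy sketch. The weights below are random placeholders (in the paper they are learned on top of frozen CLIP embeddings), and the specific nonlinearities are illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class EmbeddingAdjuster:
    """Toy sketch of the CoLLIE idea: given a frozen language embedding e,
    predict a difference vector diff(e) and a scalar gate s(e) in [0, 1],
    and output e + s(e) * diff(e), so the adjustment is applied only when
    the gate is open. Weights here are random, not trained."""

    def __init__(self, dim, rng):
        self.W_d = rng.normal(scale=0.1, size=(dim, dim))  # difference predictor
        self.w_s = rng.normal(scale=0.1, size=dim)         # gate predictor

    def __call__(self, e):
        diff = np.tanh(e @ self.W_d)   # predicted adjustment direction
        gate = sigmoid(e @ self.w_s)   # near 0 -> leave embedding unchanged
        return e + gate * diff

rng = np.random.default_rng(1)
adjuster = EmbeddingAdjuster(dim=8, rng=rng)
e = rng.normal(size=8)                 # stand-in for a CLIP text embedding
adjusted = adjuster(e)
print(adjusted.shape)  # (8,)
```

The gate is what distinguishes this from an unconditional projection: embeddings whose meaning has not shifted pass through nearly unchanged.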
49

Li, Wenxiang, Longyuan Ding, Yuliang Zhang, and Ziyuan Pu. "Understanding multimodal travel patterns based on semantic embeddings of human mobility trajectories." Journal of Transport Geography 124 (April 2025): 104169. https://doi.org/10.1016/j.jtrangeo.2025.104169.

Full text
50

Wang, Jenq-Haur, Mehdi Norouzi, and Shu Ming Tsai. "Augmenting Multimodal Content Representation with Transformers for Misinformation Detection." Big Data and Cognitive Computing 8, no. 10 (2024): 134. http://dx.doi.org/10.3390/bdcc8100134.

Full text
Abstract:
Information sharing on social media has become a common practice for people around the world. Since it is difficult to check user-generated content on social media, huge amounts of rumors and misinformation are being spread with authentic information. On the one hand, most of the social platforms identify rumors through manual fact-checking, which is very inefficient. On the other hand, with an emerging form of misinformation that contains inconsistent image–text pairs, it would be beneficial if we could compare the meaning of multimodal content within the same post for detecting image–text in
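The idea of comparing the meaning of a post's image and text can be illustrated with a minimal consistency check in a shared embedding space. The CLIP-like shared-space assumption and the threshold value are illustrative, not the paper's actual method.

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def flag_inconsistent(text_emb, image_emb, threshold=0.2):
    """Toy image-text consistency check: if the embeddings of a post's text
    and image (assumed to live in one shared space, e.g. from a CLIP-like
    encoder) are dissimilar, flag the post as a potential mismatch.
    The threshold is a hypothetical choice for illustration."""
    return cosine(text_emb, image_emb) < threshold

rng = np.random.default_rng(3)
t = rng.normal(size=16)                          # stand-in text embedding
consistent_img = t + 0.05 * rng.normal(size=16)  # near-identical meaning
unrelated_img = rng.normal(size=16)              # unrelated content
print(flag_inconsistent(t, consistent_img))  # False
print(flag_inconsistent(t, unrelated_img))
```

A real detector would learn the decision boundary from labeled consistent/inconsistent pairs rather than fix a single cosine threshold.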