A selection of scholarly literature on the topic "Multimodal Embeddings"

Format your source in APA, MLA, Chicago, Harvard, and other citation styles.

Browse lists of current articles, books, dissertations, conference papers, and other scholarly sources on the topic "Multimodal Embeddings".

Next to every work in the bibliography there is an "Add to bibliography" button. Click it, and we will automatically generate a bibliographic reference to the chosen work in your preferred citation style: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of a publication as a .pdf file and read its abstract online, where these are available in the metadata.

Journal articles on the topic "Multimodal Embeddings"

1. Tyshchuk, Kirill, Polina Karpikova, Andrew Spiridonov, Anastasiia Prutianova, Anton Razzhigaev, and Alexander Panchenko. "On Isotropy of Multimodal Embeddings." Information 14, no. 7 (2023): 392. http://dx.doi.org/10.3390/info14070392.

Abstract:
Embeddings, i.e., vector representations of objects such as texts, images, or graphs, play a key role in deep learning methodologies nowadays. Prior research has shown the importance of analyzing the isotropy of textual embeddings for transformer-based text encoders, such as the BERT model. Anisotropic word embeddings do not use the entire space, instead concentrating on a narrow cone in such a pretrained vector space, negatively affecting the performance of applications such as textual semantic similarity. Transforming a vector space to optimize isotropy has been shown to be beneficial for …
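
The "narrow cone" effect the authors analyze can be probed with a simple isotropy proxy: the mean pairwise cosine similarity of a set of embeddings, which is near 0 for an isotropic cloud and approaches 1 when vectors collapse into a cone. A minimal sketch with NumPy, using random data in place of real encoder outputs:

```python
import numpy as np

def mean_pairwise_cosine(emb: np.ndarray) -> float:
    """Isotropy proxy: ~0 for isotropic embeddings, ~1 for a narrow cone."""
    unit = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    sims = unit @ unit.T
    n = len(emb)
    return (sims.sum() - n) / (n * (n - 1))  # average, excluding the diagonal

rng = np.random.default_rng(0)
isotropic = rng.normal(size=(512, 64))   # stand-in for, e.g., pooled BERT outputs
anisotropic = isotropic + 5.0            # a shared offset creates a narrow cone

print(mean_pairwise_cosine(isotropic))    # close to 0
print(mean_pairwise_cosine(anisotropic))  # close to 1
```
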
2. Guo, Zhiqiang, Jianjun Li, Guohui Li, Chaoyang Wang, Si Shi, and Bin Ruan. "LGMRec: Local and Global Graph Learning for Multimodal Recommendation." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 8 (2024): 8454–62. http://dx.doi.org/10.1609/aaai.v38i8.28688.

Abstract:
Multimodal recommendation has gradually become the infrastructure of online media platforms, enabling them to provide personalized service to users through joint modeling of user historical behaviors (e.g., purchases, clicks) and items' various modalities (e.g., visual and textual). The majority of existing studies typically focus on utilizing modal features or modality-related graph structure to learn users' local interests. Nevertheless, these approaches encounter two limitations: (1) shared updates of user ID embeddings result in a consequential coupling between collaboration and multimodal …
3. Shang, Bin, Yinliang Zhao, Jun Liu, and Di Wang. "LAFA: Multimodal Knowledge Graph Completion with Link Aware Fusion and Aggregation." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 8 (2024): 8957–65. http://dx.doi.org/10.1609/aaai.v38i8.28744.

Abstract:
Recently, an enormous amount of research has emerged on multimodal knowledge graph completion (MKGC), which seeks to extract knowledge from multimodal data and predict the most plausible missing facts to complete a given multimodal knowledge graph (MKG). However, existing MKGC approaches largely ignore that visual information may introduce noise and lead to uncertainty when added to traditional KG embeddings, because the contribution of each associated image to an entity differs across link scenarios. Moreover, treating each triple independently when learning entity embeddings …
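
The paper's motivating observation, that each image should contribute differently to an entity depending on the link at hand, can be sketched as relation-conditioned attention over image embeddings. This illustrates the general idea, not LAFA's exact formulation; the shapes and the dot-product scoring are assumptions:

```python
import numpy as np

def softmax(x: np.ndarray) -> np.ndarray:
    e = np.exp(x - x.max())
    return e / e.sum()

def link_aware_fusion(entity_emb, image_embs, relation_emb):
    """Weight each image by its relevance to the current relation (link
    scenario), then fuse the aggregated visual signal into the entity."""
    scores = image_embs @ relation_emb       # (num_images,) relevance scores
    weights = softmax(scores)
    visual = weights @ image_embs            # attention-weighted aggregation
    return entity_emb + visual               # simple additive fusion

rng = np.random.default_rng(1)
dim = 32
entity = rng.normal(size=dim)
images = rng.normal(size=(4, dim))           # 4 images attached to the entity
relation = rng.normal(size=dim)
print(link_aware_fusion(entity, images, relation).shape)  # (32,)
```
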
4. Sun, Zhongkai, Prathusha Sarma, William Sethares, and Yingyu Liang. "Learning Relationships between Text, Audio, and Video via Deep Canonical Correlation for Multimodal Language Analysis." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 05 (2020): 8992–99. http://dx.doi.org/10.1609/aaai.v34i05.6431.

Abstract:
Multimodal language analysis often considers relationships between features based on text and those based on acoustic and visual properties. Text features typically outperform non-text features in sentiment analysis or emotion recognition tasks, in part because the text features are derived from advanced language models or word embeddings trained on massive data sources, while audio and video features are human-engineered and comparatively underdeveloped. Given that text, audio, and video describe the same utterance in different ways, we hypothesize that multimodal sentiment analysis …
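
The classical core of the paper's method is canonical correlation analysis, which finds projections of two modality feature sets that are maximally correlated; Deep CCA replaces the linear projections with neural networks. A sketch of the linear case with scikit-learn, where randomly generated features stand in for text and audio views of the same utterances:

```python
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
n = 200
shared = rng.normal(size=(n, 5))   # latent signal visible to both modalities
text_feats = np.hstack([shared, rng.normal(size=(n, 20))])
audio_feats = np.hstack([shared @ rng.normal(size=(5, 5)), rng.normal(size=(n, 15))])

cca = CCA(n_components=2)
text_c, audio_c = cca.fit_transform(text_feats, audio_feats)

# The first canonical pair should be highly correlated, reflecting the
# shared latent signal that both "modalities" observe.
print(np.corrcoef(text_c[:, 0], audio_c[:, 0])[0, 1])
```
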
5. Merkx, Danny, and Stefan L. Frank. "Learning semantic sentence representations from visually grounded language without lexical knowledge." Natural Language Engineering 25, no. 4 (2019): 451–66. http://dx.doi.org/10.1017/s1351324919000196.

Abstract:
Current approaches to learning semantic representations of sentences often use prior word-level knowledge. The current study aims to leverage visual information in order to capture sentence-level semantics without the need for word embeddings. We use a multimodal sentence encoder trained on a corpus of images with matching text captions to produce visually grounded sentence embeddings. Deep neural networks are trained to map the two modalities to a common embedding space such that, for an image, the corresponding caption can be retrieved and vice versa. We show that our model achieves …
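
Cross-modal retrieval of this kind is commonly trained with a margin-based ranking objective that pulls each image-caption pair together while pushing mismatched pairs apart. A NumPy sketch of such a bidirectional triplet loss (the forward computation only; the paper's exact objective and encoders may differ):

```python
import numpy as np

def triplet_ranking_loss(img: np.ndarray, cap: np.ndarray, margin: float = 0.2) -> float:
    """Hinge loss: each matching image-caption pair (the diagonal) must beat
    every mismatched pair in the batch by at least `margin`."""
    sims = img @ cap.T                        # (batch, batch) cosine similarities
    pos = np.diag(sims)                       # similarities of matching pairs
    cost_c = np.maximum(0, margin + sims - pos[:, None])  # wrong captions per image
    cost_i = np.maximum(0, margin + sims - pos[None, :])  # wrong images per caption
    np.fill_diagonal(cost_c, 0)
    np.fill_diagonal(cost_i, 0)
    return cost_c.mean() + cost_i.mean()

def l2norm(x: np.ndarray) -> np.ndarray:
    return x / np.linalg.norm(x, axis=1, keepdims=True)

rng = np.random.default_rng(0)
img = l2norm(rng.normal(size=(8, 128)))       # toy image-encoder outputs
cap = l2norm(rng.normal(size=(8, 128)))       # toy caption-encoder outputs
print(triplet_ranking_loss(img, cap))
```
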
6. Mateev, Mihail. "Comparative Analysis on Implementing Embeddings for Image Analysis." Journal of Information Systems Engineering and Management 10, no. 17s (2025): 89–102. https://doi.org/10.52783/jisem.v10i17s.2710.

Abstract:
This research explores how artificial intelligence enhances construction maintenance and diagnostics, achieving 95% accuracy on a dataset of 10,000 cases. The findings highlight AI's potential to revolutionize predictive maintenance in the industry. The growing adoption of image embeddings has transformed visual data processing across AI applications. This study evaluates embedding implementations in major platforms, including Azure AI, OpenAI's GPT-4 Vision, and frameworks like Hugging Face, Replicate, and Eden AI. It assesses their scalability, accuracy, cost-effectiveness, and integration …
7. Tang, Zhenchao, Jiehui Huang, Guanxing Chen, and Calvin Yu-Chian Chen. "Comprehensive View Embedding Learning for Single-Cell Multimodal Integration." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 14 (2024): 15292–300. http://dx.doi.org/10.1609/aaai.v38i14.29453.

Abstract:
Motivation: Advances in single-cell measurement techniques provide rich multimodal data, which helps us to explore the life state of cells more deeply. However, multimodal integration, i.e., learning joint embeddings from multimodal data, remains a challenge. The difficulty in integrating unpaired single-cell multimodal data is that different modalities have different feature spaces, which easily leads to information loss in the joint embedding. Few existing methods have fully exploited and fused the information in single-cell multimodal data. Result: In this study, we propose CoVEL, …
8. Zhang, Linhai, Deyu Zhou, Yulan He, and Zeng Yang. "MERL: Multimodal Event Representation Learning in Heterogeneous Embedding Spaces." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 16 (2021): 14420–27. http://dx.doi.org/10.1609/aaai.v35i16.17695.

Abstract:
Previous work has shown the effectiveness of using event representations for tasks such as script event prediction and stock market prediction. It is, however, still challenging to learn the subtle semantic differences between events based solely on textual descriptions of events, often represented as (subject, predicate, object) triples. As an alternative, images offer a more intuitive way of understanding event semantics. We observe that events described in text and in images show different abstraction levels and therefore should be projected onto heterogeneous embedding spaces, as opposed to …
9. Sah, Shagan, Sabarish Gopalakishnan, and Raymond Ptucha. "Aligned attention for common multimodal embeddings." Journal of Electronic Imaging 29, no. 02 (2020): 1. http://dx.doi.org/10.1117/1.jei.29.2.023013.

10. Alkaabi, Hussein, Ali Kadhim Jasim, and Ali Darroudi. "From Static to Contextual: A Survey of Embedding Advances in NLP." PERFECT: Journal of Smart Algorithms 2, no. 2 (2025): 57–66. https://doi.org/10.62671/perfect.v2i2.77.

Abstract:
Embedding techniques have been a cornerstone of Natural Language Processing (NLP), enabling machines to represent textual data in a form that captures semantic and syntactic relationships. Over the years, the field has witnessed a significant evolution: from static word embeddings, such as Word2Vec and GloVe, which represent words as fixed vectors, to dynamic, contextualized embeddings like BERT and GPT, which generate word representations based on their surrounding context. This survey provides a comprehensive overview of embedding techniques, tracing their development from early methods to …
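
The survey's central distinction is directly observable in code: a static model stores one vector per word type, whereas a contextual encoder produces a different vector for each occurrence. A sketch using the Hugging Face transformers API (the model download and the single-token word "bank" are assumptions of this example):

```python
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def vector_for(sentence: str, word: str) -> torch.Tensor:
    """Return the contextual embedding of `word` inside `sentence`."""
    enc = tok(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state[0]        # (seq_len, 768)
    idx = enc.input_ids[0].tolist().index(tok.convert_tokens_to_ids(word))
    return hidden[idx]

v1 = vector_for("She sat by the river bank.", "bank")
v2 = vector_for("He deposited cash at the bank.", "bank")
# A static embedding (Word2Vec, GloVe) would give identical vectors here;
# a contextual model does not.
print(torch.nn.functional.cosine_similarity(v1, v2, dim=0).item())  # < 1.0
```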

Dissertations on the topic "Multimodal Embeddings"

1. Engilberge, Martin. "Deep Inside Visual-Semantic Embeddings." Electronic Thesis or Diss., Sorbonne université, 2020. http://www.theses.fr/2020SORUS150.

Abstract:
Artificial intelligence (AI) is ubiquitous in our society today. The recent development of learning methods based on deep neural networks, also known as "deep learning", has brought a marked improvement in visual and textual representation models. This thesis addresses the question of learning multimodal embeddings to jointly represent visual and semantic data. It is a central problem in the current context of AI and deep learning, with particularly strong potential for the interpretability of models …
2. Deschamps-Berger, Théo. "Social Emotion Recognition with multimodal deep learning architecture in emergency call centers." Electronic Thesis or Diss., université Paris-Saclay, 2024. http://www.theses.fr/2024UPASG036.

Abstract:
This thesis deals with automatic emotion recognition systems for speech in a medical emergency context. It addresses some of the challenges encountered when studying emotions in social interactions and is grounded in modern theories of emotion, in particular Lisa Feldman Barrett's work on the construction of emotions. Indeed, the manifestation of spontaneous emotions in human interactions is complex, often characterized by nuances and blends, and closely tied to context. This study is based on the CEMO corpus, composed of …
3. Vukotic, Vedran. "Deep Neural Architectures for Automatic Representation Learning from Multimedia Multimodal Data." Thesis, Rennes, INSA, 2017. http://www.theses.fr/2017ISAR0015/document.

Abstract:
This thesis deals with the development of deep neural architectures for analyzing textual or visual content, or a combination of the two. Broadly, the work exploits the ability of neural networks to learn abstract representations. The main contributions of the thesis are as follows: 1) Recurrent networks for spoken language understanding: different network architectures are compared for this task with respect to their ability to model the observations as well as the dependencies between the labels to be predicted. 2) Image prediction and …
4. Rubio Romano, Antonio. "Fashion discovery: a computer vision approach." Doctoral thesis, TDX (Tesis Doctorals en Xarxa), 2021. http://hdl.handle.net/10803/672423.

Abstract:
Performing semantic interpretation of fashion images is undeniably one of the most challenging domains for computer vision. Subtle variations in color and shape might confer different meanings or interpretations to an image. Not only is it a domain tightly coupled with human understanding, but also with scene interpretation and context. Being able to extract fashion-specific information from images and interpret it properly can be useful in many situations and can help in understanding the underlying information in an image. Fashion is also one of the most important businesses …
5. Couairon, Guillaume. "Text-Based Semantic Image Editing." Electronic Thesis or Diss., Sorbonne université, 2023. http://www.theses.fr/2023SORUS248.

Abstract:
The goal of this thesis is to propose algorithms for the task of text-based image editing (TIE), which consists in editing digital images according to an instruction formulated in natural language. For example, given an image of a dog and the query "Change the dog into a cat", we want to produce a new image in which the dog has been replaced by a cat while keeping all other aspects of the image unchanged (the animal's color and pose, the background). The north-star goal is to enable anyone to edit their images using …
6

ur, Réhman Shafiq. "Expressing emotions through vibration for perception and control." Doctoral thesis, Umeå universitet, Institutionen för tillämpad fysik och elektronik, 2010. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-32990.

Abstract:
This thesis addresses a challenging problem: "how to let the visually impaired 'see' others' emotions". We, human beings, are heavily dependent on facial expressions to express ourselves. A smile shows that the person you are talking to is pleased, amused, relieved, etc. People use emotional information from facial expressions to switch between conversation topics and to determine the attitudes of individuals. Missing this emotional information from facial expressions and head gestures makes it extremely difficult for the visually impaired to interact with others in social settings. To enhance the visually impaired …

Book chapters on the topic "Multimodal Embeddings"

1. Zhao, Xiang, Weixin Zeng, and Jiuyang Tang. "Multimodal Entity Alignment." In Entity Alignment. Springer Nature Singapore, 2023. http://dx.doi.org/10.1007/978-981-99-4250-3_9.

Abstract:
In various tasks related to artificial intelligence, data is often present in multiple forms or modalities. Recently, it has become a popular approach to combine these different forms of information into a knowledge graph, creating a multi-modal knowledge graph (MMKG). However, MMKGs often face issues of insufficient data coverage and incompleteness. A possible strategy to address this issue is to incorporate supplemental information from other MMKGs. To achieve this goal, current entity alignment methods could be utilized; however, these approaches work within the Euclidean space, and the resulting entity representations can distort the hierarchical structure of the knowledge graph. Additionally, the potential benefits of visual information have not been fully utilized. To address these concerns, we present a new approach for aligning entities across multiple modalities: hyperbolic multi-modal entity alignment. This method expands upon the conventional Euclidean representation by incorporating a hyperboloid manifold. We first utilize hyperbolic graph convolutional networks to acquire structural representations of entities. For visual data, we create image embeddings with a pretrained vision model and then map them into the hyperbolic space. Lastly, we merge the structural and visual representations within the hyperbolic space and use the combined embeddings to forecast potential entity alignment outcomes. Through a series of thorough experiments and ablation studies, we validate the efficacy of our proposed model and its individual components.
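
The key operation the chapter relies on, moving Euclidean embeddings onto a hyperbolic manifold, can be sketched with the exponential map at the origin of the hyperboloid (Lorentz) model. Curvature is fixed to 1 and the base point to the origin for simplicity; this illustrates the mapping, not the chapter's full alignment method:

```python
import numpy as np

def exp_map_origin(v: np.ndarray) -> np.ndarray:
    """Lift a Euclidean vector v in R^d onto the hyperboloid
    {x in R^(d+1) : -x0^2 + ||x_1:||^2 = -1, x0 > 0}."""
    r = np.linalg.norm(v)
    if r == 0:
        return np.concatenate([[1.0], np.zeros_like(v)])
    return np.concatenate([[np.cosh(r)], np.sinh(r) * v / r])

def lorentz_distance(x: np.ndarray, y: np.ndarray) -> float:
    """Geodesic distance d(x, y) = arccosh(-<x, y>_L) on the hyperboloid."""
    inner = -x[0] * y[0] + x[1:] @ y[1:]      # Lorentzian inner product
    return float(np.arccosh(np.clip(-inner, 1.0, None)))

rng = np.random.default_rng(0)
a, b = rng.normal(size=16), rng.normal(size=16)   # toy Euclidean image embeddings
ha, hb = exp_map_origin(a), exp_map_origin(b)
print(-ha[0] ** 2 + ha[1:] @ ha[1:])  # ~ -1.0: the point lies on the manifold
print(lorentz_distance(ha, hb))
```
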
2. Gao, Yuan, Sangwook Kim, David E. Austin, and Chris McIntosh. "MEDBind: Unifying Language and Multimodal Medical Data Embeddings." In Lecture Notes in Computer Science. Springer Nature Switzerland, 2024. http://dx.doi.org/10.1007/978-3-031-72390-2_21.

3. Dolphin, Rian, Barry Smyth, and Ruihai Dong. "A Machine Learning Approach to Industry Classification in Financial Markets." In Communications in Computer and Information Science. Springer Nature Switzerland, 2023. http://dx.doi.org/10.1007/978-3-031-26438-2_7.

Abstract:
Industry classification schemes provide a taxonomy for segmenting companies based on their business activities. They are relied upon in industry and academia as an integral component of many types of financial and economic analysis. However, even modern classification schemes have failed to embrace the era of big data and remain a largely subjective undertaking prone to inconsistency and misclassification. To address this, we propose a multimodal neural model for training company embeddings, which harnesses the dynamics of both historical pricing data and financial news to learn objective company representations that capture nuanced relationships. We explain our approach in detail and highlight the utility of the embeddings through several case studies and application to the downstream task of industry classification.
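
As a toy illustration of the multimodal idea (a late-fusion baseline, not the authors' neural model), pricing-based and news-based features can be concatenated into a single company representation and fed to a downstream classifier; all names, shapes, and the random data here are hypothetical:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_companies = 100
price_feats = rng.normal(size=(n_companies, 32))  # e.g., return-series encoder output
news_feats = rng.normal(size=(n_companies, 64))   # e.g., averaged news-text embeddings
sectors = rng.integers(0, 5, size=n_companies)    # toy industry labels

# Late fusion: concatenate modality features into one company embedding.
company_emb = np.hstack([price_feats, news_feats])

clf = LogisticRegression(max_iter=1000).fit(company_emb, sectors)
print(clf.score(company_emb, sectors))  # training fit on toy data, pipeline demo only
```
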
4. Gornishka, Iva, Stevan Rudinac, and Marcel Worring. "Interactive Search and Exploration in Discussion Forums Using Multimodal Embeddings." In MultiMedia Modeling. Springer International Publishing, 2019. http://dx.doi.org/10.1007/978-3-030-37734-2_32.

5. Dadwal, Rajjat, Ran Yu, and Elena Demidova. "A Multimodal and Multitask Approach for Adaptive Geospatial Region Embeddings." In Advances in Knowledge Discovery and Data Mining. Springer Nature Singapore, 2024. http://dx.doi.org/10.1007/978-981-97-2262-4_29.

6. Pandey, Sandeep Kumar, Hanumant Singh Shekhawat, Shalendar Bhasin, Ravi Jasuja, and S. R. M. Prasanna. "Alzheimer's Dementia Recognition Using Multimodal Fusion of Speech and Text Embeddings." In Intelligent Human Computer Interaction. Springer International Publishing, 2022. http://dx.doi.org/10.1007/978-3-030-98404-5_64.

7. Choe, Subeen, Jihyeon Oh, and Jihoon Yang. "Multimodal Contrastive Learning for Dialogue Embeddings with Global and Local Views." In Lecture Notes in Computer Science. Springer Nature Singapore, 2025. https://doi.org/10.1007/978-981-96-8180-8_13.

8. Gerber, Jonathan, Bruno Kreiner, Jasmin Saxer, and Andreas Weiler. "Towards Website X-Ray for Europe's Municipalities: Unveiling Digital Transformation with Multimodal Embeddings." In Lecture Notes in Computer Science. Springer Nature Switzerland, 2024. https://doi.org/10.1007/978-3-031-78090-5_11.

9. Praveen Kumar, T., and Lavanya Pamulaparty. "Enhancing Sentiment Analysis with Deep Learning Models and BERT Word Embeddings for Multimodal Reviews." In Cognitive Science and Technology. Springer Nature Singapore, 2025. https://doi.org/10.1007/978-981-97-9266-5_6.

10. Zhou, Liting, and Cathal Gurrin. "Multimodal Embedding for Lifelog Retrieval." In MultiMedia Modeling. Springer International Publishing, 2022. http://dx.doi.org/10.1007/978-3-030-98358-1_33.


Conference papers on the topic "Multimodal Embeddings"

1. Liu, Ruizhou, Zongsheng Cao, Zhe Wu, Qianqian Xu, and Qingming Huang. "Multimodal Knowledge Graph Embeddings via Lorentz-based Contrastive Learning." In 2024 IEEE International Conference on Multimedia and Expo (ICME). IEEE, 2024. http://dx.doi.org/10.1109/icme57554.2024.10687608.

2. Heo, Serin, Jehyun Kyung, and Joon-Hyuk Chang. "Multimodal Emotion Recognition with Target Speaker-Based Facial Embeddings." In ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2025. https://doi.org/10.1109/icassp49660.2025.10888205.

3. Dai, Wenliang, Zihan Liu, Tiezheng Yu, and Pascale Fung. "Modality-Transferable Emotion Embeddings for Low-Resource Multimodal Emotion Recognition." In Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing. Association for Computational Linguistics, 2020. http://dx.doi.org/10.18653/v1/2020.aacl-main.30.

4. Takemaru, Lina, Shu Yang, Ruiming Wu, et al. "Mapping Alzheimer's Disease Pseudo-Progression With Multimodal Biomarker Trajectory Embeddings." In 2024 IEEE International Symposium on Biomedical Imaging (ISBI). IEEE, 2024. http://dx.doi.org/10.1109/isbi56570.2024.10635249.

5. Oliveira, Artur, Mateus Espadoto, Roberto Hirata Jr., and Roberto Cesar Jr. "Improving Image Classification Tasks Using Fused Embeddings and Multimodal Models." In 20th International Conference on Computer Vision Theory and Applications. SCITEPRESS - Science and Technology Publications, 2025. https://doi.org/10.5220/0013365600003912.

6. Zhong, Jiayang, Fuyao Chen, Lihui Chen, Dennis Shung, and John A. Onofrey. "Conditional Convolution of Clinical Data Embeddings for Multimodal Prostate Cancer Classification." In 2025 IEEE 22nd International Symposium on Biomedical Imaging (ISBI). IEEE, 2025. https://doi.org/10.1109/isbi60581.2025.10981307.

7. Arshad, Aresha, Momina Moetesum, Adnan Ul Hasan, and Faisal Shafait. "Enhancing Multimodal Information Extraction from Visually Rich Documents with 2D Positional Embeddings." In 2024 International Conference on Digital Image Computing: Techniques and Applications (DICTA). IEEE, 2024. https://doi.org/10.1109/dicta63115.2024.00087.

8. Garaiman, Florian Enrico, and Anamaria Radoi. "Multimodal Emotion Recognition System based on X-Vector Embeddings and Convolutional Neural Networks." In 2024 15th International Conference on Communications (COMM). IEEE, 2024. http://dx.doi.org/10.1109/comm62355.2024.10741406.

9. Adiputra, Andro Aprila, Ahmada Yusril Kadiptya, Thi-Thu-Huong Le, JunYoung Son, and Howon Kim. "Enhancing Contextual Understanding with Multimodal Siamese Networks Using Contrastive Loss and Text Embeddings." In 2025 International Conference on Artificial Intelligence in Information and Communication (ICAIIC). IEEE, 2025. https://doi.org/10.1109/icaiic64266.2025.10920874.

10. Lewis, Nora, Charles C. Cavalcante, Zois Boukouvalas, and Roberto Corizzo. "On the Effectiveness of Text and Image Embeddings in Multimodal Hate Speech Detection." In 2024 IEEE International Conference on Big Data (BigData). IEEE, 2024. https://doi.org/10.1109/bigdata62323.2024.10826088.
