Ready-made bibliography on the topic “Multimodal Embeddings”

Create an accurate reference in APA, MLA, Chicago, Harvard, and many other styles

Browse lists of current articles, books, theses, conference abstracts, and other scholarly sources on the topic “Multimodal Embeddings”.

Next to every entry in the bibliography you will find an “Add to bibliography” button. Click it, and we will automatically generate a bibliographic reference for the selected work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of a scholarly publication as a .pdf file and read its abstract online, whenever such metadata is available in the record.

Journal articles on the topic “Multimodal Embeddings”

1. Tyshchuk, Kirill, Polina Karpikova, Andrew Spiridonov, Anastasiia Prutianova, Anton Razzhigaev, and Alexander Panchenko. "On Isotropy of Multimodal Embeddings." Information 14, no. 7 (2023): 392. http://dx.doi.org/10.3390/info14070392.

Abstract:
Embeddings, i.e., vector representations of objects such as texts, images, or graphs, play a key role in today's deep learning methodologies. Prior research has shown the importance of analyzing the isotropy of textual embeddings produced by transformer-based text encoders such as BERT. Anisotropic word embeddings do not use the entire space and instead concentrate in a narrow cone of the pretrained vector space, which hurts the performance of applications such as textual semantic similarity. Transforming a vector space to optimize isotropy has been shown to be beneficial for …

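The narrow-cone effect described in this abstract is easy to probe numerically: the mean pairwise cosine similarity of a set of embeddings is near zero for an isotropic cloud and approaches one when the vectors share a dominant direction. The NumPy sketch below illustrates that diagnostic on synthetic data; it is a simplified illustration of the general idea, not the measure used in the paper.

```python
import numpy as np

def average_cosine_similarity(emb: np.ndarray) -> float:
    """Mean pairwise cosine similarity, excluding self-pairs.
    Near 0: roughly isotropic; near 1: a narrow cone."""
    normed = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    sims = normed @ normed.T
    n = emb.shape[0]
    return (sims.sum() - n) / (n * (n - 1))  # drop the diagonal of 1.0s

def center_embeddings(emb: np.ndarray) -> np.ndarray:
    """Subtracting the mean vector is a common first step toward isotropy."""
    return emb - emb.mean(axis=0, keepdims=True)

rng = np.random.default_rng(0)
# Simulate anisotropy: Gaussian noise plus a shared offset direction.
emb = rng.normal(size=(100, 64)) + 5.0
print(average_cosine_similarity(emb))                     # high (anisotropic)
print(average_cosine_similarity(center_embeddings(emb)))  # near 0 (isotropic)
```

Mean-centering is only the simplest of the transformations alluded to in the abstract; follow-up methods in this literature also remove dominant principal components.
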
2. Guo, Zhiqiang, Jianjun Li, Guohui Li, Chaoyang Wang, Si Shi, and Bin Ruan. "LGMRec: Local and Global Graph Learning for Multimodal Recommendation." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 8 (2024): 8454–62. http://dx.doi.org/10.1609/aaai.v38i8.28688.

Abstract:
Multimodal recommendation has gradually become the infrastructure of online media platforms, enabling them to provide personalized service to users through joint modeling of users' historical behaviors (e.g., purchases, clicks) and items' various modalities (e.g., visual and textual). The majority of existing studies focus on utilizing modal features or modality-related graph structure to learn users' local interests. Nevertheless, these approaches encounter two limitations: (1) shared updates of user ID embeddings result in a consequential coupling between collaboration and multimodal …

3. Shang, Bin, Yinliang Zhao, Jun Liu, and Di Wang. "LAFA: Multimodal Knowledge Graph Completion with Link Aware Fusion and Aggregation." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 8 (2024): 8957–65. http://dx.doi.org/10.1609/aaai.v38i8.28744.

Abstract:
Recently, an enormous amount of research has emerged on multimodal knowledge graph completion (MKGC), which seeks to extract knowledge from multimodal data and predict the most plausible missing facts to complete a given multimodal knowledge graph (MKG). However, existing MKGC approaches largely ignore that visual information may introduce noise and lead to uncertainty when added to traditional KG embeddings, because each associated image contributes differently to an entity in different link scenarios. Moreover, treating each triple independently when learning entity embeddings le…

4. Sun, Zhongkai, Prathusha Sarma, William Sethares, and Yingyu Liang. "Learning Relationships between Text, Audio, and Video via Deep Canonical Correlation for Multimodal Language Analysis." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 05 (2020): 8992–99. http://dx.doi.org/10.1609/aaai.v34i05.6431.

Abstract:
Multimodal language analysis often considers relationships between text-based features and those based on acoustic and visual properties. Text features typically outperform non-text features in sentiment analysis or emotion recognition tasks, in part because text features are derived from advanced language models or word embeddings trained on massive data sources, while audio and video features are human-engineered and comparatively underdeveloped. Given that the text, audio, and video describe the same utterance in different ways, we hypothesize that the multimodal sentiment analysis …

5. Merkx, Danny, and Stefan L. Frank. "Learning semantic sentence representations from visually grounded language without lexical knowledge." Natural Language Engineering 25, no. 4 (2019): 451–66. http://dx.doi.org/10.1017/s1351324919000196.

Abstract:
Current approaches to learning semantic representations of sentences often use prior word-level knowledge. The current study aims to leverage visual information in order to capture sentence-level semantics without the need for word embeddings. We use a multimodal sentence encoder trained on a corpus of images with matching text captions to produce visually grounded sentence embeddings. Deep neural networks are trained to map the two modalities to a common embedding space such that, for an image, the corresponding caption can be retrieved and vice versa. We show that our model achieves re…

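The training objective sketched in this abstract, mapping images and captions into a shared space where the matching pair scores highest in both retrieval directions, can be illustrated with a symmetric contrastive (InfoNCE-style) loss. The NumPy sketch below is a generic instance of that family of objectives, not Merkx and Frank's exact architecture or loss.

```python
import numpy as np

def symmetric_contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """InfoNCE-style batch loss: each matching image/caption pair sits on
    the diagonal of the similarity matrix and should outscore every
    mismatched pair, in both retrieval directions."""
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature

    def cross_entropy_on_diagonal(l):
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_probs))

    # average the image-to-text and text-to-image retrieval losses
    return (cross_entropy_on_diagonal(logits)
            + cross_entropy_on_diagonal(logits.T)) / 2

rng = np.random.default_rng(1)
captions = rng.normal(size=(8, 32))
images = captions + 0.05 * rng.normal(size=(8, 32))  # well-aligned pairs
mismatched = np.roll(images, 1, axis=0)              # every pair wrong
print(symmetric_contrastive_loss(images, captions))      # low loss
print(symmetric_contrastive_loss(mismatched, captions))  # much higher loss
```

Minimizing such a loss pulls each caption toward its own image and pushes it away from the other images in the batch, which is what makes cross-modal retrieval in a common space possible.
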
6. Mateev, Mihail. "Comparative Analysis on Implementing Embeddings for Image Analysis." Journal of Information Systems Engineering and Management 10, no. 17s (2025): 89–102. https://doi.org/10.52783/jisem.v10i17s.2710.

Abstract:
This research explores how artificial intelligence enhances construction maintenance and diagnostics, achieving 95% accuracy on a dataset of 10,000 cases. The findings highlight AI's potential to revolutionize predictive maintenance in the industry. The growing adoption of image embeddings has transformed visual data processing across AI applications. This study evaluates embedding implementations in major platforms, including Azure AI, OpenAI's GPT-4 Vision, and frameworks like Hugging Face, Replicate, and Eden AI. It assesses their scalability, accuracy, cost-effectiveness, and integration f…

7. Tang, Zhenchao, Jiehui Huang, Guanxing Chen, and Calvin Yu-Chian Chen. "Comprehensive View Embedding Learning for Single-Cell Multimodal Integration." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 14 (2024): 15292–300. http://dx.doi.org/10.1609/aaai.v38i14.29453.

Abstract:
Motivation: Advances in single-cell measurement techniques provide rich multimodal data, which helps us explore the life state of cells more deeply. However, multimodal integration, that is, learning joint embeddings from multimodal data, remains a challenge. The difficulty in integrating unpaired single-cell multimodal data is that different modalities have different feature spaces, which easily leads to information loss in the joint embedding, and few existing methods have fully exploited and fused the information in single-cell multimodal data. Result: In this study, we propose CoVEL, a de…

8. Zhang, Linhai, Deyu Zhou, Yulan He, and Zeng Yang. "MERL: Multimodal Event Representation Learning in Heterogeneous Embedding Spaces." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 16 (2021): 14420–27. http://dx.doi.org/10.1609/aaai.v35i16.17695.

Abstract:
Previous work has shown the effectiveness of using event representations for tasks such as script event prediction and stock market prediction. It is, however, still challenging to learn the subtle semantic differences between events based solely on textual descriptions of events, often represented as (subject, predicate, object) triples. As an alternative, images offer a more intuitive way of understanding event semantics. We observe that events described in text and in images show different abstraction levels and therefore should be projected onto heterogeneous embedding spaces, as opposed to wh…

9. Sah, Shagan, Sabarish Gopalakrishnan, and Raymond Ptucha. "Aligned attention for common multimodal embeddings." Journal of Electronic Imaging 29, no. 02 (2020): 1. http://dx.doi.org/10.1117/1.jei.29.2.023013.

10. Alkaabi, Hussein, Ali Kadhim Jasim, and Ali Darroudi. "From Static to Contextual: A Survey of Embedding Advances in NLP." PERFECT: Journal of Smart Algorithms 2, no. 2 (2025): 57–66. https://doi.org/10.62671/perfect.v2i2.77.

Abstract:
Embedding techniques have been a cornerstone of Natural Language Processing (NLP), enabling machines to represent textual data in a form that captures semantic and syntactic relationships. Over the years, the field has witnessed a significant evolution: from static word embeddings, such as Word2Vec and GloVe, which represent words as fixed vectors, to dynamic, contextualized embeddings like BERT and GPT, which generate word representations based on their surrounding context. This survey provides a comprehensive overview of embedding techniques, tracing their development from early methods to st…

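The static-versus-contextual distinction this survey traces comes down to one property: a static table assigns a single fixed vector per word type, regardless of the sentence it appears in. A toy sketch of the static side (the vectors here are random placeholders, not real Word2Vec or GloVe output):

```python
import numpy as np

rng = np.random.default_rng(2)
# A toy static embedding table: one fixed vector per word type,
# as in Word2Vec or GloVe (hypothetical 4-dimensional vectors).
vocab = ["river", "bank", "money", "deposit"]
static_table = {word: rng.normal(size=4) for word in vocab}

def static_embed(sentence):
    """Look up each token's fixed vector, ignoring context entirely."""
    return [static_table[w] for w in sentence.split()]

s1 = static_embed("river bank")
s2 = static_embed("money bank")
# A static model cannot disambiguate: "bank" gets the identical
# vector in both sentences.
print(np.allclose(s1[1], s2[1]))  # True
```

A contextual encoder such as BERT or GPT instead computes the vector for "bank" from the whole sentence, so the two occurrences above would receive different representations; that is the shift from static to contextual embeddings described in the abstract.
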

Doctoral dissertations on the topic “Multimodal Embeddings”

1. Engilberge, Martin. "Deep Inside Visual-Semantic Embeddings." Electronic Thesis or Diss., Sorbonne université, 2020. http://www.theses.fr/2020SORUS150.

Abstract:
Artificial intelligence (AI) is now ubiquitous in our society. The recent development of learning methods based on deep neural networks, also known as "deep learning", has greatly improved visual and textual representation models. This thesis addresses the problem of learning multimodal embeddings that jointly represent visual and semantic data. It is a central question in the current context of AI and deep learning, with particularly strong potential for the interpretability of …

2. Deschamps-Berger, Théo. "Social Emotion Recognition with multimodal deep learning architecture in emergency call centers." Electronic Thesis or Diss., université Paris-Saclay, 2024. http://www.theses.fr/2024UPASG036.

Abstract:
This thesis deals with automatic speech emotion recognition systems in a medical emergency context. It addresses some of the challenges encountered when studying emotions in social interactions and is grounded in modern theories of emotion, in particular Lisa Feldman Barrett's theory of constructed emotion. Indeed, the manifestation of spontaneous emotions in human interactions is complex, often characterized by nuances and blends, and closely tied to context. This study is based on the CEMO corpus, composed of …

3. Vukotic, Vedran. "Deep Neural Architectures for Automatic Representation Learning from Multimedia Multimodal Data." Thesis, Rennes, INSA, 2017. http://www.theses.fr/2017ISAR0015/document.

Abstract:
This thesis focuses on developing deep neural architectures for analyzing textual and visual content, or a combination of the two. In general, the work exploits the ability of neural networks to learn abstract representations. The main contributions of the thesis are: 1) recurrent networks for spoken language understanding, where different network architectures are compared on their ability to model the observations as well as the dependencies between the labels to be predicted; 2) image prediction and …

4. Rubio Romano, Antonio. "Fashion discovery: a computer vision approach." Doctoral thesis, TDX (Tesis Doctorals en Xarxa), 2021. http://hdl.handle.net/10803/672423.

Abstract:
Performing semantic interpretation of fashion images is undeniably one of the most challenging domains for computer vision. Subtle variations in color and shape might confer different meanings or interpretations on an image. Not only is it a domain tightly coupled with human understanding, but also with scene interpretation and context. Being able to extract fashion-specific information from images and to interpret that information properly can be useful in many situations and can help in understanding the underlying information in an image. Fashion is also one of the most important bus…

5. Couairon, Guillaume. "Text-Based Semantic Image Editing." Electronic Thesis or Diss., Sorbonne université, 2023. http://www.theses.fr/2023SORUS248.

Abstract:
The goal of this thesis is to propose algorithms for the task of text-based image editing (TIE), which consists of editing digital images according to an instruction formulated in natural language. For instance, given an image of a dog and the query "Change the dog into a cat", we want to produce a new image in which the dog has been replaced by a cat, while keeping every other aspect of the image unchanged (the animal's color and pose, the background). The north-star objective is to enable anyone to edit their images using …

6. ur Réhman, Shafiq. "Expressing emotions through vibration for perception and control." Doctoral thesis, Umeå universitet, Institutionen för tillämpad fysik och elektronik, 2010. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-32990.

Abstract:
This thesis addresses a challenging problem: how to let the visually impaired "see" others' emotions. We, human beings, are heavily dependent on facial expressions to express ourselves. A smile shows that the person you are talking to is pleased, amused, relieved, etc. People use emotional information from facial expressions to switch between conversation topics and to determine the attitudes of individuals. Missing the emotional information carried by facial expressions and head gestures makes it extremely difficult for the visually impaired to interact with others in social events. To enhance the visually impaired …


Book chapters on the topic “Multimodal Embeddings”

1. Zhao, Xiang, Weixin Zeng, and Jiuyang Tang. "Multimodal Entity Alignment." In Entity Alignment. Springer Nature Singapore, 2023. http://dx.doi.org/10.1007/978-981-99-4250-3_9.

Abstract:
In various tasks related to artificial intelligence, data is often present in multiple forms or modalities. Recently, it has become a popular approach to combine these different forms of information into a knowledge graph, creating a multi-modal knowledge graph (MMKG). However, MMKGs often suffer from insufficient data coverage and incompleteness. A possible strategy to address this issue is to incorporate supplementary information from other MMKGs. To achieve this goal, current methods for aligning entities could …

2. Gao, Yuan, Sangwook Kim, David E. Austin, and Chris McIntosh. "MEDBind: Unifying Language and Multimodal Medical Data Embeddings." In Lecture Notes in Computer Science. Springer Nature Switzerland, 2024. http://dx.doi.org/10.1007/978-3-031-72390-2_21.

3. Dolphin, Rian, Barry Smyth, and Ruihai Dong. "A Machine Learning Approach to Industry Classification in Financial Markets." In Communications in Computer and Information Science. Springer Nature Switzerland, 2023. http://dx.doi.org/10.1007/978-3-031-26438-2_7.

Abstract:
Industry classification schemes provide a taxonomy for segmenting companies based on their business activities. They are relied upon in industry and academia as an integral component of many types of financial and economic analysis. However, even modern classification schemes have failed to embrace the era of big data and remain a largely subjective undertaking prone to inconsistency and misclassification. To address this, we propose a multimodal neural model for training company embeddings, which harnesses the dynamics of both historical pricing data and financial news to learn object…

4. Gornishka, Iva, Stevan Rudinac, and Marcel Worring. "Interactive Search and Exploration in Discussion Forums Using Multimodal Embeddings." In MultiMedia Modeling. Springer International Publishing, 2019. http://dx.doi.org/10.1007/978-3-030-37734-2_32.

5. Dadwal, Rajjat, Ran Yu, and Elena Demidova. "A Multimodal and Multitask Approach for Adaptive Geospatial Region Embeddings." In Advances in Knowledge Discovery and Data Mining. Springer Nature Singapore, 2024. http://dx.doi.org/10.1007/978-981-97-2262-4_29.

6. Pandey, Sandeep Kumar, Hanumant Singh Shekhawat, Shalendar Bhasin, Ravi Jasuja, and S. R. M. Prasanna. "Alzheimer's Dementia Recognition Using Multimodal Fusion of Speech and Text Embeddings." In Intelligent Human Computer Interaction. Springer International Publishing, 2022. http://dx.doi.org/10.1007/978-3-030-98404-5_64.

7. Choe, Subeen, Jihyeon Oh, and Jihoon Yang. "Multimodal Contrastive Learning for Dialogue Embeddings with Global and Local Views." In Lecture Notes in Computer Science. Springer Nature Singapore, 2025. https://doi.org/10.1007/978-981-96-8180-8_13.

8. Gerber, Jonathan, Bruno Kreiner, Jasmin Saxer, and Andreas Weiler. "Towards Website X-Ray for Europe's Municipalities: Unveiling Digital Transformation with Multimodal Embeddings." In Lecture Notes in Computer Science. Springer Nature Switzerland, 2024. https://doi.org/10.1007/978-3-031-78090-5_11.

9. Praveen Kumar, T., and Lavanya Pamulaparty. "Enhancing Sentiment Analysis with Deep Learning Models and BERT Word Embeddings for Multimodal Reviews." In Cognitive Science and Technology. Springer Nature Singapore, 2025. https://doi.org/10.1007/978-981-97-9266-5_6.

10. Zhou, Liting, and Cathal Gurrin. "Multimodal Embedding for Lifelog Retrieval." In MultiMedia Modeling. Springer International Publishing, 2022. http://dx.doi.org/10.1007/978-3-030-98358-1_33.


Conference papers on the topic “Multimodal Embeddings”

1. Liu, Ruizhou, Zongsheng Cao, Zhe Wu, Qianqian Xu, and Qingming Huang. "Multimodal Knowledge Graph Embeddings via Lorentz-based Contrastive Learning." In 2024 IEEE International Conference on Multimedia and Expo (ICME). IEEE, 2024. http://dx.doi.org/10.1109/icme57554.2024.10687608.

2. Heo, Serin, Jehyun Kyung, and Joon-Hyuk Chang. "Multimodal Emotion Recognition with Target Speaker-Based Facial Embeddings." In ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2025. https://doi.org/10.1109/icassp49660.2025.10888205.

3. Dai, Wenliang, Zihan Liu, Tiezheng Yu, and Pascale Fung. "Modality-Transferable Emotion Embeddings for Low-Resource Multimodal Emotion Recognition." In Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing. Association for Computational Linguistics, 2020. http://dx.doi.org/10.18653/v1/2020.aacl-main.30.

4. Takemaru, Lina, Shu Yang, Ruiming Wu, et al. "Mapping Alzheimer's Disease Pseudo-Progression With Multimodal Biomarker Trajectory Embeddings." In 2024 IEEE International Symposium on Biomedical Imaging (ISBI). IEEE, 2024. http://dx.doi.org/10.1109/isbi56570.2024.10635249.

5. Oliveira, Artur, Mateus Espadoto, Roberto Hirata Jr., and Roberto Cesar Jr. "Improving Image Classification Tasks Using Fused Embeddings and Multimodal Models." In 20th International Conference on Computer Vision Theory and Applications. SCITEPRESS - Science and Technology Publications, 2025. https://doi.org/10.5220/0013365600003912.

6. Zhong, Jiayang, Fuyao Chen, Lihui Chen, Dennis Shung, and John A. Onofrey. "Conditional Convolution of Clinical Data Embeddings for Multimodal Prostate Cancer Classification." In 2025 IEEE 22nd International Symposium on Biomedical Imaging (ISBI). IEEE, 2025. https://doi.org/10.1109/isbi60581.2025.10981307.

7. Arshad, Aresha, Momina Moetesum, Adnan Ul Hasan, and Faisal Shafait. "Enhancing Multimodal Information Extraction from Visually Rich Documents with 2D Positional Embeddings." In 2024 International Conference on Digital Image Computing: Techniques and Applications (DICTA). IEEE, 2024. https://doi.org/10.1109/dicta63115.2024.00087.

8. Garaiman, Florian Enrico, and Anamaria Radoi. "Multimodal Emotion Recognition System based on X-Vector Embeddings and Convolutional Neural Networks." In 2024 15th International Conference on Communications (COMM). IEEE, 2024. http://dx.doi.org/10.1109/comm62355.2024.10741406.

9. Adiputra, Andro Aprila, Ahmada Yusril Kadiptya, Thi-Thu-Huong Le, JunYoung Son, and Howon Kim. "Enhancing Contextual Understanding with Multimodal Siamese Networks Using Contrastive Loss and Text Embeddings." In 2025 International Conference on Artificial Intelligence in Information and Communication (ICAIIC). IEEE, 2025. https://doi.org/10.1109/icaiic64266.2025.10920874.

10. Lewis, Nora, Charles C. Cavalcante, Zois Boukouvalas, and Roberto Corizzo. "On the Effectiveness of Text and Image Embeddings in Multimodal Hate Speech Detection." In 2024 IEEE International Conference on Big Data (BigData). IEEE, 2024. https://doi.org/10.1109/bigdata62323.2024.10826088.
