
Journal articles on the topic "Visual and semantic embedding"



Consult the top 50 journal articles for your research on the topic "Visual and semantic embedding".



Explore journal articles on a wide variety of disciplines and organize your bibliography correctly.

1

Zhang, Yuanpeng, Jingye Guan, Haobo Wang, Kaiming Li, Ying Luo, and Qun Zhang. "Generalized Zero-Shot Space Target Recognition Based on Global-Local Visual Feature Embedding Network." Remote Sensing 15, no. 21 (2023): 5156. http://dx.doi.org/10.3390/rs15215156.

Abstract:
Existing deep learning-based space target recognition methods rely on abundantly labeled samples and are not capable of recognizing samples from unseen classes without training. In this article, based on generalized zero-shot learning (GZSL), we propose a space target recognition framework to simultaneously recognize space targets from both seen and unseen classes. First, we defined semantic attributes to describe the characteristics of different categories of space targets. Second, we constructed a dual-branch neural network, termed the global-local visual feature embedding network (GLVFENet)
2

Yeh, Mei-Chen, and Yi-Nan Li. "Multilabel Deep Visual-Semantic Embedding." IEEE Transactions on Pattern Analysis and Machine Intelligence 42, no. 6 (2020): 1530–36. http://dx.doi.org/10.1109/tpami.2019.2911065.

3

Liu, Yang, Mengyuan Liu, Shudong Huang, and Jiancheng Lv. "Asymmetric Visual Semantic Embedding Framework for Efficient Vision-Language Alignment." Proceedings of the AAAI Conference on Artificial Intelligence 39, no. 6 (2025): 5676–84. https://doi.org/10.1609/aaai.v39i6.32605.

Abstract:
Learning visual semantic similarity is a critical challenge in bridging the gap between images and texts. However, there exist inherent variations between vision and language data, such as information density, i.e., images can contain textual information from multiple different views, which makes it difficult to compute the similarity between these two modalities accurately and efficiently. In this paper, we propose a novel framework called Asymmetric Visual Semantic Embedding (AVSE) to dynamically select features from various regions of images tailored to different textual inputs for similari
4

Merkx, Danny, and Stefan L. Frank. "Learning semantic sentence representations from visually grounded language without lexical knowledge." Natural Language Engineering 25, no. 4 (2019): 451–66. http://dx.doi.org/10.1017/s1351324919000196.

Abstract:
Current approaches to learning semantic representations of sentences often use prior word-level knowledge. The current study aims to leverage visual information in order to capture sentence level semantics without the need for word embeddings. We use a multimodal sentence encoder trained on a corpus of images with matching text captions to produce visually grounded sentence embeddings. Deep Neural Networks are trained to map the two modalities to a common embedding space such that for an image the corresponding caption can be retrieved and vice versa. We show that our model achieves re
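
The bidirectional retrieval setup this abstract describes (map both modalities into one space so that captions retrieve images and vice versa) is conventionally trained with a hinge-based triplet ranking objective. The sketch below, assuming PyTorch, shows that generic objective; the function name, the margin value, and the use of in-batch negatives are illustrative assumptions, not details taken from the paper.

```python
import torch

def bidirectional_ranking_loss(img_emb, txt_emb, margin=0.2):
    """img_emb, txt_emb: (B, D) L2-normalized embeddings whose i-th rows
    are a matching image-caption pair; every other row is a negative."""
    scores = img_emb @ txt_emb.t()            # (B, B) cosine similarities
    pos = scores.diag().view(-1, 1)           # matching-pair scores
    # hinge: non-matching captions (per image) and non-matching images
    # (per caption) must score at least `margin` below the true pair
    cost_txt = (margin + scores - pos).clamp(min=0)
    cost_img = (margin + scores - pos.t()).clamp(min=0)
    mask = torch.eye(scores.size(0), dtype=torch.bool)
    return cost_txt.masked_fill(mask, 0.0).sum() + \
           cost_img.masked_fill(mask, 0.0).sum()
```
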
5

Ge, Jiannan, Hongtao Xie, Shaobo Min, and Yongdong Zhang. "Semantic-guided Reinforced Region Embedding for Generalized Zero-Shot Learning." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 2 (2021): 1406–14. http://dx.doi.org/10.1609/aaai.v35i2.16230.

Abstract:
Generalized zero-shot Learning (GZSL) aims to recognize images from either seen or unseen domain, mainly by learning a joint embedding space to associate image features with the corresponding category descriptions. Recent methods have proved that localizing important object regions can effectively bridge the semantic-visual gap. However, these are all based on one-off visual localizers, lacking of interpretability and flexibility. In this paper, we propose a novel Semantic-guided Reinforced Region Embedding (SR2E) network that can localize important objects in the long-term interests to constr
6

Zhou, Mo, Zhenxing Niu, Le Wang, Zhanning Gao, Qilin Zhang, and Gang Hua. "Ladder Loss for Coherent Visual-Semantic Embedding." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 07 (2020): 13050–57. http://dx.doi.org/10.1609/aaai.v34i07.7006.

Abstract:
For visual-semantic embedding, the existing methods normally treat the relevance between queries and candidates in a bipolar way – relevant or irrelevant, and all “irrelevant” candidates are uniformly pushed away from the query by an equal margin in the embedding space, regardless of their various proximity to the query. This practice disregards relatively discriminative information and could lead to suboptimal ranking in the retrieval results and poorer user experience, especially in the long-tail query scenario where a matching candidate may not necessarily exist. In this paper, we introduce
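
To make the contrast concrete, here is a toy sketch of graded margins versus the single uniform margin criticized above: less relevant candidates must sit further below the positive score. The relevance levels and margin values are invented for illustration; this is not the paper's exact ladder loss.

```python
import torch

def graded_margin_loss(sim_pos, sim_neg, relevance, margins=(0.2, 0.4, 0.6)):
    """sim_pos: scalar similarity between the query and its true match.
    sim_neg: (N,) similarities to other candidates.
    relevance: (N,) long tensor, 2 = closely related, 1 = loosely related,
    0 = unrelated; less relevant candidates get a larger required margin."""
    required = torch.tensor(margins)[2 - relevance]
    return (required + sim_neg - sim_pos).clamp(min=0).sum()

# Example: an unrelated candidate (relevance 0) is penalized unless it sits
# at least 0.6 below the positive score, versus 0.2 for a close candidate.
# loss = graded_margin_loss(torch.tensor(0.9),
#                           torch.tensor([0.75, 0.60, 0.50]),
#                           torch.tensor([2, 1, 0]))
```
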
7

Nguyen, Huy Manh, Tomo Miyazaki, Yoshihiro Sugaya, and Shinichiro Omachi. "Multiple Visual-Semantic Embedding for Video Retrieval from Query Sentence." Applied Sciences 11, no. 7 (2021): 3214. http://dx.doi.org/10.3390/app11073214.

Abstract:
Visual-semantic embedding aims to learn a joint embedding space where related video and sentence instances are located close to each other. Most existing methods put instances in a single embedding space. However, they struggle to embed instances due to the difficulty of matching visual dynamics in videos to textual features in sentences. A single space is not enough to accommodate various videos and sentences. In this paper, we propose a novel framework that maps instances into multiple individual embedding spaces so that we can capture multiple relationships between instances, leading to com
8

MATSUBARA, Takashi. "Target-Oriented Deformation of Visual-Semantic Embedding Space." IEICE Transactions on Information and Systems E104.D, no. 1 (2021): 24–33. http://dx.doi.org/10.1587/transinf.2020mup0003.

9

Keller, Patrick, Abdoul Kader Kaboré, Laura Plein, Jacques Klein, Yves Le Traon, and Tegawendé F. Bissyandé. "What You See is What it Means! Semantic Representation Learning of Code based on Visualization and Transfer Learning." ACM Transactions on Software Engineering and Methodology 31, no. 2 (2022): 1–34. http://dx.doi.org/10.1145/3485135.

Abstract:
Recent successes in training word embeddings for Natural Language Processing (NLP) tasks have encouraged a wave of research on representation learning for source code, which builds on similar NLP methods. The overall objective is then to produce code embeddings that capture the maximum of program semantics. State-of-the-art approaches invariably rely on a syntactic representation (i.e., raw lexical tokens, abstract syntax trees, or intermediate representation tokens) to generate embeddings, which are criticized in the literature as non-robust or non-generalizable. In this work, we investigat
10

Tang, Qi, Yao Zhao, Meiqin Liu, Jian Jin, and Chao Yao. "Semantic Lens: Instance-Centric Semantic Alignment for Video Super-resolution." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 6 (2024): 5154–61. http://dx.doi.org/10.1609/aaai.v38i6.28321.

Abstract:
As a critical clue of video super-resolution (VSR), inter-frame alignment significantly impacts overall performance. However, accurate pixel-level alignment is a challenging task due to the intricate motion interweaving in the video. In response to this issue, we introduce a novel paradigm for VSR named Semantic Lens, predicated on semantic priors drawn from degraded videos. Specifically, video is modeled as instances, events, and scenes via a Semantic Extractor. Those semantics assist the Pixel Enhancer in understanding the recovered contents and generating more realistic visual results. The
11

He, Hai, and Haibo Yang. "Deep Visual Semantic Embedding with Text Data Augmentation and Word Embedding Initialization." Mathematical Problems in Engineering 2021 (May 28, 2021): 1–8. http://dx.doi.org/10.1155/2021/6654071.

Abstract:
Language and vision are the two most essential parts of human intelligence for interpreting the real world around us. How to make connections between language and vision is the key point in current research. Multimodality methods like visual semantic embedding have been widely studied recently, which unify images and corresponding texts into the same feature space. Inspired by the recent development of text data augmentation and a simple but powerful technique proposed called EDA (easy data augmentation), we can expand the information with given data using EDA to improve the performance of mod
12

Chen, Shiming, Ziming Hong, Yang Liu, et al. "TransZero: Attribute-Guided Transformer for Zero-Shot Learning." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 1 (2022): 330–38. http://dx.doi.org/10.1609/aaai.v36i1.19909.

Abstract:
Zero-shot learning (ZSL) aims to recognize novel classes by transferring semantic knowledge from seen classes to unseen ones. Semantic knowledge is learned from attribute descriptions shared between different classes, which are strong prior for localization of object attribute for representing discriminative region features enabling significant visual-semantic interaction. Although few attention-based models have attempted to learn such region features in a single image, the transferability and discriminative attribute localization of visual features are typically neglected. In this paper, we
13

Seo, Sanghyun, and Juntae Kim. "Hierarchical Semantic Loss and Confidence Estimator for Visual-Semantic Embedding-Based Zero-Shot Learning." Applied Sciences 9, no. 15 (2019): 3133. http://dx.doi.org/10.3390/app9153133.

Abstract:
Traditional supervised learning is dependent on the label of the training data, so there is a limitation that the class label which is not included in the training data cannot be recognized properly. Therefore, zero-shot learning, which can recognize unseen-classes that are not used in training, is gaining research interest. One approach to zero-shot learning is to embed visual data such as images and rich semantic data related to text labels of visual data into a common vector space to perform zero-shot cross-modal retrieval on newly input unseen-class data. This paper proposes a hierarchical
14

Liu, Huixia, and Zhihong Qin. "Deep quantization network with visual-semantic alignment for zero-shot image retrieval." Electronic Research Archive 31, no. 7 (2023): 4232–47. http://dx.doi.org/10.3934/era.2023215.

Abstract:
Approximate nearest neighbor (ANN) search has become an essential paradigm for large-scale image retrieval. Conventional ANN search requires the categories of query images to be seen in the training set. However, facing the rapid evolution of newly-emerging concepts on the web, it is too expensive to retrain the model via collecting labeled data with the new (unseen) concepts. Existing zero-shot hashing methods choose the semantic space or intermediate space as the embedding space, which ignore the inconsistency of visual space and semantic space and suffer from the
15

Ma, Peirong, and Xiao Hu. "A Variational Autoencoder with Deep Embedding Model for Generalized Zero-Shot Learning." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 07 (2020): 11733–40. http://dx.doi.org/10.1609/aaai.v34i07.6844.

Abstract:
Generalized zero-shot learning (GZSL) is a challenging task that aims to recognize not only unseen classes unavailable during training, but also seen classes used at training stage. It is achieved by transferring knowledge from seen classes to unseen classes via a shared semantic space (e.g. attribute space). Most existing GZSL methods usually learn a cross-modal mapping between the visual feature space and the semantic space. However, the mapping model learned only from the seen classes will produce an inherent bias when used in the unseen classes. In order to tackle such a problem, this pape
16

K. Dinesh Kumar, et al. "Visual Storytelling: A Generative Adversarial Networks (GANs) and Graph Embedding Framework." International Journal on Recent and Innovation Trends in Computing and Communication 11, no. 9 (2023): 1899–906. http://dx.doi.org/10.17762/ijritcc.v11i9.9184.

Abstract:
Visual storytelling is a powerful educational tool, using image sequences to convey complex ideas and establish emotional connections with the audience. A study at the Chinese University of Hong Kong found that 92.7% of students prefer visual storytelling through animation over text alone [21]. Our approach integrates dual coding and propositional theory to generate visual representations of text, such as graphs and images, thereby enhancing students' memory retention and visualization skills. We use Generative Adversarial Networks (GANs) with graph data to generate images while preserving sem
17

Gorniak, P., and D. Roy. "Grounded Semantic Composition for Visual Scenes." Journal of Artificial Intelligence Research 21 (April 1, 2004): 429–70. http://dx.doi.org/10.1613/jair.1327.

Abstract:
We present a visually-grounded language understanding model based on a study of how people verbally describe objects in scenes. The emphasis of the model is on the combination of individual word meanings to produce meanings for complex referring expressions. The model has been implemented, and it is able to understand a broad range of spatial referring expressions. We describe our implementation of word level visually-grounded semantics and their embedding in a compositional parsing framework. The implemented system selects the correct referents in response to natural language expressions for
18

Deutsch, Shay, Andrea Bertozzi, and Stefano Soatto. "Zero Shot Learning with the Isoperimetric Loss." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 07 (2020): 10704–12. http://dx.doi.org/10.1609/aaai.v34i07.6698.

Abstract:
We introduce the isoperimetric loss as a regularization criterion for learning the map from a visual representation to a semantic embedding, to be used to transfer knowledge to unknown classes in a zero-shot learning setting. We use a pre-trained deep neural network model as a visual representation of image data, a Word2Vec embedding of class labels, and linear maps between the visual and semantic embedding spaces. However, the spaces themselves are not linear, and we postulate the sample embedding to be populated by noisy samples near otherwise smooth manifolds. We exploit the graph structure
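
Setting the isoperimetric regularizer itself aside, the underlying pipeline described here (a linear map from a visual representation into a Word2Vec label space, with classification by nearest class embedding) can be sketched as follows; all names and shapes are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def zero_shot_predict(x_visual, W, class_word_vecs):
    """x_visual: (D_v,) image feature from a pre-trained network.
    W: (D_s, D_v) linear map learned on seen classes.
    class_word_vecs: dict mapping class name -> (D_s,) Word2Vec label
    embedding; may include classes never observed during training."""
    z = W @ x_visual
    z = z / np.linalg.norm(z)
    # classify by cosine similarity to each class's word embedding
    scores = {name: (v / np.linalg.norm(v)) @ z
              for name, v in class_word_vecs.items()}
    return max(scores, key=scores.get)
```
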
19

Yang, Guan, Ayou Han, Xiaoming Liu, Yang Liu, Tao Wei, and Zhiyuan Zhang. "Enhancing Semantic-Consistent Features and Transforming Discriminative Features for Generalized Zero-Shot Classifications." Applied Sciences 12, no. 24 (2022): 12642. http://dx.doi.org/10.3390/app122412642.

Abstract:
Generalized zero-shot learning (GZSL) aims to classify classes that do not appear during training. Recent state-of-the-art approaches rely on generative models, which use correlating semantic embeddings to synthesize unseen classes visual features; however, these approaches ignore the semantic and visual relevance, and visual features synthesized by generative models do not represent their semantics well. Although existing GZSL methods based on generative model disentanglement consider consistency between visual and semantic models, these methods consider semantic consistency only in the train
20

Liu, Zijing, and Chenggang Wang. "A Few-shot Learning Method Using Relation Graph." Chinese Journal of Information Fusion 2, no. 1 (2025): 70–78. https://doi.org/10.62762/cjif.2025.146072.

Abstract:
Few-shot learning aims to recognize new-class items under the circumstances with a few labeled support samples. However, many methods may suffer from poor guidance of limited new-class samples that are not suitable for being regarded as class centers. Recent works use word embedding to enrich the new-class distribution message but only use simple mapping between visual and semantic features during training. To solve the aforementioned problems, we propose a method that constructs a class relation graph by semantic meaning as guidance for feature extraction and fusion, to help the learning of t
21

Wan, Ziyu, Yan Li, Min Yang, and Junge Zhang. "Transductive Zero-Shot Learning via Visual Center Adaptation." Proceedings of the AAAI Conference on Artificial Intelligence 33 (July 17, 2019): 10059–60. http://dx.doi.org/10.1609/aaai.v33i01.330110059.

Abstract:
In this paper, we propose a Visual Center Adaptation Method (VCAM) to address the domain shift problem in zero-shot learning. For the seen classes in the training data, VCAM builds an embedding space by learning the mapping from semantic space to some visual centers. While for unseen classes in the test data, the construction of embedding space is constrained by a symmetric Chamfer-distance term, aiming to adapt the distribution of the synthetic visual centers to that of the real cluster centers. Therefore the learned embedding space can generalize the unseen classes well. Experiments on two w
22

Liu, Fangyu, Rongtian Ye, Xun Wang, and Shuaipeng Li. "HAL: Improved Text-Image Matching by Mitigating Visual Semantic Hubs." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 07 (2020): 11563–71. http://dx.doi.org/10.1609/aaai.v34i07.6823.

Abstract:
The hubness problem widely exists in high-dimensional embedding space and is a fundamental source of error for cross-modal matching tasks. In this work, we study the emergence of hubs in Visual Semantic Embeddings (VSE) with application to text-image matching. We analyze the pros and cons of two widely adopted optimization objectives for training VSE and propose a novel hubness-aware loss function (HAL) that addresses previous methods' defects. Unlike (Faghri et al. 2018) which simply takes the hardest sample within a mini-batch, HAL takes all samples into account, using both local and global
23

Qin, Xue-Yang, Li-Shuang Li, Jing-Yao Tang, Fei Hao, Mei-Ling Ge, and Guang-Yao Pang. "Multi-Task Visual Semantic Embedding Network for Image-Text Retrieval." Journal of Computer Science and Technology 39, no. 4 (2024): 811–26. http://dx.doi.org/10.1007/s11390-024-4125-1.

24

Zhang, Weifeng, Hua Hu, and Haiyang Hu. "Training Visual-Semantic Embedding Network for Boosting Automatic Image Annotation." Neural Processing Letters 48, no. 3 (2018): 1503–19. http://dx.doi.org/10.1007/s11063-017-9753-9.

25

An, Rongqiao, Zhenjiang Miao, Qingyu Li, Wanru Xu, and Qiang Zhang. "Spatiotemporal visual-semantic embedding network for zero-shot action recognition." Journal of Electronic Imaging 28, no. 02 (2019): 1. http://dx.doi.org/10.1117/1.jei.28.2.023007.

26

Lin, Jiayi, Jiabo Huang, Jian Hu, and Shaogang Gong. "InvSeg: Test-Time Prompt Inversion for Semantic Segmentation." Proceedings of the AAAI Conference on Artificial Intelligence 39, no. 5 (2025): 5245–53. https://doi.org/10.1609/aaai.v39i5.32557.

Abstract:
Visual-textual correlations in the attention maps derived from text-to-image diffusion models are proven beneficial to dense visual prediction tasks, e.g., semantic segmentation. However, a significant challenge arises due to the input distributional discrepancy between the context-rich sentences used for image generation and the isolated class names typically used in semantic segmentation. This discrepancy hinders diffusion models from capturing accurate visual-textual correlations. To solve this, we propose InvSeg, a test-time prompt inversion method that tackles open-vocabulary semantic seg
27

Zhang, Linhai, Deyu Zhou, Yulan He, and Zeng Yang. "MERL: Multimodal Event Representation Learning in Heterogeneous Embedding Spaces." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 16 (2021): 14420–27. http://dx.doi.org/10.1609/aaai.v35i16.17695.

Abstract:
Previous work has shown the effectiveness of using event representations for tasks such as script event prediction and stock market prediction. It is however still challenging to learn the subtle semantic differences between events based solely on textual descriptions of events often represented as (subject, predicate, object) triples. As an alternative, images offer a more intuitive way of understanding event semantics. We observe that event described in text and in images show different abstraction levels and therefore should be projected onto heterogeneous embedding spaces, as opposed to wh
28

Pan, Lizhi, Chengtian Song, Xiaozheng Gan, Keyu Xu, and Yue Xie. "Military Image Captioning for Low-Altitude UAV or UGV Perspectives." Drones 8, no. 9 (2024): 421. http://dx.doi.org/10.3390/drones8090421.

Abstract:
Low-altitude unmanned aerial vehicles (UAVs) and unmanned ground vehicles (UGVs), which boast high-resolution imaging and agile maneuvering capabilities, are widely utilized in military scenarios and generate a vast amount of image data that can be leveraged for textual intelligence generation to support military decision making. Military image captioning (MilitIC), as a visual-language learning task, provides innovative solutions for military image understanding and intelligence generation. However, the scarcity of military image datasets hinders the advancement of MilitIC methods, especially
29

Bi, Bei, Yaojun Wang, Haicang Zhang, and Yang Gao. "Microblog-HAN: A micro-blog rumor detection model based on heterogeneous graph attention network." PLOS ONE 17, no. 4 (2022): e0266598. http://dx.doi.org/10.1371/journal.pone.0266598.

Abstract:
Although social media has highly facilitated people’s daily communication and dissemination of information, it has unfortunately been an ideal hotbed for the breeding and dissemination of Internet rumors. Therefore, automatically monitoring rumor dissemination in the early stage is of great practical significance. However, the existing detection methods fail to take full advantage of the semantics of the microblog information propagation graph. To address this shortcoming, this study models the information transmission network of a microblog as a heterogeneous graph with a variety of semantic
30

Bai, Haoyue, Haofeng Zhang, and Qiong Wang. "Dual discriminative auto-encoder network for zero shot image recognition." Journal of Intelligent & Fuzzy Systems 40, no. 3 (2021): 5159–70. http://dx.doi.org/10.3233/jifs-201920.

Abstract:
Zero Shot learning (ZSL) aims to use the information of seen classes to recognize unseen classes, which is achieved by transferring knowledge of the seen classes from the semantic embeddings. Since the domains of the seen and unseen classes do not overlap, most ZSL algorithms often suffer from domain shift problem. In this paper, we propose a Dual Discriminative Auto-encoder Network (DDANet), in which visual features and semantic attributes are self-encoded by using the high dimensional latent space instead of the feature space or the low dimensional semantic space. In the embedded latent spac
31

Fethfulwar, Sujal. "Semantic Based Image Indexing." International Journal for Research in Applied Science and Engineering Technology 13, no. 6 (2025): 1811–18. https://doi.org/10.22214/ijraset.2025.72528.

Abstract:
This paper introduces a semantic-based image indexing system utilizing a custom Convolutional Neural Network (CNN) for feature extraction and semantic embedding techniques for understanding image content. Traditional image indexing methods rely heavily on low-level visual features, often resulting in inaccurate or irrelevant results. By leveraging deep learning, our proposed system bridges this gap, allowing high-level semantic features to guide indexing and retrieval. Tested on the CIFAR-10 dataset, our approach demonstrates a significant improvement in precision, recall, and overall retrieva
32

Huang, Yan, Yang Long, and Liang Wang. "Few-Shot Image and Sentence Matching via Gated Visual-Semantic Embedding." Proceedings of the AAAI Conference on Artificial Intelligence 33 (July 17, 2019): 8489–96. http://dx.doi.org/10.1609/aaai.v33i01.33018489.

Abstract:
Although image and sentence matching has been widely studied, its intrinsic few-shot problem is commonly ignored, which has become a bottleneck for further performance improvement. In this work, we focus on this challenging problem of few-shot image and sentence matching, and propose a Gated Visual-Semantic Embedding (GVSE) model to deal with it. The model consists of three corporative modules in terms of uncommon VSE, common VSE, and gated metric fusion. The uncommon VSE exploits external auxiliary resources to extract generic features for representing uncommon instances and words in images a
33

Luo, Minnan, Xiaojun Chang, and Chen Gong. "Reliable shot identification for complex event detection via visual-semantic embedding." Computer Vision and Image Understanding 213 (December 2021): 103300. http://dx.doi.org/10.1016/j.cviu.2021.103300.

34

Zgaren, Ahmed, Wassim Bouachir, and Nizar Bouguila. "SAVE: Self-Attention on Visual Embedding for Zero-Shot Generic Object Counting." Journal of Imaging 11, no. 2 (2025): 52. https://doi.org/10.3390/jimaging11020052.

Abstract:
Zero-shot counting is a subcategory of Generic Visual Object Counting, which aims to count objects from an arbitrary class in a given image. While few-shot counting relies on delivering exemplars to the model to count similar class objects, zero-shot counting automates the operation for faster processing. This paper proposes a fully automated zero-shot method outperforming both zero-shot and few-shot methods. By exploiting feature maps from a pre-trained detection-based backbone, we introduce a new Visual Embedding Module designed to generate semantic embeddings within object contextual inform
35

Yu, Beibei, Cheng Xie, Peng Tang, and Bin Li. "Semantic-visual shared knowledge graph for zero-shot learning." PeerJ Computer Science 9 (March 22, 2023): e1260. http://dx.doi.org/10.7717/peerj-cs.1260.

Abstract:
Almost all existing zero-shot learning methods work only on benchmark datasets (e.g., CUB, SUN, AwA, FLO and aPY) which have already provided pre-defined attributes for all the classes. These methods thus are hard to apply on real-world datasets (like ImageNet) since there are no such pre-defined attributes in the data environment. The latest works have explored to use semantic-rich knowledge graphs (such as WordNet) to substitute pre-defined attributes. However, these methods encounter a serious “domain shift” problem because such a knowledge graph cannot provide detail
36

Suo, Xinhua, Bing Guo, Yan Shen, Wei Wang, Yaosen Chen, and Zhen Zhang. "Embodying the Number of an Entity’s Relations for Knowledge Representation Learning." International Journal of Software Engineering and Knowledge Engineering 31, no. 10 (2021): 1495–515. http://dx.doi.org/10.1142/s0218194021500509.

Abstract:
Knowledge representation learning (knowledge graph embedding) plays a critical role in the application of knowledge graph construction. The multi-source information knowledge representation learning, which is one class of the most promising knowledge representation learning at present, mainly focuses on learning a large number of useful additional information of entities and relations in the knowledge graph into their embeddings, such as the text description information, entity type information, visual information, graph structure information, etc. However, there is a kind of simple but very c
37

Li, Qiaozhe, Xin Zhao, Ran He, and Kaiqi Huang. "Visual-Semantic Graph Reasoning for Pedestrian Attribute Recognition." Proceedings of the AAAI Conference on Artificial Intelligence 33 (July 17, 2019): 8634–41. http://dx.doi.org/10.1609/aaai.v33i01.33018634.

Abstract:
Pedestrian attribute recognition in surveillance is a challenging task due to poor image quality, significant appearance variations and diverse spatial distribution of different attributes. This paper treats pedestrian attribute recognition as a sequential attribute prediction problem and proposes a novel visual-semantic graph reasoning framework to address this problem. Our framework contains a spatial graph and a directed semantic graph. By performing reasoning using the Graph Convolutional Network (GCN), one graph captures spatial relations between regions and the other learns potential sem
38

Bai, Jing, Mengjie Wang, and Dexin Kong. "Deep Common Semantic Space Embedding for Sketch-Based 3D Model Retrieval." Entropy 21, no. 4 (2019): 369. http://dx.doi.org/10.3390/e21040369.

Abstract:
Sketch-based 3D model retrieval has become an important research topic in many applications, such as computer graphics and computer-aided design. Although sketches and 3D models have huge interdomain visual perception discrepancies, and sketches of the same object have remarkable intradomain visual perception diversity, the 3D models and sketches of the same class share common semantic content. Motivated by these findings, we propose a novel approach for sketch-based 3D model retrieval by constructing a deep common semantic space embedding using triplet network. First, a common data space is c
39

Ye, Jingwen, Ruonan Yu, Songhua Liu, and Xinchao Wang. "Mutual-Modality Adversarial Attack with Semantic Perturbation." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 7 (2024): 6657–65. http://dx.doi.org/10.1609/aaai.v38i7.28488.

Abstract:
Adversarial attacks constitute a notable threat to machine learning systems, given their potential to induce erroneous predictions and classifications. However, within real-world contexts, the essential specifics of the deployed model are frequently treated as a black box, consequently mitigating the vulnerability to such attacks. Thus, enhancing the transferability of the adversarial samples has become a crucial area of research, which heavily relies on selecting appropriate surrogate models. To address this challenge, we propose a novel approach that generates adversarial attacks in a mutual
40

Xiao, Linlin, Huahu Xu, Junsheng Xiao, and Yuzhe Huang. "Few-Shot Object Detection with Memory Contrastive Proposal Based on Semantic Priors." Electronics 12, no. 18 (2023): 3835. http://dx.doi.org/10.3390/electronics12183835.

Abstract:
Few-shot object detection (FSOD) aims to detect objects belonging to novel classes with few training samples. With the small number of novel class samples, the visual information extracted is insufficient to accurately represent the object itself, presenting significant intra-class variance and confusion between classes of similar samples, resulting in large errors in the detection results of the novel class samples. We propose a few-shot object detection framework to achieve effective classification and detection by embedding semantic information and contrastive learning. Firstly, we introduc
41

Ma, Jinlin, Yuetong Wan, and Ziping Ma. "Memory-Based Learning and Fusion Attention for Few-Shot Food Image Generation Method." Applied Sciences 14, no. 18 (2024): 8347. http://dx.doi.org/10.3390/app14188347.

Abstract:
Generating food images aims to convert textual food ingredients into corresponding images for the visualization of color and shape adjustments, dietary guidance, and the creation of new dishes. It has a wide range of applications, including food recommendation, recipe development, and health management. However, existing food image generation models, predominantly based on GANs (Generative Adversarial Networks), face challenges in maintaining semantic consistency between image and text, as well as achieving visual realism in the generated images. These limitations are attributed to the constra
42

Cai, Jiyan, Libing Wu, Dan Wu, Jianxin Li, and Xianfeng Wu. "Multi-Dimensional Information Alignment in Different Modalities for Generalized Zero-Shot and Few-Shot Learning." Information 14, no. 3 (2023): 148. http://dx.doi.org/10.3390/info14030148.

Abstract:
Generalized zero-shot learning (GZSL) aims to solve the category recognition tasks for unseen categories under the setting that training samples only contain seen classes while unseen classes are not available. This research is vital as there are always existing new categories and large amounts of unlabeled data in realistic scenarios. Previous work for GZSL usually maps the visual information of the visible classes and the semantic description of the invisible classes into the identical embedding space to bridge the gap between the disjointed visible and invisible classes, while ignoring the
43

Zhou, Hao, Tingjin Luo, and Zhangqi Jiang. "Core-to-Global Reasoning for Compositional Visual Question Answering." Proceedings of the AAAI Conference on Artificial Intelligence 39, no. 10 (2025): 10770–78. https://doi.org/10.1609/aaai.v39i10.33170.

Abstract:
Compositional visual question answering (Compositional VQA) needs to provide an answer to a compositional question, which requires the model to have advanced capabilities of multi-modal semantic understanding and logical reasoning. However, current VQA models mainly concentrate on enriching the visual representations of images and neglect the redundancy in the enriched information to bring some negative impacts. To enhance the value and availability of semantic features, we propose a novel core-to-global reasoning (CTGR) model for compositional VQA. The model first extracts both global feature
44

Wei, Longhui, Lingxi Xie, Jianzhong He, Xiaopeng Zhang, and Qi Tian. "Can Semantic Labels Assist Self-Supervised Visual Representation Learning?" Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 3 (2022): 2642–50. http://dx.doi.org/10.1609/aaai.v36i3.20166.

Abstract:
Recently, contrastive learning has largely advanced the progress of unsupervised visual representation learning. Pre-trained on ImageNet, some self-supervised algorithms reported higher transfer learning performance compared to fully-supervised methods, seeming to deliver the message that human labels hardly contribute to learning transferrable visual features. In this paper, we defend the usefulness of semantic labels but point out that fully-supervised and self-supervised methods are pursuing different kinds of features. To alleviate this issue, we present a new algorithm named Supervised Co
45

Chen, J., X. Du, J. Zhang, Y. Wan, and W. Zhao. "SEMANTIC KNOWLEDGE EMBEDDING DEEP LEARNING NETWORK FOR LAND COVER CLASSIFICATION." International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences XLVIII-1/W2-2023 (December 13, 2023): 85–90. http://dx.doi.org/10.5194/isprs-archives-xlviii-1-w2-2023-85-2023.

Abstract:
Land cover classification is essential basic information and key parameters for environmental change research, geographical and national monitoring, and sustainable development planning. Deep learning can automatically and multi-level extract the features of complex features, which has been proven to be an effective method for information extraction. However, one of the major challenges of deep learning is its poor interpretability, which makes it difficult to understand and explain the reasoning behind its classification results. This paper proposes a deep cross-modal coupling mode
46

Gong, Yan, Georgina Cosma, and Hui Fang. "On the Limitations of Visual-Semantic Embedding Networks for Image-to-Text Information Retrieval." Journal of Imaging 7, no. 8 (2021): 125. http://dx.doi.org/10.3390/jimaging7080125.

Abstract:
Visual-semantic embedding (VSE) networks create joint image–text representations to map images and texts in a shared embedding space to enable various information retrieval-related tasks, such as image–text retrieval, image captioning, and visual question answering. The most recent state-of-the-art VSE-based networks are: VSE++, SCAN, VSRN, and UNITER. This study evaluates the performance of those VSE networks for the task of image-to-text retrieval and identifies and analyses their strengths and limitations to guide future research on the topic. The experimental results on Flickr30K revealed
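
Evaluations of this kind conventionally report Recall@K for image-to-text retrieval. Below is a minimal sketch of that metric, assuming a square similarity matrix whose diagonal holds the ground-truth pairs (an assumption of this illustration, not a detail from the study).

```python
import numpy as np

def recall_at_k(sim: np.ndarray, k: int) -> float:
    """sim: (N, N) similarity matrix for N images vs. their N captions,
    with sim[i, i] the score of image i against its own caption.
    Returns the fraction of images whose true caption ranks in the top k."""
    # rank of the true caption = number of captions scoring at least as
    # high (ties counted pessimistically)
    ranks = (sim >= sim.diagonal()[:, None]).sum(axis=1)
    return float((ranks <= k).mean())

# Example with random scores for 100 image-caption pairs:
# print(recall_at_k(np.random.rand(100, 100), k=10))
```
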
47

Yang, Guang, Manling Li, Jiajie Zhang, Xudong Lin, Heng Ji, and Shih-Fu Chang. "Video Event Extraction via Tracking Visual States of Arguments." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 3 (2023): 3136–44. http://dx.doi.org/10.1609/aaai.v37i3.25418.

Abstract:
Video event extraction aims to detect salient events from a video and identify the arguments for each event as well as their semantic roles. Existing methods focus on capturing the overall visual scene of each frame, ignoring fine-grained argument-level information. Inspired by the definition of events as changes of states, we propose a novel framework to detect video events by tracking the changes in the visual states of all involved arguments, which are expected to provide the most informative evidence for the extraction of video events. In order to capture the visual state changes of argume
48

Li, Wei, Haiyu Song, Hongda Zhang, Houjie Li, and Pengjie Wang. "The Image Annotation Refinement in Embedding Feature Space based on Mutual Information." International Journal of Circuits, Systems and Signal Processing 16 (January 10, 2022): 191–201. http://dx.doi.org/10.46300/9106.2022.16.23.

Abstract:
The ever-increasing size of images has made automatic image annotation one of the most important tasks in the fields of machine learning and computer vision. Despite continuous efforts in inventing new annotation algorithms and new models, results of the state-of-the-art image annotation methods are often unsatisfactory. In this paper, to further improve annotation refinement performance, a novel approach based on weighted mutual information to automatically refine the original annotations of images is proposed. Unlike the traditional refinement model using only visual feature, the proposed mo
49

Eyharabide, Victoria, Imad Eddine Ibrahim Bekkouch, and Nicolae Dragoș Constantin. "Knowledge Graph Embedding-Based Domain Adaptation for Musical Instrument Recognition." Computers 10, no. 8 (2021): 94. http://dx.doi.org/10.3390/computers10080094.

Abstract:
Convolutional neural networks raised the bar for machine learning and artificial intelligence applications, mainly due to the abundance of data and computations. However, there is not always enough data for training, especially when it comes to historical collections of cultural heritage where the original artworks have been destroyed or damaged over time. Transfer Learning and domain adaptation techniques are possible solutions to tackle the issue of data scarcity. This article presents a new method for domain adaptation based on Knowledge graph embeddings. Knowledge Graph embedding forms a p
50

Jadhav, Mrunal, and Matthew Guzdial. "Tile Embedding: A General Representation for Level Generation." Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment 17, no. 1 (2021): 34–41. http://dx.doi.org/10.1609/aiide.v17i1.18888.

Abstract:
In recent years, Procedural Level Generation via Machine Learning (PLGML) techniques have been applied to generate game levels with machine learning. These approaches rely on human-annotated representations of game levels. Creating annotated datasets for games requires domain knowledge and is time-consuming. Hence, though a large number of video games exist, annotated datasets are curated only for a small handful. Thus current PLGML techniques have been explored in limited domains, with Super Mario Bros. as the most common example. To address this problem, we present tile embeddings, a unified