Academic literature on the topic 'Explainable Image Captioning (XIC)'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Explainable Image Captioning (XIC).'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Journal articles on the topic "Explainable Image Captioning (XIC)"

1

Han, Seung-Ho, Min-Su Kwon, and Ho-Jin Choi. "EXplainable AI (XAI) approach to image captioning." Journal of Engineering 2020, no. 13 (2020): 589–94. http://dx.doi.org/10.1049/joe.2019.1217.

2

Fei, Zhengcong, Mingyuan Fan, Li Zhu, Junshi Huang, Xiaoming Wei, and Xiaolin Wei. "Uncertainty-Aware Image Captioning." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 1 (2023): 614–22. http://dx.doi.org/10.1609/aaai.v37i1.25137.

Abstract:
It is widely believed that the higher the uncertainty of a word in a caption, the more inter-correlated context information is required to determine it. However, current image captioning methods usually consider the generation of all words in a sentence sequentially and equally. In this paper, we propose an uncertainty-aware image captioning framework, which iteratively inserts discontinuous candidate words between existing words in parallel, from easy to difficult, until convergence. We hypothesize that high-uncertainty words in a sentence need more prior information to make a correct…
3

Liu, Haixia, and Tim Brailsford. "Reproducing “Show, Attend and Tell: Neural Image Caption Generation with Visual Attention”." Journal of Physics: Conference Series 2589, no. 1 (2023): 012012. http://dx.doi.org/10.1088/1742-6596/2589/1/012012.

Abstract:
This paper replicates the experiment presented in the work of Xu et al. [1] and examines errors in the generated captions. The analysis of the identified errors aims to provide deeper insight into their underlying causes. This study also encompasses subsequent experiments investigating the feasibility of rectifying these errors via a post-processing stage. Image recognition and object detection models, as well as a language probability computational model, were explored. The findings presented in this paper aim to contribute towards the overarching objective of Explainable Artificial Intelligence…
4

Biswas, Rajarshi, Michael Barz, and Daniel Sonntag. "Towards Explanatory Interactive Image Captioning Using Top-Down and Bottom-Up Features, Beam Search and Re-ranking." KI - Künstliche Intelligenz 34, no. 4 (2020): 571–84. http://dx.doi.org/10.1007/s13218-020-00679-2.

Abstract:
Image captioning is a challenging multimodal task. Significant improvements have been obtained with deep learning. Yet, captions generated by humans are still considered better, which makes it an interesting application for interactive machine learning and explainable artificial intelligence methods. In this work, we aim at improving the performance and explainability of the state-of-the-art method Show, Attend and Tell by augmenting its attention mechanism with additional bottom-up features. We compute visual attention on the joint embedding space formed by the union of high-level features…
5

Ghosh, Swarnendu, Teresa Gonçalves, and Nibaran Das. "Im2Graph: A Weakly Supervised Approach for Generating Holistic Scene Graphs from Regional Dependencies." Future Internet 15, no. 2 (2023): 70. http://dx.doi.org/10.3390/fi15020070.

Abstract:
Conceptual representations of images involving descriptions of entities and their relations are often represented using scene graphs. Such scene graphs can express relational concepts by using sets of triplets ⟨subject—predicate—object⟩. Instead of building dedicated models for scene graph generation, our model tends to extract the latent relational information implicitly encoded in image captioning models. We explored dependency parsing to build grammatically sound parse trees from captions. We used detection algorithms for the region propositions to generate dense region-based concept graphs…
6

Naresh, Naresh, Gunikhan, and V. Balaji. "AMR-XAI-DWT: Age-Related Macular Regenerated Classification using X-AI with Dual Tree CWT." Fusion: Practice and Applications 15, no. 2 (2024): 17–35. http://dx.doi.org/10.54216/fpa.150202.

Abstract:
Age-related macular degeneration (AMD) is the leading cause of permanent vision loss, and drusen is an early clinical sign in the progression of AMD. Early detection is key since that is when treatment is most effective. The eyes of someone with AMD need to be checked often. Ophthalmologists may detect illness by looking at a color picture of the fundus taken using a fundus camera. Ophthalmologists need a system to help them diagnose illness since the global elderly population is growing rapidly and there are not enough specialists to go around. Since drusen vary in size, form, degree of conver…
7

Dewi, Christine, Rung-Ching Chen, Hui Yu, and Xiaoyi Jiang. "XAI for Image Captioning using SHAP." Journal of Information Science and Engineering 39, no. 4 (2023). https://doi.org/10.6688/JISE.202307_39(4).0001.

Abstract:
In the fields of computer vision (CV) and natural language processing (NLP), the task of creating a textual description of a given image is known as image captioning. Captioning is the process of creating an explanation for an image. Image captioning requires recognizing the significant items in an image, their qualities, and their connections. It must also be able to construct phrases that are valid in both syntax and semantics. Deep-learning-based approaches deal with the intricacies and problems of image captioning. This article provides a simple and effective Explainable Artificial Intelligence…
8

Yong, Gunwoo, Meiyin Liu, and SangHyun Lee. "Explainable Image Captioning to Identify Ergonomic Problems and Solutions for Construction Workers." Journal of Computing in Civil Engineering 38, no. 4 (2024). http://dx.doi.org/10.1061/jccee5.cpeng-5744.

9

Pan, Yingwei, Yehao Li, Ting Yao, and Tao Mei. "Bottom-up and Top-down Object Inference Networks for Image Captioning." ACM Transactions on Multimedia Computing, Communications, and Applications, January 19, 2023. http://dx.doi.org/10.1145/3580366.

Abstract:
Bottom-up and top-down attention mechanism has led to the revolutionizing of image captioning techniques, which enables object-level attention for multi-step reasoning over all the detected objects. However, when humans describe an image, they often apply their own subjective experience to focus on only a few salient objects that are worthy of mention, rather than all objects in this image. The focused objects are further allocated in linguistic order, yielding the “object sequence of interest” to compose an enriched description. In this work, we present Bottom-up and Top-down Object Inference Networks…
10

Abed, Enas Abbas, and Taoufik Aguili. "Automated Medical Image Captioning Using the BLIP Model: Enhancing Diagnostic Support with AI-Driven Language Generation." Diyala Journal of Engineering Sciences, June 1, 2025, 228–48. https://doi.org/10.24237/djes.2025.18215.

Abstract:
Medical image interpretation is an important diagnostic activity: the number of images is growing continuously, while the number of specialist radiologists is limited globally, which often results in late diagnosis and possible clinical misinformation. This paper analyzes the BLIP model as an automatic medical image captioning model for clinical use. To refine the BLIP model, a methodology was designed based on more than 81,000 radiology images with Unified Medical Language System (UMLS) identifiers, obtained from the ROCO (Radiology Objects in Context) dataset. A repre…

Dissertations / Theses on the topic "Explainable Image Captioning (XIC)"

1

Elguendouze, Sofiane. "Explainable Artificial Intelligence approaches for Image Captioning." Electronic Thesis or Diss., Orléans, 2024. http://www.theses.fr/2024ORLE1003.

Abstract:
The rapid evolution of image captioning models, driven by the integration of deep learning techniques combining the image and text modalities, has led to increasingly complex systems. However, these models often operate as black boxes, unable to provide transparent explanations for their decisions. This thesis addresses the explainability of image captioning systems based on Encoder-Attention-Decoder architectures, through four aspects. First, it explores the concept of latent space, thus moving away from approaches…

Book chapters on the topic "Explainable Image Captioning (XIC)"

1

Beddiar, Romaissa, and Mourad Oussalah. "Explainability in medical image captioning." In Explainable Deep Learning AI. Elsevier, 2023. http://dx.doi.org/10.1016/b978-0-32-396098-4.00018-1.

2

Khan, M. Jaleed, Filip Ilievski, and Edward Curry. "Neurosymbolic Visual Reasoning with Scene Graphs and Multimodal LLMs." In Frontiers in Artificial Intelligence and Applications. IOS Press, 2025. https://doi.org/10.3233/faia250227.

Abstract:
This chapter explores the advancements and challenges in achieving comprehensive scene understanding and visual reasoning through neurosymbolic integration and Multimodal Large Language Models (MLLMs). It begins by highlighting the limitations of basic vision tasks in extracting contextual and relational information from scenes, introducing scene graphs as a structured representation to bridge this gap. The chapter delves into Scene Graph Generation (SGG) methods, emphasising the importance of incorporating common sense knowledge from knowledge graphs to enhance the accuracy and expressiveness…

Conference papers on the topic "Explainable Image Captioning (XIC)"

1

Swathi, Y., and A. S. Kavitha Bai. "Explainable Deep Learning for Medical Image Captioning in Chest X-ray Analysis." In 2025 International Conference on Knowledge Engineering and Communication Systems (ICKECS). IEEE, 2025. https://doi.org/10.1109/ickecs65700.2025.11035291.

2

Lee, Yebin, Imseong Park, and Myungjoo Kang. "FLEUR: An Explainable Reference-Free Evaluation Metric for Image Captioning Using a Large Multimodal Model." In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, 2024. http://dx.doi.org/10.18653/v1/2024.acl-long.205.

3

Kamal, Md Sarwar, Sonia Farhana Nimmy, Md Rafiqul Islam, and Usman Naseem. "Explainable Medical Image Captioning." In WWW '25: The ACM Web Conference 2025. ACM, 2025. https://doi.org/10.1145/3701716.3717549.

4

Tseng, Ching-Shan, Ying-Jia Lin, and Hung-Yu Kao. "Relation-Aware Image Captioning for Explainable Visual Question Answering." In 2022 International Conference on Technologies and Applications of Artificial Intelligence (TAAI). IEEE, 2022. http://dx.doi.org/10.1109/taai57707.2022.00035.

5

Elguendouze, Sofiane, Marcilio C. P. de Souto, Adel Hafiane, and Anaïs Halftermeyer. "Towards Explainable Deep Learning for Image Captioning through Representation Space Perturbation." In 2022 International Joint Conference on Neural Networks (IJCNN). IEEE, 2022. http://dx.doi.org/10.1109/ijcnn55064.2022.9892275.
