Academic literature on the topic 'Image caption evaluation metric'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Image caption evaluation metric.'

Next to every source in the list of references there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Journal articles on the topic "Image caption evaluation metric"

1

Mohamad Nezami, Omid, Mark Dras, Stephen Wan, and Cecile Paris. "Image Captioning using Facial Expression and Attention." Journal of Artificial Intelligence Research 68 (August 6, 2020): 661–89. http://dx.doi.org/10.1613/jair.1.12025.

Abstract:
Benefiting from advances in machine vision and natural language processing techniques, current image captioning systems are able to generate detailed visual descriptions. For the most part, these descriptions represent an objective characterisation of the image, although some models do incorporate subjective aspects related to the observer’s view of the image, such as sentiment; current models, however, usually do not consider the emotional content of images during the caption generation process. This paper addresses this issue by proposing novel image captioning models which use facial expression features to generate image captions. The models generate image captions using long short-term memory networks applying facial features in addition to other visual features at different time steps. We compare a comprehensive collection of image captioning models with and without facial features using all standard evaluation metrics. The evaluation metrics indicate that applying facial features with an attention mechanism achieves the best performance, showing more expressive and more correlated image captions, on an image caption dataset extracted from the standard Flickr 30K dataset, consisting of around 11K images containing faces. An analysis of the generated captions finds that, perhaps unexpectedly, the improvement in caption quality appears to come not from the addition of adjectives linked to emotional aspects of the images, but from more variety in the actions described in the captions.
2

S, Kavi Priya, Pon Karthika K, Jayakumar Kaliappan, Senthil Kumaran Selvaraj, Nagalakshmi R, and Baye Molla. "Caption Generation Based on Emotions Using CSPDenseNet and BiLSTM with Self-Attention." Applied Computational Intelligence and Soft Computing 2022 (September 17, 2022): 1–13. http://dx.doi.org/10.1155/2022/2756396.

Abstract:
Automatic image caption generation is an intricate task of describing an image in natural language by gaining insights present in an image. Featuring facial expressions in the conventional image captioning system brings out new prospects to generate pertinent descriptions, revealing the emotional aspects of the image. The proposed work encapsulates the facial emotional features to produce more expressive captions similar to human-annotated ones with the help of a Cross Stage Partial Dense Network (CSPDenseNet) and a self-attentive Bidirectional Long Short-Term Memory (BiLSTM) network. The encoding unit captures the facial expressions and dense image features using a Facial Expression Recognition (FER) model and a CSPDense neural network, respectively. Further, the word embedding vectors of the ground truth image captions are created and learned using the Word2Vec embedding technique. Then, the extracted image feature vectors and word vectors are fused to form an encoding vector representing the rich image content. The decoding unit employs a self-attention mechanism encompassed with BiLSTM to create more descriptive and relevant captions in natural language. The Flickr11k dataset, a subset of the Flickr30k dataset, is used to train, test, and evaluate the present model based on five benchmark image captioning metrics: BiLingual Evaluation Understudy (BLEU), Metric for Evaluation of Translation with Explicit Ordering (METEOR), Recall-Oriented Understudy for Gisting Evaluation (ROUGE), Consensus-based Image Description Evaluation (CIDEr), and Semantic Propositional Image Caption Evaluation (SPICE). The experimental analysis indicates that the proposed model enhances the quality of captions, with scores of 0.6012 (BLEU-1), 0.3992 (BLEU-2), 0.2703 (BLEU-3), 0.1921 (BLEU-4), 0.1932 (METEOR), 0.2617 (CIDEr), 0.4793 (ROUGE-L), and 0.1260 (SPICE), using additive emotional characteristics and behavioral components of the objects present in the image.
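The five metrics listed above are usually computed with the MS COCO caption evaluation toolkit. Below is a minimal sketch of that workflow, assuming the pycocoevalcap package is installed (METEOR and SPICE additionally require a Java runtime); the image ids and captions are purely illustrative.

```python
# Hedged sketch: score candidate captions against references with pycocoevalcap.
from pycocoevalcap.bleu.bleu import Bleu
from pycocoevalcap.meteor.meteor import Meteor
from pycocoevalcap.rouge.rouge import Rouge
from pycocoevalcap.cider.cider import Cider
from pycocoevalcap.spice.spice import Spice

# Both dicts map an image id to a list of caption strings.
gts = {"img1": ["a child is smiling at the camera",
                "a happy young girl looks at the photographer"]}   # references
res = {"img1": ["a smiling girl looks at the camera"]}             # candidate

scorers = [
    (Bleu(4), ["BLEU-1", "BLEU-2", "BLEU-3", "BLEU-4"]),
    (Meteor(), "METEOR"),
    (Rouge(), "ROUGE-L"),
    (Cider(), "CIDEr"),
    (Spice(), "SPICE"),
]

for scorer, name in scorers:
    score, _ = scorer.compute_score(gts, res)
    if isinstance(name, list):            # Bleu returns one score per n-gram order
        for label, value in zip(name, score):
            print(f"{label}: {value:.4f}")
    else:
        print(f"{name}: {score:.4f}")
```

In practice the captions are first normalized with the toolkit's PTBTokenizer and the scores are averaged over the whole test split.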
3

Adriyendi, Adriyendi. "A Rapid Review of Image Captioning." Journal of Information Technology and Computer Science 6, no. 2 (September 3, 2021): 158–69. http://dx.doi.org/10.25126/jitecs.202162316.

Abstract:
Image captioning is an automatic process for generating text based on the content observed in an image. We review the literature, create a framework, and build an application model. We organize image captioning into 4 categories based on input model, process model, output model, and lingual image caption. The input model is based on caption criteria, method, and dataset. The process model is based on type of learning, encoder-decoder, image extractor, and evaluation metric. The output model is based on architecture, feature extraction, feature mapping, model, and number of captions. The lingual image caption is based on a language model with 2 groups: bilingual image caption and cross-language image caption. We also design a framework with 3 framework models and build an application with 3 application models. In addition, we provide research opinions on trends and future research that can be developed with image caption generation. Image captioning can be further developed with respect to computer vision versus human vision.
4

La, Tuan-Vinh, Minh-Son Dao, Duy-Dong Le, Kim-Phung Thai, Quoc-Hung Nguyen, and Thuy-Kieu Phan-Thi. "Leverage Boosting and Transformer on Text-Image Matching for Cheap Fakes Detection." Algorithms 15, no. 11 (November 10, 2022): 423. http://dx.doi.org/10.3390/a15110423.

Abstract:
The explosive growth of the social media community has increased many kinds of misinformation and is attracting tremendous attention from the research community. One of the most prevalent forms of misleading news is cheapfakes. Cheapfakes use non-AI techniques, such as pairing unaltered images with false contextual news, to create false news, which makes them easy and “cheap” to produce and leads to their abundance in the social media community. Moreover, the development of deep learning has also opened up many domains relevant to news, such as fake news detection, rumour detection, fact-checking, and verification of claimed images. Nevertheless, despite the impact and harmfulness of cheapfakes for the social community and the real world, there is little research on detecting them in the computer science domain. It is challenging to detect misused/false/out-of-context pairs of images and captions, even with human effort, because of the complex correlation between the attached image and the veracity of the caption content. Existing research focuses mostly on training and evaluating on a given dataset, which limits the proposals in terms of categories, semantics, and situations to the characteristics of that dataset. In this paper, to address these issues, we leverage textual semantic understanding from a large corpus and integrate it with different combinations of text-image matching and image captioning methods via an ANN/Transformer boosting schema to classify a triple of (image, caption1, caption2) into OOC (out-of-context) and NOOC (no out-of-context) labels. We customized these combinations according to various exceptional cases that we observed during data analysis. We evaluate our approach using the dataset and evaluation metrics provided by the COSMOS baseline. Compared to other methods, including the baseline, our method achieves the highest Accuracy, Recall, and F1 scores.
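As a rough illustration of the text-image matching component mentioned above (not the authors' pipeline), a pretrained vision-language model can score how well each of two captions fits an image. The sketch below assumes the Hugging Face transformers implementation of CLIP; the image path and captions are hypothetical.

```python
# Hedged sketch: rank two candidate captions by image-text similarity with CLIP.
# Assumes `pip install torch transformers pillow`; "photo.jpg" is illustrative.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("photo.jpg").convert("RGB")
captions = ["firefighters battle a wildfire in California",
            "volunteers plant trees after a storm"]

inputs = processor(text=captions, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    logits = model(**inputs).logits_per_image      # shape (1, 2)
scores = logits.softmax(dim=-1).squeeze(0)
for caption, score in zip(captions, scores.tolist()):
    print(f"{score:.3f}  {caption}")
```

A large gap between the two similarity scores is one possible signal that the lower-scoring caption is out of context for the image.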
5

Reddy, Kota Akshith, Satish C J, Jahnavi Polsani, Teja Naveen Chintapalli, and Gangapatnam Sai Ananya. "Analysis of the Fuzziness of Image Caption Generation Models due to Data Augmentation Techniques." International Journal of Recent Technology and Engineering (IJRTE) 10, no. 3 (September 30, 2021): 131–39. http://dx.doi.org/10.35940/ijrte.c6439.0910321.

Abstract:
Automatic Image Caption Generation is one of the core problems in the field of Deep Learning. Data Augmentation is a technique which helps in increasing the amount of data at hand, and this is done by augmenting the training data using various techniques like flipping, rotating, zooming, brightening, etc. In this work, we create an Image Captioning model and check its robustness against all the major types of image augmentation techniques. The results show the fuzziness of the model while working with the same image but a different augmentation technique; because of this, a different caption is produced every time a different data augmentation technique is employed. We also show the change in the performance of the model after applying these augmentation techniques. The Flickr8k dataset is used for this study, along with the BLEU score as the evaluation metric for the image captioning model.
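A minimal sketch of this kind of robustness probe is shown below, assuming torchvision for the augmentations and NLTK for BLEU; generate_caption is a hypothetical stand-in for any trained captioning model, and the image path is illustrative.

```python
# Hedged sketch: compare captions generated for augmented versions of one image.
from PIL import Image
from torchvision import transforms
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

def generate_caption(image):
    # Placeholder for real model inference; returns a fixed token list so the sketch runs.
    return "a person riding a bike on the street".split()

augmentations = {
    "flip": transforms.RandomHorizontalFlip(p=1.0),
    "rotate": transforms.RandomRotation(degrees=15),
    "brighten": transforms.ColorJitter(brightness=0.5),
}

image = Image.open("example.jpg").convert("RGB")
baseline = generate_caption(image)
smooth = SmoothingFunction().method1
for name, augment in augmentations.items():
    caption = generate_caption(augment(image))
    bleu = sentence_bleu([baseline], caption, smoothing_function=smooth)
    print(f"{name}: BLEU against the original caption = {bleu:.3f}")
```

With a real model plugged in, a sharp drop in BLEU under a particular augmentation points to the fuzziness the paper describes.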
6

Guan, Zhibin, Kang Liu, Yan Ma, Xu Qian, and Tongkai Ji. "Middle-Level Attribute-Based Language Retouching for Image Caption Generation." Applied Sciences 8, no. 10 (October 9, 2018): 1850. http://dx.doi.org/10.3390/app8101850.

Abstract:
Image caption generation is attractive research which focuses on generating natural language sentences to describe the visual content of a given image. It is an interdisciplinary subject combining computer vision (CV) and natural language processing (NLP). The existing image captioning methods are mainly focused on generating the final image caption directly, which may lose significant identification information of objects contained in the raw image. Therefore, we propose a new middle-level attribute-based language retouching (MLALR) method to solve this problem. Our proposed MLALR method uses the middle-level attributes predicted from the object regions to retouch the intermediate image description, which is generated by our language generation model. The advantage of our MLALR method is that it can correct descriptive errors in the intermediate image description and make the final image caption more accurate. Moreover, evaluation using benchmark datasets—MSCOCO, Flickr8K, and Flickr30K—validated the impressive performance of our MLALR method with evaluation metrics—BLEU, METEOR, ROUGE-L, CIDEr, and SPICE.
7

Zhou, Haonan, Xiaoping Du, Lurui Xia, and Sen Li. "Self-Learning for Few-Shot Remote Sensing Image Captioning." Remote Sensing 14, no. 18 (September 15, 2022): 4606. http://dx.doi.org/10.3390/rs14184606.

Abstract:
Large-scale caption-labeled remote sensing image samples are expensive to acquire, and the training samples available in practical application scenarios are generally limited. Therefore, remote sensing image caption generation tasks will inevitably fall into the dilemma of few-shot, resulting in poor qualities of the generated text descriptions. In this study, we propose a self-learning method named SFRC for few-shot remote sensing image captioning. Without relying on additional labeled samples and external knowledge, SFRC improves the performance in few-shot scenarios by ameliorating the way and efficiency of the method of learning on limited data. We first train an encoder for semantic feature extraction using a supplemental modified BYOL self-supervised learning method on a small number of unlabeled remote sensing samples, where the unlabeled remote sensing samples are derived from caption-labeled samples. When training the model for caption generation in a small number of caption-labeled remote sensing samples, the self-ensemble yields a parameter-averaging teacher model based on the integration of intermediate morphologies of the model over a certain training time horizon. The self-distillation uses the self-ensemble-obtained teacher model to generate pseudo labels to guide the student model in the next generation to achieve better performance. Additionally, when optimizing the model by parameter back-propagation, we design a baseline incorporating self-critical self-ensemble to reduce the variance during gradient computation and weaken the effect of overfitting. In a range of experiments only using limited caption-labeled samples, the performance evaluation metric scores of SFRC exceed those of recent methods. We conduct percentage sampling few-shot experiments to test the performance of the SFRC method in few-shot remote sensing image captioning with fewer samples. We also conduct ablation experiments on key designs in SFRC. The results of the ablation experiments prove that these self-learning designs we generated for captioning in sparse remote sensing sample scenarios are indeed fruitful, and each design contributes to the performance of the SFRC method.
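The parameter-averaging "self-ensemble" teacher described here follows the general mean-teacher recipe; a minimal PyTorch sketch of such an update is given below (the momentum value and the Linear stand-in model are illustrative, not taken from the paper).

```python
# Hedged sketch: exponential-moving-average ("self-ensemble") teacher update.
import copy
import torch

@torch.no_grad()
def update_teacher(teacher, student, momentum=0.999):
    # teacher <- momentum * teacher + (1 - momentum) * student, parameter-wise.
    for t_param, s_param in zip(teacher.parameters(), student.parameters()):
        t_param.mul_(momentum).add_(s_param, alpha=1.0 - momentum)

student = torch.nn.Linear(8, 4)       # stand-in for the captioning model
teacher = copy.deepcopy(student)      # the teacher starts as a copy of the student
update_teacher(teacher, student)      # call once per training step
print(teacher.weight[0, :3])
```

The teacher obtained this way is then used to produce pseudo labels for self-distillation, as the abstract outlines.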
8

Zhang, Yixiao, Xiaosong Wang, Ziyue Xu, Qihang Yu, Alan Yuille, and Daguang Xu. "When Radiology Report Generation Meets Knowledge Graph." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 07 (April 3, 2020): 12910–17. http://dx.doi.org/10.1609/aaai.v34i07.6989.

Abstract:
Automatic radiology report generation has been an attractive research problem towards computer-aided diagnosis to alleviate the workload of doctors in recent years. Deep learning techniques for natural image captioning are successfully adapted to generating radiology reports. However, radiology image reporting is different from the natural image captioning task in two aspects: 1) the accuracy of positive disease keyword mentions is critical in radiology image reporting, in comparison to the equivalent importance of every single word in a natural image caption; 2) the evaluation of reporting quality should focus more on matching the disease keywords and their associated attributes instead of counting the occurrence of N-grams. Based on these concerns, we propose to utilize a pre-constructed graph embedding module (modeled with a graph convolutional neural network) on multiple disease findings to assist the generation of reports in this work. The incorporation of the knowledge graph allows for dedicated feature learning for each disease finding and the relationship modeling between them. In addition, we propose a new evaluation metric for radiology image reporting with the assistance of the same composed graph. Experimental results demonstrate the superior performance of the methods integrated with the proposed graph embedding module on a publicly accessible dataset (IU-RR) of chest radiographs compared with previous approaches using both the conventional evaluation metrics commonly adopted for image captioning and our proposed ones.
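As a simplified illustration of keyword-focused evaluation (not the graph-based metric proposed in the paper), a generated report can be scored by the overlap of disease-finding keywords rather than raw n-gram counts; the keyword list and example reports below are purely illustrative.

```python
# Hedged sketch: precision/recall/F1 over disease-finding keywords.
DISEASE_KEYWORDS = {"cardiomegaly", "effusion", "pneumonia", "atelectasis",
                    "pneumothorax", "edema", "consolidation"}

def finding_f1(generated: str, reference: str) -> float:
    gen = {w for w in generated.lower().split() if w in DISEASE_KEYWORDS}
    ref = {w for w in reference.lower().split() if w in DISEASE_KEYWORDS}
    if not gen or not ref:
        return 0.0
    precision = len(gen & ref) / len(gen)
    recall = len(gen & ref) / len(ref)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

print(finding_f1("mild cardiomegaly with small pleural effusion",
                 "cardiomegaly with bilateral pleural effusion and mild edema"))  # 0.8
```

Unlike BLEU-style n-gram counting, this kind of score is unaffected by paraphrasing of the non-clinical wording, which is the motivation the abstract gives for a keyword- and graph-aware metric.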
9

Qu, Shiru, Yuling Xi, and Songtao Ding. "Image Caption Description of Traffic Scene Based on Deep Learning." Xibei Gongye Daxue Xuebao/Journal of Northwestern Polytechnical University 36, no. 3 (June 2018): 522–27. http://dx.doi.org/10.1051/jnwpu/20183630522.

Abstract:
It is a hard issue to describe a complex traffic scene accurately in computer vision. The traffic scene is changeable, which causes image captioning to be easily interfered with by light changes and object occlusion. To solve this problem, we propose an image caption generation model based on an attention mechanism, combining a convolutional neural network (CNN) and a recurrent neural network (RNN) to generate an end-to-end description for traffic images. To generate a semantic description with a distinct degree of discrimination, the attention mechanism is applied to the language model. We use the Flickr8K, Flickr30K, and MS COCO benchmark datasets to validate the effectiveness of our method. The accuracy is improved by up to 8.6%, 12.4%, 19.3%, and 21.5% on different evaluation metrics. Experiments show that our algorithm has good robustness in four different complex traffic scenarios, such as light change, abnormal weather environment, road marked target, and various kinds of transportation tools.
10

Chen, Chen, Shuai Mu, Wanpeng Xiao, Zexiong Ye, Liesi Wu, and Qi Ju. "Improving Image Captioning with Conditional Generative Adversarial Nets." Proceedings of the AAAI Conference on Artificial Intelligence 33 (July 17, 2019): 8142–50. http://dx.doi.org/10.1609/aaai.v33i01.33018142.

Abstract:
In this paper, we propose a novel conditional-generative-adversarial-nets-based image captioning framework as an extension of the traditional reinforcement-learning (RL)-based encoder-decoder architecture. To deal with the inconsistent evaluation problem among different objective language metrics, we are motivated to design some “discriminator” networks to automatically and progressively determine whether a generated caption is human-described or machine-generated. Two kinds of discriminator architectures (CNN- and RNN-based structures) are introduced, since each has its own advantages. The proposed algorithm is generic, so it can enhance any existing RL-based image captioning framework, and we show that the conventional RL training method is just a special case of our approach. Empirically, we show consistent improvements over all language evaluation metrics for different state-of-the-art image captioning models. In addition, the well-trained discriminators can also be viewed as objective image captioning evaluators.
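A minimal sketch of the idea of such a learned caption evaluator is given below, assuming PyTorch; this is a generic GRU discriminator conditioned on a pooled image feature, not the CNN/RNN architectures used in the paper, and all dimensions are illustrative.

```python
# Hedged sketch: a discriminator that scores how human-like a caption is for an image.
import torch
import torch.nn as nn

class CaptionDiscriminator(nn.Module):
    def __init__(self, vocab_size, embed_dim=256, hidden_dim=512, img_dim=2048):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.img_proj = nn.Linear(img_dim, hidden_dim)
        self.rnn = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.cls = nn.Linear(hidden_dim, 1)

    def forward(self, image_feat, caption_ids):
        # Initialise the GRU state from the projected image feature, then read the caption.
        h0 = torch.tanh(self.img_proj(image_feat)).unsqueeze(0)   # (1, batch, hidden)
        output, _ = self.rnn(self.embed(caption_ids), h0)         # (batch, seq, hidden)
        # Probability that the caption is human-written for this image.
        return torch.sigmoid(self.cls(output[:, -1]))

discriminator = CaptionDiscriminator(vocab_size=10000)
image_feat = torch.randn(2, 2048)                  # e.g. pooled CNN features
caption_ids = torch.randint(0, 10000, (2, 12))     # two tokenised captions
print(discriminator(image_feat, caption_ids).shape)   # torch.Size([2, 1])
```

Trained against human captions as positives and sampled captions as negatives, such a network can serve both as the adversarial signal during training and, as the abstract notes, as a learned evaluation metric.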

Dissertations / Theses on the topic "Image caption evaluation metric"

1

Anderson, Peter James. "Vision and Language Learning: From Image Captioning and Visual Question Answering towards Embodied Agents." PhD thesis, 2018. http://hdl.handle.net/1885/164018.

Abstract:
Each time we ask for an object, describe a scene, follow directions or read a document containing images or figures, we are converting information between visual and linguistic representations. Indeed, for many tasks it is essential to reason jointly over visual and linguistic information. People do this with ease, typically without even noticing. Intelligent systems that perform useful tasks in unstructured situations, and interact with people, will also require this ability. In this thesis, we focus on the joint modelling of visual and linguistic information using deep neural networks. We begin by considering the challenging problem of automatically describing the content of an image in natural language, i.e., image captioning. Although there is considerable interest in this task, progress is hindered by the difficulty of evaluating the generated captions. Our first contribution is a new automatic image caption evaluation metric that measures the quality of generated captions by analysing their semantic content. Extensive evaluations across a range of models and datasets indicate that our metric, dubbed SPICE, shows high correlation with human judgements. Armed with a more effective evaluation metric, we address the challenge of image captioning. Visual attention mechanisms have been widely adopted in image captioning and visual question answering (VQA) architectures to facilitate fine-grained visual processing. We extend existing approaches by proposing a bottom-up and top-down attention mechanism that enables attention to be focused at the level of objects and other salient image regions, which is the natural basis for attention to be considered. Applying this approach to image captioning we achieve state of the art results on the COCO test server. Demonstrating the broad applicability of the method, applying the same approach to VQA we obtain first place in the 2017 VQA Challenge. Despite these advances, recurrent neural network (RNN) image captioning models typically do not generalise well to out-of-domain images containing novel scenes or objects. This limitation severely hinders the use of these models in real applications. To address this problem, we propose constrained beam search, an approximate search algorithm that enforces constraints over RNN output sequences. Using this approach, we show that existing RNN captioning architectures can take advantage of side information such as object detector outputs and ground-truth image annotations at test time, without retraining. Our results significantly outperform previous approaches that incorporate the same information into the learning algorithm, achieving state of the art results for out-of-domain captioning on COCO. Last, to enable and encourage the application of vision and language methods to problems involving embodied agents, we present the Matterport3D Simulator, a large-scale interactive reinforcement learning environment constructed from densely-sampled panoramic RGB-D images of 90 real buildings. Using this simulator, which can in future support a range of embodied vision and language tasks, we collect the first benchmark dataset for visually-grounded natural language navigation in real buildings. We investigate the difficulty of this task, and particularly the difficulty of operating in unseen environments, using several baselines and a sequence-to-sequence model based on methods successfully applied to other vision and language tasks.
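The core idea behind SPICE can be shown with a toy example: candidate and reference captions are parsed into semantic propositions (objects, attributes, relations), and the two proposition sets are compared with an F-score. The sketch below hand-codes the tuples instead of running a scene-graph parser, so it only illustrates the scoring step.

```python
# Hedged sketch: F-score over semantic proposition tuples, SPICE-style.
def proposition_f1(candidate_tuples, reference_tuples):
    cand, ref = set(candidate_tuples), set(reference_tuples)
    if not cand or not ref:
        return 0.0
    precision = len(cand & ref) / len(cand)
    recall = len(cand & ref) / len(ref)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# "a young girl standing" vs. "a young girl sitting at a table"
candidate = {("girl",), ("girl", "young"), ("girl", "standing")}
reference = {("girl",), ("girl", "young"), ("girl", "sitting"), ("table",)}
print(f"SPICE-like F1: {proposition_f1(candidate, reference):.3f}")  # 0.571
```

The real metric obtains these tuples automatically by parsing each caption into a scene graph, which is what lets it reward semantic agreement even when the surface wording differs.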

Book chapters on the topic "Image caption evaluation metric"

1

Anderson, Peter, Basura Fernando, Mark Johnson, and Stephen Gould. "SPICE: Semantic Propositional Image Caption Evaluation." In Computer Vision – ECCV 2016, 382–98. Cham: Springer International Publishing, 2016. http://dx.doi.org/10.1007/978-3-319-46454-1_24.

2

Bleeker, Maurits, and Maarten de Rijke. "Do Lessons from Metric Learning Generalize to Image-Caption Retrieval?" In Lecture Notes in Computer Science, 535–51. Cham: Springer International Publishing, 2022. http://dx.doi.org/10.1007/978-3-030-99736-6_36.

3

Liu, Haoting, Fenggang Xu, Shuo Yang, Weidong Dong, and Shunliang Pan. "Image Quality Evaluation Metric of Brightness Contrast." In Man-Machine-Environment System Engineering, 271–79. Singapore: Springer Singapore, 2018. http://dx.doi.org/10.1007/978-981-13-2481-9_32.

4

Lopez-Antequera, Manuel, Javier Gonzalez-Jimenez, and Nicolai Petkov. "Evaluation of Whole-Image Descriptors for Metric Localization." In Computer Aided Systems Theory – EUROCAST 2017, 281–88. Cham: Springer International Publishing, 2018. http://dx.doi.org/10.1007/978-3-319-74727-9_33.

5

Elmahdy, Mohamed S., Thyrza Jagt, Sahar Yousefi, Hessam Sokooti, Roel Zinkstok, Mischa Hoogeman, and Marius Staring. "Evaluation of Multi-metric Registration for Online Adaptive Proton Therapy of Prostate Cancer." In Biomedical Image Registration, 94–104. Cham: Springer International Publishing, 2018. http://dx.doi.org/10.1007/978-3-319-92258-4_9.

6

Sharif, Naeha, Lyndon White, Mohammed Bennamoun, and Syed Afaq Ali Shah. "NNEval: Neural Network Based Evaluation Metric for Image Captioning." In Computer Vision – ECCV 2018, 39–55. Cham: Springer International Publishing, 2018. http://dx.doi.org/10.1007/978-3-030-01237-3_3.

7

Rathi, Vishwas, and Puneet Goyal. "Generic Multispectral Image Demosaicking Algorithm and New Performance Evaluation Metric." In Communications in Computer and Information Science, 45–57. Cham: Springer International Publishing, 2022. http://dx.doi.org/10.1007/978-3-031-11346-8_5.

8

Haag, M., W. Theilmann, K. Schäfer, and H. H. Nagel. "Integration of image sequence evaluation and fuzzy metric temporal logic programming." In KI-97: Advances in Artificial Intelligence, 301–12. Berlin, Heidelberg: Springer Berlin Heidelberg, 1997. http://dx.doi.org/10.1007/3540634932_24.

9

Simone, Gabriele, Valentina Caracciolo, Marius Pedersen, and Faouzi Alaya Cheikh. "Evaluation of a Difference of Gaussians Based Image Difference Metric in Relation to Perceived Compression Artifacts." In Advances in Visual Computing, 491–500. Berlin, Heidelberg: Springer Berlin Heidelberg, 2010. http://dx.doi.org/10.1007/978-3-642-17274-8_48.

10

Huisman, Mike, Jan N. van Rijn, and Aske Plaat. "Metalearning for Deep Neural Networks." In Metalearning, 237–67. Cham: Springer International Publishing, 2022. http://dx.doi.org/10.1007/978-3-030-67024-5_13.

Abstract:
Deep neural networks have enabled large breakthroughs in various domains ranging from image and speech recognition to automated medical diagnosis. However, these networks are notorious for requiring large amounts of data to learn from, limiting their applicability in domains where data is scarce. Through metalearning, the networks can learn how to learn, allowing them to learn from fewer data. In this chapter, we provide a detailed overview of metalearning for knowledge transfer in deep neural networks. We categorize the techniques into (i) metric-based, (ii) model-based, and (iii) optimization-based techniques, cover the key techniques per category, discuss open challenges, and provide directions for future research such as performance evaluation on heterogeneous benchmarks.

Conference papers on the topic "Image caption evaluation metric"

1

Tian, Junjiao, and Jean Oh. "Image Captioning with Compositional Neural Module Networks." In Twenty-Eighth International Joint Conference on Artificial Intelligence {IJCAI-19}. California: International Joint Conferences on Artificial Intelligence Organization, 2019. http://dx.doi.org/10.24963/ijcai.2019/496.

Abstract:
In image captioning, where fluency is an important factor in evaluation (e.g., by n-gram metrics), sequential models are commonly used; however, sequential models generally result in overgeneralized expressions that lack the details that may be present in an input image. Inspired by the idea of the compositional neural module networks in the visual question answering task, we introduce a hierarchical framework for image captioning that explores both compositionality and sequentiality of natural language. Our algorithm learns to compose a detail-rich sentence by selectively attending to different modules corresponding to unique aspects of each object detected in an input image to include specific descriptions such as counts and color. In a set of experiments on the MSCOCO dataset, the proposed model outperforms a state-of-the-art model across multiple evaluation metrics, more importantly, presenting visually interpretable results. Furthermore, the breakdown of subcategory F-scores of the SPICE metric and human evaluation on Amazon Mechanical Turk show that our compositional module networks effectively generate accurate and detailed captions.
2

Liu, Chang, Fuchun Sun, Changhu Wang, Feng Wang, and Alan Yuille. "MAT: A Multimodal Attentive Translator for Image Captioning." In Twenty-Sixth International Joint Conference on Artificial Intelligence. California: International Joint Conferences on Artificial Intelligence Organization, 2017. http://dx.doi.org/10.24963/ijcai.2017/563.

Abstract:
In this work we formulate the problem of image captioning as a multimodal translation task. Analogous to machine translation, we present a sequence-to-sequence recurrent neural network (RNN) model for image caption generation. Different from most existing work, where the whole image is represented by a convolutional neural network (CNN) feature, we propose to represent the input image as a sequence of detected objects, which serves as the source sequence of the RNN model. In this way, the sequential representation of an image can be naturally translated to a sequence of words, as the target sequence of the RNN model. To represent the image in a sequential way, we extract the object features in the image and arrange them in an order using convolutional neural networks. To further leverage the visual information from the encoded objects, a sequential attention layer is introduced to selectively attend to the objects that are related to generating the corresponding words in the sentences. Extensive experiments are conducted to validate the proposed approach on the popular benchmark dataset MS COCO, and the proposed model surpasses the state-of-the-art methods in all metrics following the dataset splits of previous work. The proposed approach is also evaluated by the evaluation server of the MS COCO captioning challenge, and achieves very competitive results, e.g., a CIDEr of 1.029 (c5) and 1.064 (c40).
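For reference, the CIDEr score reported here rewards TF-IDF-weighted n-gram agreement with the reference captions. Below is a simplified, self-contained sketch of that computation (the real metric adds a length penalty and computes IDF over the full reference corpus rather than over the handful of captions used here).

```python
# Hedged sketch: a simplified CIDEr-like score from TF-IDF n-gram cosine similarity.
import math
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def tfidf(counts, idf):
    total = sum(counts.values())
    return {} if total == 0 else {g: (c / total) * idf.get(g, 0.0) for g, c in counts.items()}

def cosine(u, v):
    dot = sum(w * v.get(g, 0.0) for g, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def cider_like(candidate, references, corpus, n_max=4):
    score = 0.0
    for n in range(1, n_max + 1):
        docs = [set(ngrams(ref, n)) for ref in corpus]
        idf = {g: math.log(len(docs) / sum(1 for d in docs if g in d))
               for d in docs for g in d}
        cand_vec = tfidf(Counter(ngrams(candidate, n)), idf)
        sims = [cosine(cand_vec, tfidf(Counter(ngrams(ref, n)), idf)) for ref in references]
        score += sum(sims) / len(sims)
    return 10.0 * score / n_max

references = [["a", "dog", "runs", "on", "the", "beach"],
              ["a", "brown", "dog", "running", "along", "the", "beach"]]
candidate = ["a", "dog", "running", "on", "the", "beach"]
print(f"CIDEr-like score: {cider_like(candidate, references, references):.3f}")
```

Because rare n-grams carry more IDF weight, a caption that matches the distinctive parts of the references scores higher than one that only repeats common words, which is the property that makes CIDEr popular for caption evaluation.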
3

Jiang, Ming, Qiuyuan Huang, Lei Zhang, Xin Wang, Pengchuan Zhang, Zhe Gan, Jana Diesner, and Jianfeng Gao. "TIGEr: Text-to-Image Grounding for Image Caption Evaluation." In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). Stroudsburg, PA, USA: Association for Computational Linguistics, 2019. http://dx.doi.org/10.18653/v1/d19-1220.

4

Wang, Sijin, Ziwei Yao, Ruiping Wang, Zhongqin Wu, and Xilin Chen. "FAIEr: Fidelity and Adequacy Ensured Image Caption Evaluation." In 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2021. http://dx.doi.org/10.1109/cvpr46437.2021.01383.

5

Wan, Zhenyu, Fan Wu, Ke Xin, and Liqiong Zhang. "Progress of image caption: modelling, datasets, and evaluation." In 2021 International Conference on Computer Information Science and Artificial Intelligence (CISAI). IEEE, 2021. http://dx.doi.org/10.1109/cisai54367.2021.00211.

6

Hemery, Baptiste, Helene Laurent, and Christophe Rosenberger. "Evaluation metric for image understanding." In 2009 16th IEEE International Conference on Image Processing (ICIP 2009). IEEE, 2009. http://dx.doi.org/10.1109/icip.2009.5413548.

7

Lee, Hwanhee, Seunghyun Yoon, Franck Dernoncourt, Doo Soon Kim, Trung Bui, and Kyomin Jung. "ViLBERTScore: Evaluating Image Caption Using Vision-and-Language BERT." In Proceedings of the First Workshop on Evaluation and Comparison of NLP Systems. Stroudsburg, PA, USA: Association for Computational Linguistics, 2020. http://dx.doi.org/10.18653/v1/2020.eval4nlp-1.4.

8

Dong, Wenjie, and Yufeng Zheng. "An objective evaluation metric for color image fusion." In SPIE Defense, Security, and Sensing, edited by Harold Szu and Liyi Dai. SPIE, 2012. http://dx.doi.org/10.1117/12.919211.

9

Takemura, Akihiro, Hironori Kojima, Shinichi Ueda, Naoki Isomura, Kimiya Noto, and Tomohiro Ikeda. "A metric for evaluation of deformable image registration." In SPIE Medical Imaging, edited by Robert J. Webster and Ziv R. Yaniv. SPIE, 2015. http://dx.doi.org/10.1117/12.2081846.

10

Golech, Sina Berk, Saltuk Bugra Karacan, Elena Battini Sonmez, and Hakan Ayral. "A complete human verified Turkish caption dataset for MS COCO and performance evaluation with well-known image caption models trained against it." In 2022 International Conference on Electrical, Computer, Communications and Mechatronics Engineering (ICECCME). IEEE, 2022. http://dx.doi.org/10.1109/iceccme55909.2022.9988025.
