Dissertations / Theses on the topic 'Image Captioning'
Consult the top 16 dissertations / theses for your research on the topic 'Image Captioning.'
Hoxha, Genc. "IMAGE CAPTIONING FOR REMOTE SENSING IMAGE ANALYSIS." Doctoral thesis, Università degli studi di Trento, 2022. http://hdl.handle.net/11572/351752.
Hossain, Md Zakir. "Deep learning techniques for image captioning." PhD thesis, Murdoch University, 2020. https://researchrepository.murdoch.edu.au/id/eprint/60782/.
Tu, Guoyun. "Image Captioning On General Data And Fashion Data: An Attribute-Image-Combined Attention-Based Network for Image Captioning on Multi-Object Images and Single-Object Images." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-282925.
Image captioning is a crucial field spanning computer vision and natural language processing. It can be applied widely to high-volume web images, for example to convey image content to visually impaired users. Many methods have been adopted in this field, such as attention-based methods and semantic-concept-based models. These achieve excellent performance on general image datasets such as the MS COCO dataset, but single-object images remain largely unexplored. In this thesis, we propose a new attribute-information-combined attention-based network (AIC-AB Net). At each time step, attribute information is added as a complement to the visual information. For sequential word generation, spatial attention determines the specific regions of the image to pass to the decoder, and a sentinel gate decides whether to attend to the image or to the visual sentinel (what the decoder already knows, including the attribute information). Textual attribute information is fed in synchronously to aid image recognition and reduce uncertainty. We build a new fashion dataset consisting of fashion images to create a benchmark for single-object images. This fashion dataset consists of 144,422 images of 24,649 fashion products, with one descriptive sentence per image. Our method is tested on the MS COCO dataset and the proposed Fashion dataset. The results show the superior performance of the proposed model on both multi-object and single-object images. Our AIC-AB Net outperforms the state-of-the-art Adaptive Attention Network by 0.017, 0.095, and 0.095 (CIDEr score) on the COCO dataset, the Fashion dataset (bestsellers), and the Fashion dataset (all vendors), respectively. The results also reveal the complementarity of the attention architecture and the attribute information.
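The sentinel-gate mechanism this abstract describes can be sketched roughly as follows. This is an illustrative PyTorch snippet based on the abstract (attention over image regions plus a visual sentinel, with the attribute vector folded into the query), not the thesis code; all layer names and sizes are assumptions.

```python
import torch
import torch.nn as nn

class SentinelAttention(nn.Module):
    """One attention step over R image regions plus one visual sentinel slot."""
    def __init__(self, dim=512, attr_dim=300, attn_dim=256):
        super().__init__()
        self.key = nn.Linear(dim, attn_dim)               # shared for regions and sentinel
        self.query = nn.Linear(dim + attr_dim, attn_dim)  # decoder state + attribute vector
        self.score = nn.Linear(attn_dim, 1)

    def forward(self, regions, sentinel, hidden, attributes):
        # regions: (B, R, dim); sentinel, hidden: (B, dim); attributes: (B, attr_dim)
        candidates = torch.cat([regions, sentinel.unsqueeze(1)], dim=1)        # (B, R+1, dim)
        q = self.query(torch.cat([hidden, attributes], dim=-1)).unsqueeze(1)   # (B, 1, attn_dim)
        logits = self.score(torch.tanh(self.key(candidates) + q)).squeeze(-1)  # (B, R+1)
        weights = torch.softmax(logits, dim=-1)   # last weight = "attend to the sentinel"
        context = (weights.unsqueeze(-1) * candidates).sum(dim=1)              # (B, dim)
        return context, weights

# Illustrative call: batch of 2, 49 spatial regions.
attn = SentinelAttention()
context, weights = attn(torch.randn(2, 49, 512), torch.randn(2, 512),
                        torch.randn(2, 512), torch.randn(2, 300))
```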
Karayil, Tushar. "Affective Image Captioning: Extraction and Semantic Arrangement of Image Information with Deep Neural Networks." Doctoral thesis, Technische Universität Kaiserslautern, 2020. Supervisor: Andreas Dengel. http://d-nb.info/1214640958/34.
Gennari, Riccardo. "End-to-end Deep Metric Learning con Vision-Language Model per il Fashion Image Captioning." Bachelor's thesis, Alma Mater Studiorum - Università di Bologna, 2022. http://amslaurea.unibo.it/25772/.
Kan, Jichao. "Visual-Text Translation with Deep Graph Neural Networks." Thesis, University of Sydney, 2020. https://hdl.handle.net/2123/23759.
Ma, Yufeng. "Going Deeper with Images and Natural Language." Diss., Virginia Tech, 2019. http://hdl.handle.net/10919/99993.
Full textDoctor of Philosophy
Kvita, Jakub. "Popis fotografií pomocí rekurentních neuronových sítí" [Describing photographs with recurrent neural networks]. Master's thesis, Vysoké učení technické v Brně, Fakulta informačních technologií, 2016. http://www.nusl.cz/ntk/nusl-255324.
Full text(5930603), Hemanth Devarapalli. "Forced Attention for Image Captioning." Thesis, 2019.
Automatic generation of captions for a given image is an active research area in artificial intelligence. Architectures have evolved from classical machine learning applied to image metadata to neural networks. Two styles of neural architecture have emerged for image captioning: the encoder-attention-decoder architecture and the transformer architecture. This study attempts to modify the attention so that any object can be specified. An archetypical encoder-attention-decoder architecture (Show, Attend, and Tell (Xu et al., 2015)) is employed as the baseline, and a modification of it is proposed. Both architectures are evaluated on the MSCOCO (Lin et al., 2014) dataset, and seven metrics are computed: BLEU-1, 2, 3, 4 (Papineni, Roukos, Ward & Zhu, 2002), METEOR (Banerjee & Lavie, 2005), ROUGE-L (Lin, 2004), and CIDEr (Vedantam, Lawrence & Parikh, 2015). Finally, the statistical significance of the results is assessed with paired t-tests.
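For readers reproducing this kind of evaluation, the listed metrics and significance test map onto standard toolkits. The snippet below is a hedged sketch using NLTK's corpus BLEU and SciPy's paired t-test on made-up token lists and scores; it is not the thesis' evaluation code, and full CIDEr/METEOR/ROUGE-L scoring would typically use the COCO caption evaluation toolkit instead.

```python
# Illustrative only: toy references, hypotheses, and per-image scores.
from nltk.translate.bleu_score import corpus_bleu, SmoothingFunction
from scipy.stats import ttest_rel

references = [[["a", "dog", "runs", "on", "the", "grass"]],
              [["a", "man", "rides", "a", "bike"]]]
hypotheses = [["a", "dog", "runs", "in", "the", "grass"],
              ["a", "man", "is", "riding", "a", "bike"]]

smooth = SmoothingFunction().method1
for n in range(1, 5):
    weights = tuple(1.0 / n for _ in range(n))          # uniform n-gram weights for BLEU-n
    score = corpus_bleu(references, hypotheses, weights=weights, smoothing_function=smooth)
    print(f"BLEU-{n}: {score:.4f}")

# Paired t-test between two models' per-image scores (made-up values).
baseline_scores = [0.91, 0.73, 0.62, 0.88, 0.55]
modified_scores = [0.95, 0.70, 0.68, 0.90, 0.60]
t_stat, p_value = ttest_rel(modified_scores, baseline_scores)
print("paired t-test:", t_stat, p_value)
```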
Mathews, Alexander Patrick. "Automatic Image Captioning with Style." PhD thesis, 2018. http://hdl.handle.net/1885/151929.
Lin, Jia-Hsing (林家興). "Food Image Captioning with Verb-Noun Pairs Empowered by Joint Correlation." Thesis, 2016. http://ndltd.ncl.edu.tw/handle/21674221727413201079.
National Chung Cheng University (國立中正大學), Graduate Institute of Computer Science and Information Engineering (資訊工程研究所), ROC academic year 103.
Studies of image captioning have emerged explosively over the past two years. Although many elegant approaches have been proposed for general-purpose image captioning, the use of domain knowledge or of the specific description structure of a target domain remains largely unexplored. In this thesis, we concentrate on food image captioning, where a food image is better described not only by what food it is but also by how it was cooked. We propose neural networks that jointly consider multiple factors, i.e., food recognition, ingredient recognition, and cooking method recognition, and verify that recognition performance can be improved by taking multiple factors into account. With these three factors, food image captions composed of verb-noun pairs (usually a cooking method followed by ingredients) can be generated. We demonstrate the effectiveness of the proposed methods from various viewpoints, and believe this is a better way to describe food images than general-purpose image captioning.
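A rough sketch of the joint idea in this abstract, assuming a shared image feature feeding three heads (food category, multi-label ingredients, cooking method) and a verb-noun caption template; this is an illustrative Python/PyTorch example, not the thesis implementation, and all names and sizes are assumptions.

```python
import torch
import torch.nn as nn

class JointFoodNet(nn.Module):
    """Three recognition heads over one shared image feature vector."""
    def __init__(self, feat_dim=2048, n_foods=100, n_ingredients=300, n_methods=20):
        super().__init__()
        self.food_head = nn.Linear(feat_dim, n_foods)
        self.ingredient_head = nn.Linear(feat_dim, n_ingredients)  # multi-label
        self.method_head = nn.Linear(feat_dim, n_methods)

    def forward(self, feats):
        return (self.food_head(feats),
                torch.sigmoid(self.ingredient_head(feats)),
                self.method_head(feats))

def verb_noun_caption(method_logits, ingredient_probs, methods, ingredients, k=2):
    """Assemble a caption as cooking method (verb) followed by top-k ingredients (nouns)."""
    verb = methods[method_logits.argmax().item()]
    top = ingredient_probs.topk(k).indices.tolist()
    nouns = " and ".join(ingredients[i] for i in top)
    return f"{verb} {nouns}"

# Illustrative use with random features and toy vocabularies.
net = JointFoodNet(n_foods=3, n_ingredients=4, n_methods=3)
food_logits, ingredient_probs, method_logits = net(torch.randn(1, 2048))
print(verb_noun_caption(method_logits[0], ingredient_probs[0],
                        ["fried", "steamed", "grilled"],
                        ["rice", "tofu", "pork", "noodles"]))
```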
Yao, Li. "Learning visual representations with neural networks for video captioning and image generation." Thesis, 2017. http://hdl.handle.net/1866/20502.
Hsieh, He-Yen (謝禾彥). "Implementing a Real Time Image Captioning System for Scene Identification Using Embedded System." Thesis, 2018. http://ndltd.ncl.edu.tw/handle/6775qr.
National Taiwan University of Science and Technology (國立臺灣科技大學), Department of Electronic Engineering (電子工程系), ROC academic year 106.
Recently, people have gradually turned their attention to home care and are considering how technology can assist with it. With the rapid development of wireless communication technology and the Internet of Things, and the fact that most people now carry mobile devices, using a webcam to monitor the home from a remote location has become increasingly common. However, transmitting the captured images to the user's device means the user may need extra time to understand what each image means, and too many images consume storage space on the device. Therefore, we use a model to distill the content of an image into a sentence that humans can read. In this thesis, we implement a real-time image captioning system for scene identification on an embedded system. Our system captures images through a webcam and uses an image captioning model ported to the embedded system to convert the captured images into human-readable sentences, so users can quickly understand the meaning of an image. The image captioning model converts captured images into sentences in two steps: first, image features are extracted with a deep convolutional neural network; then, a long short-term memory network produces the corresponding words from those features. Because embedded systems are portable, our image captioning system for scene identification can be placed anywhere in the home. To validate the proposed system, we compare its execution time on several different devices and show the sentences generated from captured images.
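The capture-and-caption loop this abstract outlines could look roughly like the sketch below. The OpenCV and torchvision calls are standard, while generate_caption is purely a placeholder standing in for the CNN-encoder/LSTM-decoder model; the input size, normalization constants, and one-second interval are assumptions, not details from the thesis.

```python
import time
import cv2
import torchvision.transforms as T

# Typical ImageNet-style preprocessing for a CNN encoder (assumed, not from the thesis).
preprocess = T.Compose([
    T.ToPILImage(),
    T.Resize((224, 224)),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def generate_caption(image_tensor):
    # Placeholder for the CNN encoder + LSTM decoder described in the abstract.
    return "a person is sitting in a living room"

cap = cv2.VideoCapture(0)  # webcam
try:
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)   # OpenCV frames are BGR
        image = preprocess(rgb).unsqueeze(0)           # (1, 3, 224, 224)
        print(generate_caption(image))
        time.sleep(1.0)                                # caption roughly once per second
except KeyboardInterrupt:
    pass
finally:
    cap.release()
```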
Anderson, Peter James. "Vision and Language Learning: From Image Captioning and Visual Question Answering towards Embodied Agents." PhD thesis, 2018. http://hdl.handle.net/1885/164018.
Del Chiaro, Riccardo. "Anthropomorphous Visual Recognition: Learning with Weak Supervision, with Scarce Data, and Incrementally over Transient Tasks." Doctoral thesis, 2021. http://hdl.handle.net/2158/1238101.
Full textXu, Kelvin. "Exploring Attention Based Model for Captioning Images." Thèse, 2017. http://hdl.handle.net/1866/20194.