Academic literature on the topic 'Visual question answering (VQA)'
Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles
Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Visual question answering (VQA).'
You can also download the full text of the academic publication as a PDF and read its abstract online whenever these are available in the metadata.
Journal articles on the topic "Visual question answering (VQA)"
Agrawal, Aishwarya, Jiasen Lu, Stanislaw Antol, Margaret Mitchell, C. Lawrence Zitnick, Devi Parikh, and Dhruv Batra. "VQA: Visual Question Answering." International Journal of Computer Vision 123, no. 1 (November 8, 2016): 4–31. http://dx.doi.org/10.1007/s11263-016-0966-6.
Lei, Chenyi, Lei Wu, Dong Liu, Zhao Li, Guoxin Wang, Haihong Tang, and Houqiang Li. "Multi-Question Learning for Visual Question Answering." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 07 (April 3, 2020): 11328–35. http://dx.doi.org/10.1609/aaai.v34i07.6794.
Shah, Sanket, Anand Mishra, Naganand Yadati, and Partha Pratim Talukdar. "KVQA: Knowledge-Aware Visual Question Answering." Proceedings of the AAAI Conference on Artificial Intelligence 33 (July 17, 2019): 8876–84. http://dx.doi.org/10.1609/aaai.v33i01.33018876.
Guo, Zihan, Dezhi Han, and Kuan-Ching Li. "Double-layer affective visual question answering network." Computer Science and Information Systems, no. 00 (2020): 38. http://dx.doi.org/10.2298/csis200515038g.
Wu, Chenfei, Jinlai Liu, Xiaojie Wang, and Ruifan Li. "Differential Networks for Visual Question Answering." Proceedings of the AAAI Conference on Artificial Intelligence 33 (July 17, 2019): 8997–9004. http://dx.doi.org/10.1609/aaai.v33i01.33018997.
Zhou, Yiyi, Rongrong Ji, Jinsong Su, Xiaoshuai Sun, and Weiqiu Chen. "Dynamic Capsule Attention for Visual Question Answering." Proceedings of the AAAI Conference on Artificial Intelligence 33 (July 17, 2019): 9324–31. http://dx.doi.org/10.1609/aaai.v33i01.33019324.
Moholkar, K. P., et al. "Visual Question Answering using Convolutional Neural Networks." Turkish Journal of Computer and Mathematics Education (TURCOMAT) 12, no. 1S (April 11, 2021): 170–75. http://dx.doi.org/10.17762/turcomat.v12i1s.1602.
Guo, Wenya, Ying Zhang, Xiaoping Wu, Jufeng Yang, Xiangrui Cai, and Xiaojie Yuan. "Re-Attention for Visual Question Answering." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 01 (April 3, 2020): 91–98. http://dx.doi.org/10.1609/aaai.v34i01.5338.
Boukhers, Zeyd, Timo Hartmann, and Jan Jürjens. "COIN: Counterfactual Image Generation for Visual Question Answering Interpretation." Sensors 22, no. 6 (March 14, 2022): 2245. http://dx.doi.org/10.3390/s22062245.
Li, Qun, Fu Xiao, Bir Bhanu, Biyun Sheng, and Richang Hong. "Inner Knowledge-based Img2Doc Scheme for Visual Question Answering." ACM Transactions on Multimedia Computing, Communications, and Applications 18, no. 3 (August 31, 2022): 1–21. http://dx.doi.org/10.1145/3489142.
Dissertations / Theses on the topic "Visual question answering (VQA)"
Chowdhury, Muhammad Iqbal Hasan. "Question-answering on image/video content." Thesis, Queensland University of Technology, 2020. https://eprints.qut.edu.au/205096/1/Muhammad%20Iqbal%20Hasan_Chowdhury_Thesis.pdf.
Strub, Florian. "Développement de modèles multimodaux interactifs pour l'apprentissage du langage dans des environnements visuels" [Development of interactive multimodal models for language learning in visual environments]. Thesis, Lille 1, 2020. http://www.theses.fr/2020LIL1I030.
Full textWhile our representation of the world is shaped by our perceptions, our languages, and our interactions, they have traditionally been distinct fields of study in machine learning. Fortunately, this partitioning started opening up with the recent advents of deep learning methods, which standardized raw feature extraction across communities. However, multimodal neural architectures are still at their beginning, and deep reinforcement learning is often limited to constrained environments. Yet, we ideally aim to develop large-scale multimodal and interactive models towards correctly apprehending the complexity of the world. As a first milestone, this thesis focuses on visually grounded language learning for three reasons (i) they are both well-studied modalities across different scientific fields (ii) it builds upon deep learning breakthroughs in natural language processing and computer vision (ii) the interplay between language and vision has been acknowledged in cognitive science. More precisely, we first designed the GuessWhat?! game for assessing visually grounded language understanding of the models: two players collaborate to locate a hidden object in an image by asking a sequence of questions. We then introduce modulation as a novel deep multimodal mechanism, and we show that it successfully fuses visual and linguistic representations by taking advantage of the hierarchical structure of neural networks. Finally, we investigate how reinforcement learning can support visually grounded language learning and cement the underlying multimodal representation. We show that such interactive learning leads to consistent language strategies but gives raise to new research issues
Mahendru, Aroma. "Role of Premises in Visual Question Answering." Master of Science thesis, Virginia Tech, 2017. http://hdl.handle.net/10919/78030.
Malinowski, Mateusz [Verfasser], and Mario [Akademischer Betreuer] Fritz. "Towards holistic machines : From visual recognition to question answering about real-world images / Mateusz Malinowski ; Betreuer: Mario Fritz." Saarbrücken : Saarländische Universitäts- und Landesbibliothek, 2017. http://d-nb.info/1136607889/34.
Full textDushi, Denis. "Using Deep Learning to Answer Visual Questions from Blind People." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-247910.
A natural application of artificial intelligence is to help blind people with their daily visual challenges through AI-based assistive technology. In this regard, one of the most promising tasks is Visual Question Answering (VQA): the model is presented with an image and a question about that image, and must then predict the correct answer. The VizWiz dataset, a collection of images and corresponding questions originating from blind people, was recently introduced. Being the first VQA dataset to come from a natural setting, it has many limitations and peculiarities. More specifically, the observed characteristics are: high uncertainty in the answers, an informal conversational tone in the questions, a relatively small size, and an imbalance between answerable and unanswerable classes. These characteristics can also be observed, individually or together, in other VQA datasets, and they pose particular challenges when solving the VQA task. Preprocessing techniques from the field of data science are especially well suited to handling these aspects of the data. To contribute to the VQA task, we therefore answered the question "Can preprocessing techniques from the field of data science contribute to solving the VQA task?" by proposing and studying the effect of four different preprocessing techniques. To handle the high uncertainty in the answers, we used a preprocessing step in which we computed the uncertainty of each answer and used this measure to weight the model's outputs during training. Using this "uncertainty-aware" training procedure improved the predictive accuracy of our model by 10%, with which we reached state-of-the-art results when the model was evaluated on the test split of the VizWiz dataset. To overcome the limited amount of data, we designed and tested a new preprocessing procedure that nearly doubles the number of data points by computing the cosine similarity between answer vectors. We also addressed the informal conversational tone of the questions, which were collected from real-world verbal conversations, by proposing an alternative way of preprocessing the questions in which conversational terms are removed. This led to a further improvement: from a predictive accuracy of 0.516 with the standard way of processing the questions, we reached 0.527 predictive accuracy with the new preprocessing. Finally, we handled the imbalance between answerable and unanswerable classes by predicting whether a visual question has a possible answer. We tested two standard preprocessing techniques for adjusting the class distribution of the dataset: oversampling and undersampling. Oversampling gave an improvement, albeit a small one, in both average precision and F1 score.
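As an illustration of the uncertainty-aware weighting described in this abstract, the sketch below derives a soft target for each answer from annotator agreement and trains against it with a binary cross-entropy loss. It assumes PyTorch, and the agreement-based min(count/3, 1) rule is an assumption inspired by the standard VQA/VizWiz soft-accuracy convention; the function names are hypothetical and this is not the thesis's exact procedure.

# Rough sketch of an "uncertainty-aware" target weighting for VQA training,
# assuming PyTorch. The agreement-based soft score is an assumption inspired
# by the standard VQA/VizWiz evaluation, not the thesis's exact method.
from collections import Counter
import torch
import torch.nn.functional as F

def soft_targets(annotator_answers, answer_vocab):
    # annotator_answers: list of answer strings given by the annotators
    # answer_vocab: dict mapping answer string -> class index
    counts = Counter(annotator_answers)
    target = torch.zeros(len(answer_vocab))
    for ans, c in counts.items():
        if ans in answer_vocab:
            # Less annotator agreement -> lower weight on that answer.
            target[answer_vocab[ans]] = min(c / 3.0, 1.0)
    return target

def vqa_loss(logits, targets):
    # Binary cross-entropy against the soft, uncertainty-weighted targets.
    return F.binary_cross_entropy_with_logits(logits, targets)

# Usage
vocab = {"yes": 0, "no": 1, "unanswerable": 2}
t = soft_targets(["yes", "yes", "no", "yes"], vocab)  # -> tensor([1.0000, 0.3333, 0.0000])
loss = vqa_loss(torch.randn(3), t)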
Ben-Younes, Hedi. "Multi-modal representation learning towards visual reasoning." Electronic Thesis or Diss., Sorbonne université, 2019. http://www.theses.fr/2019SORUS173.
The quantity of images that populate the Internet is dramatically increasing. It becomes of critical importance to develop the technology for a precise and automatic understanding of visual contents. As image recognition systems are becoming more and more relevant, researchers in artificial intelligence now seek the next generation of vision systems that can perform high-level scene understanding. In this thesis, we are interested in Visual Question Answering (VQA), which consists in building models that answer any natural language question about any image. Because of its nature and complexity, VQA is often considered a proxy for visual reasoning. Classically, VQA architectures are designed as trainable systems that are provided with images, questions about them, and their answers. To tackle this problem, typical approaches involve modern Deep Learning (DL) techniques. In the first part, we focus on developing multi-modal fusion strategies to model the interactions between image and question representations. More specifically, we explore bilinear fusion models and exploit concepts from tensor analysis to provide tractable and expressive factorizations of parameters. These fusion mechanisms are studied under the widely used visual attention framework: the answer to the question is provided by focusing only on the relevant image regions. In the last part, we move away from the attention mechanism and build a more advanced scene understanding architecture where we consider objects and their spatial and semantic relations. All models are thoroughly evaluated experimentally on standard datasets, and the results are competitive with the literature.
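The bilinear fusion with tensor factorization discussed in this abstract can be sketched as a low-rank approximation: instead of a full three-way interaction tensor between question features, image features, and outputs, each modality is projected into a small common space and combined with an element-wise product. The PyTorch sketch below is a generic illustration of this family of factorizations (MLB/MUTAN-style); the class name and dimensions are assumptions, not the thesis's exact architecture.

# Compact sketch of low-rank bilinear fusion between question and image
# representations, assuming PyTorch. Illustrative of MLB/MUTAN-style
# factorizations in general, not the exact model from the thesis.
import torch
import torch.nn as nn

class LowRankBilinearFusion(nn.Module):
    def __init__(self, q_dim: int, v_dim: int, rank: int, out_dim: int):
        super().__init__()
        # Factor matrices replace the full (q_dim x v_dim x out_dim) interaction tensor.
        self.proj_q = nn.Linear(q_dim, rank)
        self.proj_v = nn.Linear(v_dim, rank)
        self.proj_out = nn.Linear(rank, out_dim)

    def forward(self, q: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
        # Element-wise product in the low-rank space approximates the
        # bilinear interaction between the two modalities.
        fused = torch.tanh(self.proj_q(q)) * torch.tanh(self.proj_v(v))
        return self.proj_out(fused)

# Usage: fuse a 2400-d question encoding with a 2048-d image feature.
fusion = LowRankBilinearFusion(q_dim=2400, v_dim=2048, rank=512, out_dim=3000)
scores = fusion(torch.randn(8, 2400), torch.randn(8, 2048))  # (8, 3000) answer scores

The rank parameter controls the trade-off between the expressiveness of the interaction and the number of parameters, which is the tractability concern the abstract raises.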
Lin, Xiao. "Leveraging Multimodal Perspectives to Learn Common Sense for Vision and Language Tasks." Ph.D. dissertation, Virginia Tech, 2017. http://hdl.handle.net/10919/79521.
Huang, Jia-Hong. "Robustness Analysis of Visual Question Answering Models by Basic Questions." Thesis, 2017. http://hdl.handle.net/10754/626314.
Anderson, Peter James. "Vision and Language Learning: From Image Captioning and Visual Question Answering towards Embodied Agents." PhD thesis, 2018. http://hdl.handle.net/1885/164018.
Full text"Compressive Visual Question Answering." Master's thesis, 2017. http://hdl.handle.net/2286/R.I.45952.
Full textDissertation/Thesis
Masters Thesis Computer Engineering 2017
Books on the topic "Visual question answering (VQA)"
Wu, Qi, Peng Wang, Xin Wang, Xiaodong He, and Wenwu Zhu. Visual Question Answering. Singapore: Springer Nature Singapore, 2022. http://dx.doi.org/10.1007/978-981-19-0964-1.
Visual Question Answering: From Theory to Application. Springer Singapore Pte. Limited, 2022.
Book chapters on the topic "Visual question answering (VQA)"
Wu, Qi, Peng Wang, Xin Wang, Xiaodong He, and Wenwu Zhu. "Medical VQA." In Visual Question Answering, 165–76. Singapore: Springer Nature Singapore, 2022. http://dx.doi.org/10.1007/978-981-19-0964-1_11.
Wu, Qi, Peng Wang, Xin Wang, Xiaodong He, and Wenwu Zhu. "Embodied VQA." In Visual Question Answering, 147–64. Singapore: Springer Nature Singapore, 2022. http://dx.doi.org/10.1007/978-981-19-0964-1_10.
Wu, Qi, Peng Wang, Xin Wang, Xiaodong He, and Wenwu Zhu. "Knowledge-Based VQA." In Visual Question Answering, 73–90. Singapore: Springer Nature Singapore, 2022. http://dx.doi.org/10.1007/978-981-19-0964-1_5.
Wu, Qi, Peng Wang, Xin Wang, Xiaodong He, and Wenwu Zhu. "Text-Based VQA." In Visual Question Answering, 177–87. Singapore: Springer Nature Singapore, 2022. http://dx.doi.org/10.1007/978-981-19-0964-1_12.
Wu, Qi, Peng Wang, Xin Wang, Xiaodong He, and Wenwu Zhu. "Vision-and-Language Pretraining for VQA." In Visual Question Answering, 91–107. Singapore: Springer Nature Singapore, 2022. http://dx.doi.org/10.1007/978-981-19-0964-1_6.
Goel, Vatsal, Mohit Chandak, Ashish Anand, and Prithwijit Guha. "IQ-VQA: Intelligent Visual Question Answering." In Pattern Recognition. ICPR International Workshops and Challenges, 357–70. Cham: Springer International Publishing, 2021. http://dx.doi.org/10.1007/978-3-030-68790-8_28.
Gokhale, Tejas, Pratyay Banerjee, Chitta Baral, and Yezhou Yang. "VQA-LOL: Visual Question Answering Under the Lens of Logic." In Computer Vision – ECCV 2020, 379–96. Cham: Springer International Publishing, 2020. http://dx.doi.org/10.1007/978-3-030-58589-1_23.
Seenivasan, Lalithkumar, Mobarakol Islam, Adithya K. Krishna, and Hongliang Ren. "Surgical-VQA: Visual Question Answering in Surgical Scenes Using Transformer." In Lecture Notes in Computer Science, 33–43. Cham: Springer Nature Switzerland, 2022. http://dx.doi.org/10.1007/978-3-031-16449-1_4.
Narayanan, Abhishek, Abijna Rao, Abhishek Prasad, and S. Natarajan. "Towards Open Ended and Free Form Visual Question Answering: Modeling VQA as a Factoid Question Answering Problem." In Emerging Technologies in Data Mining and Information Security, 749–59. Singapore: Springer Singapore, 2021. http://dx.doi.org/10.1007/978-981-15-9774-9_69.
Salewski, Leonard, A. Sophia Koepke, Hendrik P. A. Lensch, and Zeynep Akata. "CLEVR-X: A Visual Reasoning Dataset for Natural Language Explanations." In xxAI - Beyond Explainable AI, 69–88. Cham: Springer International Publishing, 2022. http://dx.doi.org/10.1007/978-3-031-04083-2_5.
Full textConference papers on the topic "Visual question answering (VQA)"
Antol, Stanislaw, Aishwarya Agrawal, Jiasen Lu, Margaret Mitchell, Dhruv Batra, C. Lawrence Zitnick, and Devi Parikh. "VQA: Visual Question Answering." In 2015 IEEE International Conference on Computer Vision (ICCV). IEEE, 2015. http://dx.doi.org/10.1109/iccv.2015.279.
Mishra, Aakansha, Ashish Anand, and Prithwijit Guha. "CQ-VQA: Visual Question Answering on Categorized Questions." In 2020 International Joint Conference on Neural Networks (IJCNN). IEEE, 2020. http://dx.doi.org/10.1109/ijcnn48605.2020.9206913.
Liu, Fei, Jing Liu, Zhiwei Fang, Richang Hong, and Hanqing Lu. "Densely Connected Attention Flow for Visual Question Answering." In Twenty-Eighth International Joint Conference on Artificial Intelligence {IJCAI-19}. California: International Joint Conferences on Artificial Intelligence Organization, 2019. http://dx.doi.org/10.24963/ijcai.2019/122.
Liu, Yuhang, Wei Wei, Daowan Peng, and Feida Zhu. "Declaration-based Prompt Tuning for Visual Question Answering." In Thirty-First International Joint Conference on Artificial Intelligence {IJCAI-22}. California: International Joint Conferences on Artificial Intelligence Organization, 2022. http://dx.doi.org/10.24963/ijcai.2022/453.
Qiao, Yanyuan, Zheng Yu, and Jing Liu. "VC-VQA: Visual Calibration Mechanism For Visual Question Answering." In 2020 IEEE International Conference on Image Processing (ICIP). IEEE, 2020. http://dx.doi.org/10.1109/icip40778.2020.9190828.
Lao, Mingrui, Yanming Guo, Wei Chen, Nan Pu, and Michael S. Lew. "VQA-BC: Robust Visual Question Answering Via Bidirectional Chaining." In ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2022. http://dx.doi.org/10.1109/icassp43922.2022.9746493.
Lin, Yuetan, Zhangyang Pang, Donghui Wang, and Yueting Zhuang. "Feature Enhancement in Attention for Visual Question Answering." In Twenty-Seventh International Joint Conference on Artificial Intelligence {IJCAI-18}. California: International Joint Conferences on Artificial Intelligence Organization, 2018. http://dx.doi.org/10.24963/ijcai.2018/586.
Song, Jingkuan, Pengpeng Zeng, Lianli Gao, and Heng Tao Shen. "From Pixels to Objects: Cubic Visual Attention for Visual Question Answering." In Twenty-Seventh International Joint Conference on Artificial Intelligence {IJCAI-18}. California: International Joint Conferences on Artificial Intelligence Organization, 2018. http://dx.doi.org/10.24963/ijcai.2018/126.
Gao, Chenyu, Qi Zhu, Peng Wang, and Qi Wu. "Chop Chop BERT: Visual Question Answering by Chopping VisualBERT's Heads." In Thirtieth International Joint Conference on Artificial Intelligence {IJCAI-21}. California: International Joint Conferences on Artificial Intelligence Organization, 2021. http://dx.doi.org/10.24963/ijcai.2021/92.
Cascante-Bonilla, Paola, Hui Wu, Letao Wang, Rogerio Feris, and Vicente Ordonez. "Sim VQA: Exploring Simulated Environments for Visual Question Answering." In 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2022. http://dx.doi.org/10.1109/cvpr52688.2022.00500.