
Journal articles on the topic 'Stanford question answering dataset'



Consult the top 50 journal articles for your research on the topic 'Stanford question answering dataset.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Press it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse journal articles across a wide variety of disciplines and organise your bibliography correctly.

1

Gholami, Sia, and Mehdi Noori. "You Don’t Need Labeled Data for Open-Book Question Answering." Applied Sciences 12, no. 1 (2021): 111. http://dx.doi.org/10.3390/app12010111.

Abstract:
Open-book question answering is a subset of question answering (QA) tasks where the system aims to find answers in a given set of documents (open-book) and common knowledge about a topic. This article proposes a solution for answering natural language questions from a corpus of Amazon Web Services (AWS) technical documents with no domain-specific labeled data (zero-shot). These questions have a yes–no–none answer and a text answer which can be short (a few words) or long (a few sentences). We present a two-step, retriever–extractor architecture in which a retriever finds the right documents and an extractor finds the answers in the retrieved documents. To test our solution, we introduce a new dataset for open-book QA based on real customer questions on AWS technical documentation. In this paper, we conducted experiments on several information retrieval systems and extractive language models, attempting to find the yes–no–none answers and text answers in the same pass. Our custom-built extractor model is created from a pretrained language model and fine-tuned on the Stanford Question Answering Dataset (SQuAD) and Natural Questions datasets. We were able to achieve 42% F1 and a 39% exact match (EM) score end-to-end with no domain-specific training.
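As an illustration of the two-step retriever–extractor idea summarised above, the following sketch is hypothetical code, not the authors' implementation: the document snippets, the TF-IDF retriever, and the distilbert-base-cased-distilled-squad checkpoint are all assumptions. It retrieves the most similar document and then extracts a span with a reader already fine-tuned on SQuAD, i.e. with no domain-specific labels.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from transformers import pipeline

# Hypothetical stand-in for a corpus of technical documents.
documents = [
    "An S3 bucket name must be globally unique and contain only lowercase characters.",
    "Lambda functions can run for a maximum of 15 minutes per invocation.",
]

# Step 1: retriever - rank documents by TF-IDF similarity to the question.
vectorizer = TfidfVectorizer().fit(documents)
doc_matrix = vectorizer.transform(documents)

# Step 2: extractor - a reader fine-tuned on SQuAD, applied zero-shot to this domain.
extractor = pipeline("question-answering", model="distilbert-base-cased-distilled-squad")

def answer(question: str) -> str:
    scores = cosine_similarity(vectorizer.transform([question]), doc_matrix)[0]
    best_doc = documents[scores.argmax()]
    return extractor(question=question, context=best_doc)["answer"]

print(answer("How long can a Lambda function run?"))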
2

Ait-Mlouk, Addi, Sadi A. Alawadi, Salman Toor, and Andreas Hellander. "FedQAS: Privacy-Aware Machine Reading Comprehension with Federated Learning." Applied Sciences 12, no. 6 (2022): 3130. http://dx.doi.org/10.3390/app12063130.

Abstract:
Machine reading comprehension (MRC) of text data is a challenging task in Natural Language Processing (NLP), with a lot of ongoing research fueled by the release of the Stanford Question Answering Dataset (SQuAD) and Conversational Question Answering (CoQA). It is considered to be an effort to teach computers how to “understand” a text, and then to be able to answer questions about it using deep learning. However, until now, large-scale training on private text data and knowledge sharing has been missing for this NLP task. Hence, we present FedQAS, a privacy-preserving machine reading system capable of leveraging large-scale private data without the need to pool those datasets in a central location. The proposed approach combines transformer models and federated learning technologies. The system is developed using the FEDn framework and deployed as a proof-of-concept alliance initiative. FedQAS is flexible, language-agnostic, and allows intuitive participation and execution of local model training. In addition, we present the architecture and implementation of the system, as well as provide a reference evaluation based on the SQuAD dataset, to showcase how it overcomes data privacy issues and enables knowledge sharing between alliance members in a Federated learning setting.
3

Katib, Sahar Sami Raheem, and Mohammed Hasan Abdulameer. "Question Answering System Based on Bidirectional Long Short-Term Memory (BiLSTM)." Al-Furat Journal of Innovations in Electronics and Computer Engineering 3, no. 2 (2024): 105–20. http://dx.doi.org/10.46649/fjiece.v3.2.9a.18.5.2024.

Abstract:
In the modern world, question-answering systems are essential for promoting better communication between people and technology. These systems play an important role in collecting information quickly and efficiently, which leads to great progress in learning, teaching and development in many areas of life. Using deep learning techniques, this research addresses the problem of accurately predicting answers to the questions that need to be answered. We created a question-and-answer system using a Bidirectional Long Short-Term Memory (BiLSTM) network, a modern neural architecture known for its accuracy in text analysis and natural language understanding. This technique is effective at understanding questions and producing accurate answers because of its ability to attend to both preceding and following information in a sentence. Preprocessing was used to remove unnecessary, unimportant and time-consuming data. The Stanford Question Answering Dataset version 2 (SQuAD 2.0) was used, which is considered one of the important datasets in the field of machine learning and natural language processing. The following evaluation metrics were used to evaluate the model's performance: Mean Average Precision (MAP), Mean Reciprocal Rank (MRR), Recall, Precision, Loss, F1 Score, and Exact Match (EM). The results, based on 150 epochs and a batch size of 128 with a cleaned dataset split into 70% training and 30% test/validation (15% each), are as follows: Precision (0.966), Loss (0.591), F1-score (0.966), Recall (0.967), EM (0.967), MRR (0.918), MAP (0.776), and accuracy (0.966). Interestingly, the highest performance was observed when using the accuracy measure.
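For readers who want a concrete picture of a BiLSTM reader of this kind, here is a minimal PyTorch sketch. It is hypothetical: vocabulary size, dimensions, and the dummy batch are assumptions, and a full system would also encode the question; it only shows how start and end positions of an answer span can be predicted from an encoded passage.

import torch
import torch.nn as nn

class BiLSTMSpanQA(nn.Module):
    """Minimal BiLSTM reader: predicts start/end positions of the answer span."""
    def __init__(self, vocab_size, emb_dim=128, hidden=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)
        self.start_head = nn.Linear(2 * hidden, 1)
        self.end_head = nn.Linear(2 * hidden, 1)

    def forward(self, context_ids):
        h, _ = self.encoder(self.embed(context_ids))     # (batch, seq, 2*hidden)
        start_logits = self.start_head(h).squeeze(-1)    # (batch, seq)
        end_logits = self.end_head(h).squeeze(-1)
        return start_logits, end_logits

model = BiLSTMSpanQA(vocab_size=30000)
dummy_context = torch.randint(0, 30000, (2, 120))        # two tokenised passages
start_logits, end_logits = model(dummy_context)
criterion = nn.CrossEntropyLoss()
loss = criterion(start_logits, torch.tensor([5, 17])) + criterion(end_logits, torch.tensor([9, 21]))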
4

Vinay, R., B. U. Thejas, A. Vibhav Sharma H., Ghuli Poonam, and G. Shobha. "A multilingual semantic search chatbot framework." IAES International Journal of Artificial Intelligence (IJ-AI) 13, no. 2 (2024): 2333–41. https://doi.org/10.11591/ijai.v13.i2.pp2333-2341.

Abstract:
Chatbots are conversational agents which interact with users and simulate a human interaction. Companies use chatbots on their customer-facing sites to enhance user experience by answering questions about their products and directing users to relevant pages on the site. Existing chatbots which are used for this purpose give responses based on pre-defined frequently asked questions (FAQs) only. This paper proposes a framework for a chatbot which combines two approaches: retrieval from a knowledge base consisting of question-answer pairs, and a natural language search mechanism which can scan through paragraphs of text information. A feedback-based knowledge base update is implemented which provides continuous improvement in user experience. The framework achieves 81.73 percent answer matching on the Stanford Question Answering Dataset (SQuAD) 1.1 and 69.21 percent answer matching on SQuAD 2.0. By means of zero-shot learning, it also performs well on languages such as Spanish (67.32 percent answer match), Russian (61.43 percent answer match), and Arabic (51.63 percent answer match).
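A minimal sketch of the combined FAQ-retrieval and free-text semantic search idea follows. It assumes a multilingual sentence encoder and a toy knowledge base; the model name and example entries are illustrative, not taken from the paper.

from sentence_transformers import SentenceTransformer, util

# Hypothetical knowledge base: FAQ pairs plus free-text paragraphs from the site.
faq = [("How do I reset my password?", "Use the 'Forgot password' link on the login page."),
       ("What are the shipping costs?", "Shipping is free for orders above 50 EUR.")]
paragraphs = ["Orders are dispatched within two business days of payment confirmation."]

model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")  # multilingual encoder
corpus = [q for q, _ in faq] + paragraphs
corpus_emb = model.encode(corpus, convert_to_tensor=True)

def answer(user_query: str) -> str:
    query_emb = model.encode(user_query, convert_to_tensor=True)
    best = int(util.cos_sim(query_emb, corpus_emb).argmax())
    # Return the stored answer for FAQ hits, otherwise the matching paragraph itself.
    return faq[best][1] if best < len(faq) else paragraphs[best - len(faq)]

print(answer("¿Cuánto cuesta el envío?"))  # a Spanish query matched zero-shot

Using a multilingual encoder is what would let a Spanish, Russian or Arabic query match knowledge-base entries without language-specific training, which is the zero-shot behaviour the abstract reports.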
5

Ramaraj, Vijayan, Mareeswari Venkatachala Appa Swamy, Ephzibah Evan Prince, and Chandhan Kumar. "Improving the BERT model for long text sequences in question answering domain." International Journal of Advances in Applied Sciences 13, no. 1 (2024): 106. http://dx.doi.org/10.11591/ijaas.v13.i1.pp106-115.

Abstract:
The text-based question-answering (QA) system aims to answer natural language questions by querying an external knowledge base. It can be applied to real-world document collections such as medical documents, research papers, and crime-related documents. Using this system, users don't have to go through the documents manually: the system understands the knowledge base and finds the answer based on the text and question given to it. Earlier state-of-the-art natural language processing (NLP) relied on recurrent neural networks (RNNs) and long short-term memory (LSTM). However, these models are hard to parallelize and poor at retaining contextual relationships across long text inputs. Today, bidirectional encoder representations from transformers (BERT) is the contemporary approach for NLP, but BERT is not capable of handling long text sequences; it can handle only 512 tokens at a time, which makes long contexts difficult. Smooth inverse frequency (SIF) and the BERT model are incorporated together to solve this challenge. BERT trained on the Stanford Question Answering Dataset (SQuAD), combined with the SIF model, demonstrates robustness and effectiveness on long text sequences from different domains. Experimental results suggest that the proposed approach is a promising solution for QA on long text sequences.
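One plausible way to combine chunk selection with a SQuAD-trained reader is sketched below under several assumptions: GloVe vectors stand in for the word embeddings, the principal-component removal step of full SIF is omitted, and a fixed 300-word chunking is assumed. It only illustrates the select-then-read flow the abstract describes, not the authors' exact pipeline.

import numpy as np
import gensim.downloader as api
from collections import Counter
from transformers import pipeline

wv = api.load("glove-wiki-gigaword-100")        # pretrained word vectors
qa = pipeline("question-answering",
              model="bert-large-uncased-whole-word-masking-finetuned-squad")

def sif_embedding(text, freqs, a=1e-3):
    """Smooth-inverse-frequency weighted average of word vectors."""
    words = [w for w in text.lower().split() if w in wv]
    if not words:
        return np.zeros(wv.vector_size)
    weights = np.array([a / (a + freqs.get(w, 1e-6)) for w in words])
    return (weights[:, None] * np.array([wv[w] for w in words])).mean(axis=0)

def answer_long_document(question, document, chunk_words=300):
    tokens = document.split()
    chunks = [" ".join(tokens[i:i + chunk_words]) for i in range(0, len(tokens), chunk_words)]
    counts = Counter(w.lower() for w in tokens)
    total = sum(counts.values())
    freqs = {w: c / total for w, c in counts.items()}
    q_vec = sif_embedding(question, freqs)
    scores = [np.dot(q_vec, sif_embedding(c, freqs)) for c in chunks]
    best_chunk = chunks[int(np.argmax(scores))]   # chunk most similar to the question
    return qa(question=question, context=best_chunk)["answer"]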
6

Vijayan, Ramaraj, Venkatachala Appa Swamy Mareeswari, Evan Prince Ephzibah, and Kumar Chandan. "Improving the BERT model for long text sequences in question answering domain." International Journal of Advances in Applied Sciences (IJAAS) 13, no. 1 (2024): 106–15. https://doi.org/10.11591/ijaas.v13.i1.pp106-115.

Abstract:
The text-based question-answering (QA) system aims to answer natural language questions by querying an external knowledge base. It can be applied to real-world document collections such as medical documents, research papers, and crime-related documents. Using this system, users don't have to go through the documents manually: the system understands the knowledge base and finds the answer based on the text and question given to it. Earlier state-of-the-art natural language processing (NLP) relied on recurrent neural networks (RNNs) and long short-term memory (LSTM). However, these models are hard to parallelize and poor at retaining contextual relationships across long text inputs. Today, bidirectional encoder representations from transformers (BERT) is the contemporary approach for NLP, but BERT is not capable of handling long text sequences; it can handle only 512 tokens at a time, which makes long contexts difficult. Smooth inverse frequency (SIF) and the BERT model are incorporated together to solve this challenge. BERT trained on the Stanford Question Answering Dataset (SQuAD), combined with the SIF model, demonstrates robustness and effectiveness on long text sequences from different domains. Experimental results suggest that the proposed approach is a promising solution for QA on long text sequences.
7

Sathish Dhanasegar, Sathish. "QUESTION ANSWERING SYSTEM FOR HOSPITALITY DOMAIN USING TRANSFORMER-BASED LANGUAGE MODELS." International Research Journal of Computer Science 9, no. 5 (2022): 110–34. http://dx.doi.org/10.26562/irjcs.2022.v0905.003.

Abstract:
Recent research demonstrates significant success on a wide range of Natural Language Processing (NLP) tasks by utilizing Transformer architectures. Question answering (QA) is an important aspect of NLP. These systems enable users to ask a question in natural language and receive an answer accordingly. Most questions in the hospitality industry are content-based, with the expected response being accurate data rather than "yes" or "no". Therefore, the system is required to understand the semantics of the questions and return relevant answers. Despite several advancements in transformer-based models for QA, we are interested in evaluating how a pre-trained model performs with unlabeled data, and how it performs when fine-tuned. This project aims to develop a question-answering system for the hospitality domain, in which the texts contain hospitality content and the user is able to ask questions about them. We use an attention mechanism to train a span-based model that predicts the position of the start and end tokens in a paragraph. Using the model, users can type their questions directly in the interactive user interface and receive the response. The dataset for this study is created using response templates from the existing dialogue system. We use the Stanford Question Answering Dataset (SQuAD 2.0) data structure to form the dataset, which is widely used for QA models. During Phase 1, we evaluate the pre-trained QA models BERT, RoBERTa, and DistilBERT to predict answers and measure the results using Exact Match (EM) and ROUGE-L F1-Score. In Phase 2 of the project, we fine-tune the QA models and their hyper-parameters by training the models with hospitality datasets, and the results are compared. The fine-tuned RoBERTa models achieved a maximum ROUGE-L F1-Score and EM of 71.39 and 52.17, respectively, a relative 4% increase in F1-Score and 8.7% increase in EM score compared to the pre-trained model. The results of this project will be used to improve the efficiency of the dialogue system in the hospitality industry.
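The two reported measures are straightforward to reproduce; the snippet below is a hedged example, where the rouge_score package and the toy strings are assumptions rather than artifacts of the paper.

from rouge_score import rouge_scorer

def exact_match(prediction: str, reference: str) -> int:
    # Strict EM after trivial normalisation (lowercase, collapsed whitespace).
    return int(" ".join(prediction.lower().split()) == " ".join(reference.lower().split()))

scorer = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)

prediction = "11 am"
reference = "Check-out is at 11 AM"
print(exact_match(prediction, reference))                      # 0: not an exact match
print(scorer.score(reference, prediction)["rougeL"].fmeasure)  # partial overlap still rewarded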
8

Zhang, Xuanyu, and Zhichun Wang. "Rception: Wide and Deep Interaction Networks for Machine Reading Comprehension (Student Abstract)." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 10 (2020): 13987–88. http://dx.doi.org/10.1609/aaai.v34i10.7266.

Abstract:
Most models for machine reading comprehension (MRC) focus on recurrent neural networks (RNNs) and attention mechanisms, though convolutional neural networks (CNNs) are also used for time efficiency. However, little attention has been paid to combining CNNs and RNNs in MRC. For a deeper understanding, humans sometimes need local information for short phrases and sometimes need global context for long passages. In this paper, we propose a novel architecture, Rception, to capture and leverage both local deep information and global wide context. It fuses different kinds of networks and hyper-parameters horizontally rather than simply stacking them layer by layer vertically. Experiments on the Stanford Question Answering Dataset (SQuAD) show that the proposed architecture achieves good performance.
9

Farea, Amer, and Frank Emmert-Streib. "Experimental Design of Extractive Question-Answering Systems: Influence of Error Scores and Answer Length." Journal of Artificial Intelligence Research 80 (May 23, 2024): 87–125. http://dx.doi.org/10.1613/jair.1.15642.

Abstract:
Question-answering (QA) systems are becoming more and more important because they enable human-computer communication in a natural language. In recent years, significant progress has been made with transformer-based models that leverage deep learning in combination with large amounts of text data. However, a significant challenge with QA systems lies in their complexity rooted in the ambiguity and flexibility of a natural language. This makes even their evaluation a formidable task. For this reason, in this study, we focus on the evaluation of extractive question-answering (EQA) systems by conducting a large-scale analysis of distilBERT using benchmark data provided by the Stanford Question Answering Dataset (SQuAD). Specifically, the main objectives of this paper are fourfold. First, we study the influence of the answer length on the performance and we demonstrate that there is an inverse correlation between both. Second, we study differences in exact match (EM) measures because there are different definitions commonly used in the literature. As a result, we find that despite the fact that all of those measures are named "exact match", these measures are actually different from each other. Third, we study the practical relevance of these different definitions because, due to the ambivalent meaning of "exact match" in the literature, it is often unclear if reported improvements are genuine or only due to a change in the exact match measure. Importantly, our results show that differences between differently defined EM measures are in the same order of magnitude as reported differences found in the literature. This raises concerns about the robustness of reported results. Fourth, we provide guidelines to improve the experimental design of general EQA studies, aiming to enhance performance evaluation and minimize the potential for spurious results.
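The paper's point about divergent "exact match" definitions can be made concrete with a small example. The three variants below are common in the literature; the normalisation mimics the well-known SQuAD evaluation script, and the function names and example strings are illustrative, not taken from the paper.

import re, string

def em_strict(pred, gold):
    """EM definition 1: raw string equality."""
    return int(pred == gold)

def _squad_normalize(text):
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def em_normalized(pred, gold):
    """EM definition 2: equality after SQuAD-style normalisation."""
    return int(_squad_normalize(pred) == _squad_normalize(gold))

def em_any_gold(pred, golds):
    """EM definition 3: credit if the prediction matches any of several gold answers."""
    return max(em_normalized(pred, g) for g in golds)

pred = "the Eiffel Tower"
golds = ["Eiffel Tower", "The Eiffel Tower in Paris"]
print(em_strict(pred, golds[0]), em_normalized(pred, golds[0]), em_any_gold(pred, golds))
# 0 1 1 -> the same prediction scores differently under each definition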
10

Kuwana, Ayato, Atsushi Oba, Ranto Sawai, and Incheon Paik. "Automatic Taxonomy Classification by Pretrained Language Model." Electronics 10, no. 21 (2021): 2656. http://dx.doi.org/10.3390/electronics10212656.

Abstract:
In recent years, automatic ontology generation has received significant attention in information science as a means of systemizing vast amounts of online data. As our initial attempt of ontology generation with a neural network, we proposed a recurrent neural network-based method. However, updating the architecture is possible because of the development in natural language processing (NLP). By contrast, the transfer learning of language models trained by a large, unlabeled corpus has yielded a breakthrough in NLP. Inspired by these achievements, we propose a novel workflow for ontology generation comprising two-stage learning. Our results showed that our best method improved accuracy by over 12.5%. As an application example, we applied our model to the Stanford Question Answering Dataset to show ontology generation in a real field. The results showed that our model can generate a good ontology, with some exceptions in the real field, indicating future research directions to improve the quality.
11

Zhou, Chuxi. "Comparative Evaluation of GPT, BERT, and XLNet: Insights into Their Performance and Applicability in NLP Tasks." Transactions on Computer Science and Intelligent Systems Research 7 (November 25, 2024): 415–21. https://doi.org/10.62051/h08exg91.

Abstract:
Natural Language Processing (NLP) is a pivotal area in artificial intelligence, aiming to make computers capable of understanding and generating human language. This study evaluates and compares three prominent NLP models—the Generative Pre-trained Transformer (GPT) model, Bidirectional Encoder Representations from Transformers (BERT) model, and Generalized Autoregressive Pretraining for Language Understanding (XLNet)—to determine their strengths, limitations, and suitability for various tasks. The research involves a comprehensive analysis of these models, utilizing well-established datasets such as the Stanford Question Answering Dataset (SQuAD), General Language Understanding Evaluation (GLUE), Reading Comprehension from Examinations (RACE), and the Situations with Adversarial Generations (SWAG). The study explores each model's architecture, pre-training, and fine-tuning processes: GPT's unidirectional approach is assessed for its language generation and handling of long-range dependencies; BERT's bidirectional encoding is examined for its effectiveness in context understanding; and XLNet's permutation-based training is analyzed for its robust contextual comprehension. The experimental results reveal that GPT excels in generative tasks but is constrained by its unidirectional nature. BERT achieves superior accuracy in comprehension tasks but is computationally demanding and susceptible to pre-training bias. XLNet outperforms both GPT and BERT in accuracy and contextual understanding, though at the cost of increased complexity. The results offer a significant understanding of the effectiveness and applicability of these models, suggesting future research directions such as hybrid models and improvements in efficiency.
12

Park, Dongju, and Chang Wook Ahn. "Self-Supervised Contextual Data Augmentation for Natural Language Processing." Symmetry 11, no. 11 (2019): 1393. http://dx.doi.org/10.3390/sym11111393.

Abstract:
In this paper, we propose a novel data augmentation method with respect to the target context of the data via self-supervised learning. Instead of looking for the exact synonyms of masked words, the proposed method finds words that can replace the original words considering the context. For self-supervised learning, we can employ the masked language model (MLM), which masks a specific word within a sentence and obtains the original word. The MLM learns the context of a sentence through asymmetrical inputs and outputs. However, without using the existing MLM, we propose a label-masked language model (LMLM) that can include label information for the mask tokens used in the MLM to effectively use the MLM in data with label information. The augmentation method performs self-supervised learning using LMLM and then implements data augmentation through the trained model. We demonstrate that our proposed method improves the classification accuracy of recurrent neural networks and convolutional neural network-based classifiers through several experiments for text classification benchmark datasets, including the Stanford Sentiment Treebank-5 (SST5), the Stanford Sentiment Treebank-2 (SST2), the subjectivity (Subj), the Multi-Perspective Question Answering (MPQA), the Movie Reviews (MR), and the Text Retrieval Conference (TREC) datasets. In addition, since the proposed method does not use external data, it can eliminate the time spent collecting external data, or pre-training using external data.
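A plain masked-language-model replacement, shown below, conveys the core augmentation step; note that the paper's LMLM additionally conditions the mask on the class label, which this hypothetical sketch omits, and the model checkpoint and example sentence are assumptions.

import random
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-uncased")

def augment(sentence, n_variants=3):
    """Replace one randomly chosen word with context-compatible alternatives."""
    words = sentence.split()
    i = random.randrange(len(words))
    masked = " ".join(words[:i] + [fill.tokenizer.mask_token] + words[i + 1:])
    candidates = [c["token_str"].strip() for c in fill(masked, top_k=10)]
    variants = []
    for cand in candidates:
        if cand.lower() != words[i].lower():
            variants.append(" ".join(words[:i] + [cand] + words[i + 1:]))
        if len(variants) == n_variants:
            break
    return variants

print(augment("the movie was surprisingly good and the acting felt genuine"))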
13

Rotimi-Williams Bello. "Architecture for IT Internship Recruitment Process Based on AWS Cloud." Journal of Information Systems Engineering and Management 10, no. 49s (2025): 532–42. https://doi.org/10.52783/jisem.v10i49s.9903.

Abstract:
Introduction: The current recruitment processes face challenges due to lack of automation and scalability. Filtering student data manually is both time-intensive and prone to errors. Managing high volumes of applications securely is difficult, and there is limited capability to effectively track applications and notify students. Objectives: This paper aims to design a secure, automated, and scalable cloud application on Amazon Web Services (AWS) leveraged by NLP-based deep learning models like Bidirectional Encoder Representations from Transformers (BERT) and Sentence Transformer to streamline Information Technology (IT) internship recruitment using Amazon S3 + CloudFront, AWS WAF, API Gateway, AWS Step Functions, AWS Lambda, Amazon RDS, AWS SNS, and CloudWatch. Methods: AWS’s functions as well as powerful machine learning tools were utilized for the recruitment process automation and streamlining. We present BERT fine-tuning results on the research tasks by employing datasets such as (1) Stanford Question Answering Dataset (SQuAD v1.1), (2) SQuAD 2.0, (3) General Language Understanding Evaluation (GLUE) benchmark, and (4) Situations With Adversarial Generations (SWAG) dataset. Results: With SQuAD v1.1, our proposed AWS-based method obtained 88.1 (EM) and 95.1 (F1) for Dev, and 90.1 (EM) and 95.2 (F1) for Test. With SQuAD v2.0, our proposed AWS-based method obtained 83.2 (EM) and 86.3 (F1) for Dev, and 83.4 (EM) and 88.4 (F1) for Test. Our proposed AWS-based method obtained 88.9 (Dev) and 88.8 (Test) for SWAG Dev and test accuracy. Conclusions: The solution provided by our proposed method simplifies the recruitment process, strengthens security through AWS services, scales effortlessly to manage high volumes of applications, automates notifications for better communication, provides administrators with convenient access to recruitment data, offers a cost-efficient and fully managed cloud-based infrastructure.
14

Xu, Chuanyun, Zixu Liu, Gang Li, Changpeng Zhu, and Yang Zhang. "Multigranularity Syntax Guidance with Graph Structure for Machine Reading Comprehension." Applied Sciences 12, no. 19 (2022): 9525. http://dx.doi.org/10.3390/app12199525.

Abstract:
In recent years, pre-trained language models, represented by the bidirectional encoder representations from transformers (BERT) model, have achieved remarkable success in machine reading comprehension (MRC). However, limited by the structure of BERT-based MRC models (for example, restrictions on word count), such models cannot effectively integrate significant features, such as syntax relations, semantic connections, and long-distance semantics between sentences, leading to the inability of the available models to better understand the intrinsic connections between text and questions to be answered based on it. In this paper, a multi-granularity syntax guidance (MgSG) module that consists of a “graph with dependence” module and a “graph with entity” module is proposed. MgSG selects both sentence and word granularities to guide the text model to decipher the text. In particular, syntactic constraints are used to guide the text model while exploiting the global nature of graph neural networks to enhance the model’s ability to construct long-range semantics. Simultaneously, named entities play an important role in text and answers and focusing on entities can improve the model’s understanding of the text’s major idea. Ultimately, fusing multiple embedding representations to form a representation yields the semantics of the context and the questions. Experiments demonstrate that the performance of the proposed method on the Stanford Question Answering Dataset is better when compared with the traditional BERT baseline model. The experimental results illustrate that our proposed “MgSG” module effectively utilizes the graph structure to learn the internal features of sentences, solve the problem of long-distance semantics, while effectively improving the performance of PrLM in machine reading comprehension.
15

Wang, Zhensheng, Wenmian Yang, Kun Zhou, Yiquan Zhang, and Weijia Jia. "RETQA: A Large-Scale Open-Domain Tabular Question Answering Dataset for Real Estate Sector." Proceedings of the AAAI Conference on Artificial Intelligence 39, no. 24 (2025): 25452–60. https://doi.org/10.1609/aaai.v39i24.34734.

Abstract:
The real estate market relies heavily on structured data, such as property details, market trends, and price fluctuations. However, the lack of specialized Tabular Question Answering datasets in this domain limits the development of automated question-answering systems. To fill this gap, we introduce RETQA, the first large-scale open-domain Chinese Tabular Question Answering dataset for Real Estate. RETQA comprises 4,932 tables and 20,762 question-answer pairs across 16 sub-fields within three major domains: property information, real estate company finance information and land auction information. Compared with existing tabular question answering datasets, RETQA poses greater challenges due to three key factors: long-table structures, open-domain retrieval, and multi-domain queries. To tackle these challenges, we propose the SLUTQA framework, which integrates large language models with spoken language understanding tasks to enhance retrieval and answering accuracy. Extensive experiments demonstrate that SLUTQA significantly improves the performance of large language models on RETQA by in-context learning. RETQA and SLUTQA provide essential resources for advancing tabular question answering research in the real estate domain, addressing critical challenges in open-domain and long-table question-answering.
16

Yunika Bajracharya, Suban Shrestha, Saurav Bastola, and Sanjivan Satyal. "Extractive Nepali Question Answering System." KEC Journal of Science and Engineering 9, no. 1 (2025): 95–102. https://doi.org/10.3126/kjse.v9i1.78368.

Abstract:
There is a noticeable gap in language processing tools and resources for Nepali, a language spoken by more than 17 million people [1] yet significantly underrepresented in computational linguistics. We present an Extractive Nepali Question Answering System designed to generate precise, contextually accurate responses in Nepali. Addressing the lack of high-quality training data, we contribute three key datasets: a Nepali and Hindi translation of SQuAD 1.1, a Nepali translation of XQuAD for benchmarking, and a curated Nepali QA dataset derived from Belebele’s MCQ data. To mitigate translation-induced answer span loss, we utilize translation-invariant tokens, improving span retention from 50% to 93%, and evaluate translation quality using human assessment and GPT-4, confirming a faithful answer span distribution. We evaluate our models on XQuAD and our curated dataset, demonstrating the effectiveness of fine-tuning multilingual models for Nepali QA. Our best-performing model achieves an exact match (EM) score of 72.99 and an F1 score of 84.13 on XQuAD-Nepali. These results establish a strong baseline for Nepali QA and highlight the impact of utilizing cross-lingual transfer from same language family data. All datasets and code are publicly available, encouraging further advancements in Nepali NLP research.
17

Jiang, Jianwen, Ziqiang Chen, Haojie Lin, Xibin Zhao, and Yue Gao. "Divide and Conquer: Question-Guided Spatio-Temporal Contextual Attention for Video Question Answering." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 07 (2020): 11101–8. http://dx.doi.org/10.1609/aaai.v34i07.6766.

Abstract:
Understanding questions and finding clues for answers are the key to video question answering. Compared with image question answering, video question answering (Video QA) requires finding the clues accurately on both the spatial and temporal dimensions simultaneously, and thus is more challenging. However, the relationship between spatio-temporal information and the question has still not been well utilized in most existing methods for Video QA. To tackle this problem, we propose a Question-Guided Spatio-Temporal Contextual Attention Network (QueST) method. In QueST, we divide the semantic features generated from the question into two separate parts, the spatial part and the temporal part, which respectively guide the process of constructing the contextual attention on the spatial and temporal dimensions. Under the guidance of the corresponding contextual attention, visual features can be better exploited on both spatial and temporal dimensions. To evaluate the effectiveness of the proposed method, experiments are conducted on the TGIF-QA, MSRVTT-QA and MSVD-QA datasets. Experimental results and comparisons with state-of-the-art methods show that our method achieves superior performance.
18

Andrew, Peter, and Abba Suganda Girsang. "IndoBART optimization for question answer generation system with longformer attention." International Journal of Informatics and Communication Technology (IJ-ICT) 14, no. 2 (2025): 478. https://doi.org/10.11591/ijict.v14i2.pp478-487.

Abstract:
The incorporation of question answering systems holds immense potential for addressing the disparity between the abundance of high school students and the limited number of teachers in Indonesia. This study aims to enhance a question answering model tailored for Indonesian-language datasets through enhancements to the Indonesian IndoBART model. The improvement was made by incorporating Longformer's sliding-window attention mechanism into the IndoBART model, increasing the model's proficiency in managing extended-sequence tasks such as question answering. The datasets used in this research were the multilingual TyDiQA dataset and a translated SQuAD v2 dataset. The evaluation indicates that the Longformer-IndoBART model outperforms its predecessor on the TyDiQA dataset, showing an average 26% enhancement across the F1, Exact Match, BLEU, and ROUGE metrics. Nevertheless, it experienced a minor setback on the SQuAD v2 dataset, with an average decrease of 0.6% across all metrics.
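The sliding-window idea can be visualised with a toy attention mask. The sketch below only shows the banded pattern that Longformer-style attention imposes and is not the IndoBART integration itself; the sequence length and window size are arbitrary assumptions.

import torch

def sliding_window_mask(seq_len: int, window: int) -> torch.Tensor:
    """Boolean mask where each token attends only to neighbours within +/- window//2."""
    idx = torch.arange(seq_len)
    return (idx[None, :] - idx[:, None]).abs() <= window // 2

mask = sliding_window_mask(seq_len=8, window=4)
scores = torch.randn(8, 8)                              # raw attention scores
scores = scores.masked_fill(~mask, float("-inf"))       # block long-range pairs
attn = torch.softmax(scores, dim=-1)                    # sparse-pattern attention weights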
19

Shah, Sanket, Anand Mishra, Naganand Yadati, and Partha Pratim Talukdar. "KVQA: Knowledge-Aware Visual Question Answering." Proceedings of the AAAI Conference on Artificial Intelligence 33 (July 17, 2019): 8876–84. http://dx.doi.org/10.1609/aaai.v33i01.33018876.

Abstract:
Visual Question Answering (VQA) has emerged as an important problem spanning Computer Vision, Natural Language Processing and Artificial Intelligence (AI). In conventional VQA, one may ask questions about an image which can be answered purely based on its content. For example, given an image with people in it, a typical VQA question may inquire about the number of people in the image. More recently, there is growing interest in answering questions which require commonsense knowledge involving common nouns (e.g., cats, dogs, microphones) present in the image. In spite of this progress, the important problem of answering questions requiring world knowledge about named entities (e.g., Barack Obama, White House, United Nations) in the image has not been addressed in prior research. We address this gap in this paper, and introduce KVQA – the first dataset for the task of (world) knowledge-aware VQA. KVQA consists of 183K question-answer pairs involving more than 18K named entities and 24K images. Questions in this dataset require multi-entity, multi-relation, and multi-hop reasoning over large Knowledge Graphs (KG) to arrive at an answer. To the best of our knowledge, KVQA is the largest dataset for exploring VQA over KG. Further, we also provide baseline performances using state-of-the-art methods on KVQA.
20

Xu, Marie-Anne, and Rahul Khanna. "Evaluation of Single-Span Models on Extractive Multi-Span Question-Answering." International journal of Web & Semantic Technology 12, no. 1 (2021): 19–29. http://dx.doi.org/10.5121/ijwest.2021.12102.

Abstract:
Machine Reading Comprehension (MRC), particularly extractive closed-domain question-answering, is a prominent field in Natural Language Processing (NLP). Given a question and a passage or set of passages, a machine must be able to extract the appropriate answer from the passage(s). However, the majority of these existing questions have only one answer, and more substantial testing on questions with multiple answers, or multi-span questions, has not yet been applied. Thus, we introduce a newly compiled dataset consisting of questions with multiple answers that originate from previously existing datasets. In addition, we run BERT-based models pre-trained for question-answering on our constructed dataset to evaluate their reading comprehension abilities. The runtime of the base models on the entire dataset is approximately one day, while the runtime for all models on a third of the dataset is a little over two days. Among the three BERT-based models we ran, RoBERTa exhibits the highest consistent performance, regardless of size. We find that all our models perform similarly on this new, multi-span dataset compared to the single-span source datasets. While the models tested on the source datasets were slightly fine-tuned in order to return multiple answers, performance is similar enough to judge that task formulation does not drastically affect question-answering abilities. Our evaluations indicate that these models are indeed capable of adjusting to answer questions that require multiple answers. We hope that our findings will assist future development in question-answering and improve existing question-answering products and methods.
21

Lamm, Matthew, Jennimaria Palomaki, Chris Alberti, et al. "QED: A Framework and Dataset for Explanations in Question Answering." Transactions of the Association for Computational Linguistics 9 (2021): 790–806. http://dx.doi.org/10.1162/tacl_a_00398.

Abstract:
A question answering system that in addition to providing an answer provides an explanation of the reasoning that leads to that answer has potential advantages in terms of debuggability, extensibility, and trust. To this end, we propose QED, a linguistically informed, extensible framework for explanations in question answering. A QED explanation specifies the relationship between a question and answer according to formal semantic notions such as referential equality, sentencehood, and entailment. We describe and publicly release an expert-annotated dataset of QED explanations built upon a subset of the Google Natural Questions dataset, and report baseline models on two tasks—post-hoc explanation generation given an answer, and joint question answering and explanation generation. In the joint setting, a promising result suggests that training on a relatively small amount of QED data can improve question answering. In addition to describing the formal, language-theoretic motivations for the QED approach, we describe a large user study showing that the presence of QED explanations significantly improves the ability of untrained raters to spot errors made by a strong neural QA baseline.
22

Zhong, Haoxi, Chaojun Xiao, Cunchao Tu, Tianyang Zhang, Zhiyuan Liu, and Maosong Sun. "JEC-QA: A Legal-Domain Question Answering Dataset." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 05 (2020): 9701–8. http://dx.doi.org/10.1609/aaai.v34i05.6519.

Abstract:
We present JEC-QA, the largest question answering dataset in the legal domain, collected from the National Judicial Examination of China. The examination is a comprehensive evaluation of professional skills for legal practitioners. College students are required to pass the examination to be certified as a lawyer or a judge. The dataset is challenging for existing question answering methods, because both retrieving relevant materials and answering questions require the ability of logic reasoning. Due to the high demand of multiple reasoning abilities to answer legal questions, the state-of-the-art models can only achieve about 28% accuracy on JEC-QA, while skilled humans and unskilled humans can reach 81% and 64% accuracy respectively, which indicates a huge gap between humans and machines on this task. We will release JEC-QA and our baselines to help improve the reasoning ability of machine comprehension models. You can access the dataset from http://jecqa.thunlp.org/.
23

Jin, Jiho, Jiseon Kim, Nayeon Lee, Haneul Yoo, Alice Oh, and Hwaran Lee. "KoBBQ: Korean Bias Benchmark for Question Answering." Transactions of the Association for Computational Linguistics 12 (2024): 507–24. http://dx.doi.org/10.1162/tacl_a_00661.

Abstract:
Warning: This paper contains examples of stereotypes and biases. The Bias Benchmark for Question Answering (BBQ) is designed to evaluate social biases of language models (LMs), but it is not simple to adapt this benchmark to cultural contexts other than the US because social biases depend heavily on the cultural context. In this paper, we present KoBBQ, a Korean bias benchmark dataset, and we propose a general framework that addresses considerations for cultural adaptation of a dataset. Our framework includes partitioning the BBQ dataset into three classes—Simply-Transferred (can be used directly after cultural translation), Target-Modified (requires localization in target groups), and Sample-Removed (does not fit Korean culture)—and adding four new categories of bias specific to Korean culture. We conduct a large-scale survey to collect and validate the social biases and the targets of the biases that reflect the stereotypes in Korean culture. The resulting KoBBQ dataset comprises 268 templates and 76,048 samples across 12 categories of social bias. We use KoBBQ to measure the accuracy and bias scores of several state-of-the-art multilingual LMs. The results clearly show differences in the bias of LMs as measured by KoBBQ and a machine-translated version of BBQ, demonstrating the need for and utility of a well-constructed, culturally aware social bias benchmark.
24

Reddy, Siva, Danqi Chen, and Christopher D. Manning. "CoQA: A Conversational Question Answering Challenge." Transactions of the Association for Computational Linguistics 7 (November 2019): 249–66. http://dx.doi.org/10.1162/tacl_a_00266.

Abstract:
Humans gather information through conversations involving a series of interconnected questions and answers. For machines to assist in information gathering, it is therefore essential to enable them to answer conversational questions. We introduce CoQA, a novel dataset for building Conversational Question Answering systems. Our dataset contains 127k questions with answers, obtained from 8k conversations about text passages from seven diverse domains. The questions are conversational, and the answers are free-form text with their corresponding evidence highlighted in the passage. We analyze CoQA in depth and show that conversational questions have challenging phenomena not present in existing reading comprehension datasets (e.g., coreference and pragmatic reasoning). We evaluate strong dialogue and reading comprehension models on CoQA. The best system obtains an F1 score of 65.4%, which is 23.4 points behind human performance (88.8%), indicating that there is ample room for improvement. We present CoQA as a challenge to the community at https://stanfordnlp.github.io/coqa .
25

Staš, Ján, Daniel Hládek, and Tomáš Koctúr. "Slovak Question Answering Dataset Based on the Machine Translation of the Squad V2.0." Journal of Linguistics/Jazykovedný casopis 74, no. 1 (2023): 381–90. http://dx.doi.org/10.2478/jazcas-2023-0054.

Abstract:
This paper describes the process of building the first large-scale machine-translated question answering dataset, SQuAD-sk, for the Slovak language. The dataset was automatically translated from the original English SQuAD v2.0 using Marian neural machine translation together with the Helsinki-NLP Opus English-Slovak model. Moreover, we propose an effective approach for the approximate search of the translated answer in the translated paragraph based on measuring their similarity using their word vectors. In this way, we obtained more than 92% of the translated questions and answers from the original English dataset. We then used this machine-translated dataset to train a Slovak question answering system by fine-tuning monolingual and multilingual BERT-based language models. The scores of EM = 69.48% and F1 = 78.87% achieved by the fine-tuned mBERT model show question answering results comparable with recently published machine-translated SQuAD datasets for other European languages.
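The approximate answer search described here can be sketched as a span-similarity scan over the translated paragraph. The example below substitutes a multilingual sentence encoder for the paper's word vectors; the model name, the slack parameter, and the Slovak strings are illustrative assumptions, not the authors' code.

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

def locate_answer(translated_paragraph: str, translated_answer: str, slack: int = 2):
    """Return the paragraph span whose embedding is closest to the translated answer."""
    tokens = translated_paragraph.split()
    n = len(translated_answer.split())
    spans = [" ".join(tokens[i:i + k])
             for k in range(max(1, n - slack), n + slack + 1)
             for i in range(0, len(tokens) - k + 1)]
    span_emb = model.encode(spans, convert_to_tensor=True)
    ans_emb = model.encode(translated_answer, convert_to_tensor=True)
    sims = util.cos_sim(ans_emb, span_emb)[0]
    best = int(sims.argmax())
    return spans[best], float(sims[best])

span, score = locate_answer(
    "Normanskí vodcovia prisahali vernosť franskému kráľovi.",
    "franskému kráľovi")
print(span, score)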
26

Longpre, Shayne, Yi Lu, and Joachim Daiber. "MKQA: A Linguistically Diverse Benchmark for Multilingual Open Domain Question Answering." Transactions of the Association for Computational Linguistics 9 (2021): 1389–406. http://dx.doi.org/10.1162/tacl_a_00433.

Abstract:
Progress in cross-lingual modeling depends on challenging, realistic, and diverse evaluation sets. We introduce Multilingual Knowledge Questions and Answers (MKQA), an open-domain question answering evaluation set comprising 10k question-answer pairs aligned across 26 typologically diverse languages (260k question-answer pairs in total). Answers are based on a heavily curated, language-independent data representation, making results comparable across languages and independent of language-specific passages. With 26 languages, this dataset supplies the widest range of languages to date for evaluating question answering. We benchmark a variety of state-of-the-art methods and baselines for generative and extractive question answering, trained on Natural Questions, in zero-shot and translation settings. Results indicate this dataset is challenging even in English, but especially in low-resource languages.
27

Diaconu, Bogdan-Alexandru, and Beáta Lázár-Lőrincz. "Romanian Question Answering Using Transformer Based Neural Networks." Studia Universitatis Babeș-Bolyai Informatica 67, no. 1 (2022): 37–44. http://dx.doi.org/10.24193/subbi.2022.1.03.

Abstract:
"Question answering is the task of predicting answers for questions based on a context paragraph. It has become especially important, as the large amounts of textual data available online requires not only gathering information but also the task of findings specific answers to specific questions. In this work, we present experiments evaluated on the XQuAD-ro question answering dataset that has been recently published based on the translation of the SQuAD dataset into Romanian. Our bestperforming model, Romanian fine-tuned BERT, achieves an F1 score of 0.80 and an EM score of 0.73. We show that fine-tuning the model with the addition of the Romanian translation slightly increases the evaluation metrics. Keywords and phrases: question answering, deep learning, Transformer, Romanian. "
28

Kaddari, Zakaria, and Toumi Bouchentouf. "FrBMedQA: the first French biomedical question answering dataset." IAES International Journal of Artificial Intelligence (IJ-AI) 11, no. 4 (2022): 1588. http://dx.doi.org/10.11591/ijai.v11.i4.pp1588-1595.

Abstract:
FrBMedQA is the first French biomedical question answering dataset, containing 41k+ passage-question instances. It was automatically constructed in a cloze-style manner from biomedical French Wikipedia articles. To test the validity and difficulty of the dataset, we experimented with four statistical baseline models, a biomedical bidirectional encoder representations from transformers (BERT)-based model, and two French BERT-based language models. We also performed a human evaluation on a subset of the test set. None of the three tested models was able to surpass the best-performing baseline model. Human performance, at 61.11%, leads the leaderboard by more than 8% over the best-performing model. We have made the dataset and the code to reproduce our results available.
29

Zakaria, Kaddari, and Bouchentouf Toumi. "FrBMedQA: the first French biomedical question answering dataset." International Journal of Artificial Intelligence (IJ-AI) 11, no. 4 (2022): 1588–95. https://doi.org/10.11591/ijai.v11.i4.pp1588-1595.

Abstract:
FrBMedQA is the first French biomedical question answering dataset, containing 41k+ passage-question instances. It was automatically constructed in a cloze-style manner from biomedical French Wikipedia articles. To test the validity and difficulty of the dataset, we experimented with four statistical baseline models, a biomedical bidirectional encoder representations from transformers (BERT)-based model, and two French BERT-based language models. We also performed a human evaluation on a subset of the test set. None of the three tested models was able to surpass the best-performing baseline model. Human performance, at 61.11%, leads the leaderboard by more than 8% over the best-performing model. We have made the dataset and the code to reproduce our results available.
30

Abdalla, Mahmoud, Mahmoud SalahEldin Kasem, Mohamed Mahmoud, et al. "ReceiptQA: A Question-Answering Dataset for Receipt Understanding." Mathematics 13, no. 11 (2025): 1760. https://doi.org/10.3390/math13111760.

Abstract:
Understanding information extracted from receipts is a critical task for real-world applications such as financial tracking, auditing, and enterprise resource management. In this paper, we introduce ReceiptQA, a novel large-scale dataset designed for receipt understanding through question-answering (QA). ReceiptQA contains 171,000 question–answer pairs derived from 3500 receipt images, constructed via two complementary methodologies: (1) LLM-Generated Dataset: 70,000 synthetically generated QA pairs, where each receipt is paired with 20 unique, context-specific questions. These questions are produced using a state-of-the-art large language model (LLM) and validated through human annotation to ensure accuracy, relevance, and diversity. (2) Human-Created Dataset: 101,000 manually crafted questions spanning answerable and unanswerable queries. This subset includes carefully designed templates of varying difficulty (easy/hard) to comprehensively evaluate QA systems across diverse receipt domains. To benchmark performance, we evaluate leading vision–language models (VLMs) and language models (LMs), including GPT-4o, Phi-3B, Phi-3.5B, LLaVA-7B, InternVL2 (4B/8B), LLaMA-3.2, and Gemini. We further fine-tune a LLaMA-3.2 11B model on ReceiptQA, achieving significant improvements over baseline models on validation and test sets. Our analysis uncovers critical strengths and limitations of existing models in handling receipt-based QA tasks, establishing a robust benchmark for future research.
31

Dipali Koshti, Ashutosh Gupta, Mukesh Kalla, Pramit Kanjilal, Sushant Shanbhag, and Nirmit Karkera. "EDUVQA – Visual Question Answering: An Educational Perspective." Journal of Advanced Research in Applied Sciences and Engineering Technology 42, no. 1 (2024): 144–57. http://dx.doi.org/10.37934/araset.42.1.144157.

Abstract:
Increasing applications of artificial intelligence in the field of education have changed the way school children learn various concepts. Educational Visual Question Answering, or EDUVQA, is one such application that allows students to interact directly with images, ask educational questions, and get the correct answer. Two major challenges faced by educational VQA are the lack of domain-specific datasets and the frequent need to refer to external knowledge bases to answer open-domain questions. We propose a novel EDUVQA model developed especially for educational purposes and introduce our own EDUVQA dataset. The dataset consists of four categories of images: animals, plants, fruits, and vegetables. The majority of the currently used techniques focus on the extraction of picture and question characteristics in order to discover the joint feature embeddings via multimodal fusion or attention mechanisms. We propose a different method that aims to better utilize the semantic knowledge present in images. Our approach entails building an EDUVQA dataset using educational images, where each data point is made up of an image, a question that corresponds to it, a valid response, and a fact that supports it. The fact is created in the form of an <S,V,O> triplet, where 's' denotes a subject, 'v' a verb, and 'o' an object. First, an SVO detector model capable of predicting the Subject, Verb, and Object present in the image-question pair is trained on the EDUVQA dataset. Using this <S,V,O> triplet, the most relevant facts from our fact base are extracted. The final answer is predicted using these extracted facts together with the image and question attributes. The image features are extracted using a pretrained ResNet and the question features using a pre-trained BERT model. We have optimized and improved on the current methodologies that use a relation-based approach and built our SVO-detector model that outperforms current models by 10%.
32

Cowell, Andrew J., Alan R. Chappell, and David A. Thurmanb. "Towards an Adaptive Question Answering System for Intelligence Analysts." Proceedings of the Human Factors and Ergonomics Society Annual Meeting 49, no. 10 (2005): 927–31. http://dx.doi.org/10.1177/154193120504901012.

Abstract:
Battelle is working in partnership with Stanford University's Knowledge Systems Laboratory (KSL) and IBM's T.J. Watson Research Center to develop a suite of technologies for knowledge discovery, knowledge extraction, knowledge representation, automated reasoning, and human information interaction, collectively entitled “Knowledge Associates for Novel Intelligence” (KANI). We have developed an integrated analytic environment composed of a collection of analyst associates, software components that aid the analyst at different stages of the analytical process. In this paper, we discuss our efforts in the research, design and implementation of the question answering elements of the Information Interaction Associate. Specifically, we focus on the techniques employed to produce an effective user interface to these elements. In addition, we touch upon the methodologies we intend to use to empirically evaluate our approach with active intelligence analysts.
33

Dun, Yijie, Na Wang, Min Wang, and Tianyong Hao. "Revealing Learner Interests through Topic Mining from Question-Answering Data." International Journal of Distance Education Technologies 15, no. 2 (2017): 18–32. http://dx.doi.org/10.4018/ijdet.2017040102.

Abstract:
In a question-answering system, learner generated content including asked and answered questions is a meaningful resource to capture learning interests. This paper proposes an approach based on question topic mining for revealing learners' concerned topics in real community question-answering systems. The authors' approach firstly preprocesses all questions associated with learners. Afterwards, it analyzes each question with text features and generates a weight feature matrix using a revised TF/IDF method. In order to decrease the sparsity issue of data distribution, the authors employ three concept-mapping strategies including named entity recognition, synonym extension, and hyponym replacement. Applying an SVM classifier, their approach categorizes user questions into representative topics. Three experiments are conducted based on a TREC dataset and an actual dataset containing 1,120 questions posted by learners from a commercial question-answering community. Results demonstrate the effectiveness of the method compared with conventional classifiers as baselines.
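The core pipeline of TF/IDF features feeding an SVM topic classifier looks roughly like the sketch below. The toy questions and labels are invented for illustration, and the paper's concept-mapping steps (named entity recognition, synonym extension, hyponym replacement) are omitted.

from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

# Hypothetical learner questions labelled with coarse topics.
questions = ["How do I declare an array in Java?",
             "What is the difference between TCP and UDP?",
             "How can I normalise a database table?",
             "Why does my Java loop never terminate?"]
topics = ["programming", "networking", "databases", "programming"]

# TF-IDF weight matrix over the questions, then an SVM over representative topics.
clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
clf.fit(questions, topics)

print(clf.predict(["How do I open a network socket in Java?"]))  # predicted topic for a new question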
34

R, Lokesh, Madhusudan C, Darshan T, and Sunil kumar N. "VISUAL QUESTIONING AND ANSWERING." International Journal of Innovative Research in Advanced Engineering 9, no. 8 (2022): 312–15. http://dx.doi.org/10.26562/ijirae.2022.v0908.29.

Abstract:
In general, a VQA system is an algorithm that takes a picture and a natural language query about the image as input and produces a natural language response as output. The multi-disciplinary nature of the research problem makes this challenging. This work uses a diagnostic dataset that assesses a variety of visual reasoning skills; it features few biases and has thorough annotations defining the type of reasoning required for each question. The dataset is used to examine a number of recent visual reasoning systems, revealing new information about their capabilities and limitations.
35

Zhang, Xiao, and Guorui Zhao. "Taking a Closed-Book Examination: Decoupling KB-Based Inference by Virtual Hypothesis for Answering Real-World Questions." Computational Intelligence and Neuroscience 2021 (February 22, 2021): 1–9. http://dx.doi.org/10.1155/2021/6689740.

Abstract:
Complex question answering in real world is a comprehensive and challenging task due to its demand for deeper question understanding and deeper inference. Information retrieval is a common solution and easy to implement, but it cannot answer questions which need long-distance dependencies across multiple documents. Knowledge base (KB) organizes information as a graph, and KB-based inference can employ logic formulas or knowledge embeddings to capture such long-distance semantic associations. However, KB-based inference has not been applied to real-world question answering well, because there are gaps among natural language, complex semantic structure, and appropriate hypothesis for inference. We propose decoupling KB-based inference by transforming a question into a high-level triplet in the KB, which makes it possible to apply KB-based inference methods to answer complex questions. In addition, we create a specialized question answering dataset only for inference, and our method is proved to be effective by conducting experiments on both AI2 Science Questions dataset and ours.
APA, Harvard, Vancouver, ISO, and other styles
36

Wu, Yuehong, Zhiwei Wen, and Shangsong Liang. "Predicting Question Popularity for Community Question Answering." Electronics 13, no. 16 (2024): 3260. http://dx.doi.org/10.3390/electronics13163260.

Full text
Abstract:
In this paper, we study the problem of predicting the popularity of questions in Community Question Answering (CQA). To address this problem, we propose a Posterior Attention Recurrent Point Process Model (PARPP) that takes both user interactions and the Matthew effect into account for question popularity prediction. Our PARPP uses long short-term memory (LSTM) to encode the observed history and another LSTM network to record each step of decoding information. At each decoding step, it uses prior attention to capture the answers that have a greater impact on the question. When a new answer is observed, it uses Bayes' rule to modify the prior attention and obtain posterior attention. The posterior attention is then used to update the decoding state. We further introduce a convergence strategy to capture the Matthew effect in CQA. We conduct experiments on a dataset crawled from Zhihu, a well-known Chinese CQA forum. The experimental results show that our model outperforms several state-of-the-art methods. We further analyze the attention mechanism in our model; the analysis shows that it better captures the impact of each answer on the future popularity of the question, which makes our model more interpretable. Our study may also shed light on related tasks such as ranking answers to a question and finding experts on the question's topics.
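A minimal numpy sketch of the Bayes-rule attention update idea appears below; the prior and likelihood values are invented, and in the actual model these quantities would come from learned components rather than hand-set numbers.

```python
# Posterior attention as a Bayes-rule reweighting of prior attention:
# posterior_i is proportional to prior_i times the likelihood each past
# answer assigns to the newly observed answer. All numbers are illustrative.
import numpy as np

def posterior_attention(prior: np.ndarray, likelihood: np.ndarray) -> np.ndarray:
    """Bayes' rule over attention weights: posterior_i ∝ prior_i * likelihood_i."""
    unnormalized = prior * likelihood
    return unnormalized / unnormalized.sum()

prior = np.array([0.5, 0.3, 0.2])        # prior attention over three past answers
likelihood = np.array([0.1, 0.7, 0.4])   # how well each explains the new answer
print(posterior_attention(prior, likelihood))  # mass shifts toward the second answer
```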
APA, Harvard, Vancouver, ISO, and other styles
37

Risco Cosavalente, Mariana. "Visual Question Answering for Peruvian Cuisine in Regional Spanish." Proceedings of the AAAI Conference on Artificial Intelligence 39, no. 28 (2025): 29602–4. https://doi.org/10.1609/aaai.v39i28.35339.

Full text
Abstract:
This project leverages Visual Question Answering (VQA) to promote Peruvian gastronomy by utilizing a culturally rich dataset and advanced models such as LLaVA-1.5 and GPT-2 Large. The evaluation will comprise both automated metrics and culinary expert assessments. This system addresses regional variations in dish names, promotes inclusivity by involving Peruvians from diverse regions in dataset construction, and enhances cultural representation.
APA, Harvard, Vancouver, ISO, and other styles
38

Adlakha, Vaibhav, Shehzaad Dhuliawala, Kaheer Suleman, Harm de Vries, and Siva Reddy. "TopiOCQA: Open-domain Conversational Question Answering with Topic Switching." Transactions of the Association for Computational Linguistics 10 (2022): 468–83. http://dx.doi.org/10.1162/tacl_a_00471.

Full text
Abstract:
In a conversational question answering scenario, a questioner seeks to extract information about a topic through a series of interdependent questions and answers. As the conversation progresses, they may switch to related topics, a phenomenon commonly observed in information-seeking search sessions. However, current datasets for conversational question answering are limited in two ways: 1) they do not contain topic switches; and 2) they assume the reference text for the conversation is given, that is, the setting is not open-domain. We introduce TopiOCQA (pronounced Tapioca), an open-domain conversational dataset with topic switches based on Wikipedia. TopiOCQA contains 3,920 conversations with information-seeking questions and free-form answers. On average, a conversation in our dataset spans 13 question-answer turns and involves four topics (documents). TopiOCQA poses a challenging test-bed for models, where efficient retrieval is required on multiple turns of the same conversation, in conjunction with constructing valid responses using conversational history. We evaluate several baselines by combining state-of-the-art document retrieval methods with neural reader models. Our best model achieves an F1 of 55.8, falling short of human performance by 14.2 points, indicating the difficulty of our dataset. Our dataset and code are available at https://mcgill-nlp.github.io/topiocqa.
APA, Harvard, Vancouver, ISO, and other styles
39

Zhang, Liang, Anwen Hu, Jing Zhang, Shuo Hu, and Qin Jin. "MPMQA: Multimodal Question Answering on Product Manuals." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 11 (2023): 13958–66. http://dx.doi.org/10.1609/aaai.v37i11.26634.

Full text
Abstract:
Visual contents, such as illustrations and images, play an important role in product manual understanding. Existing Product Manual Question Answering (PMQA) datasets tend to ignore visual contents and retain only the textual parts. In this work, to emphasize the importance of multimodal contents, we propose a Multimodal Product Manual Question Answering (MPMQA) task. For each question, MPMQA requires the model not only to process multimodal contents but also to provide multimodal answers. To support MPMQA, a large-scale dataset, PM209, is constructed with human annotations; it contains 209 product manuals from 27 well-known consumer electronics brands. Human annotations include 6 types of semantic regions for manual contents and 22,021 question-answer pairs. In particular, each answer consists of a textual sentence and related visual regions from the manuals. Taking into account the length of product manuals and the fact that a question is always related to a small number of pages, MPMQA can be naturally split into two subtasks: retrieving the most related pages and then generating multimodal answers. We further propose a unified model that performs both subtasks jointly and achieves performance comparable to multiple task-specific models. The PM209 dataset is available at https://github.com/AIM3-RUC/MPMQA.
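The retrieve-then-generate split can be pictured with a toy sketch like the one below; the manual pages, the lexical scoring, and the final "generation" step are placeholders, not the unified model from the paper.

```python
# Toy illustration of the two-subtask split: score manual pages against the
# question, keep the best page, then hand it to an answer generator.
def lexical_score(page_text: str, question: str) -> int:
    q_terms = set(question.lower().replace("?", "").split())
    return len(q_terms & set(page_text.lower().split()))

pages = {
    3: "press the power button for five seconds to reset the device",
    7: "the warranty covers manufacturing defects for two years",
}
question = "How do I reset the device?"

top_page = max(pages, key=lambda p: lexical_score(pages[p], question))
answer = pages[top_page]  # stand-in for a multimodal answer generator
print(top_page, "->", answer)
```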
APA, Harvard, Vancouver, ISO, and other styles
40

Gyanwali, Aashish, Binod Sapkota, Abhishek Koirala, and Babu R. Dawadi. "Modular Co-attention Networks in Nepali Visual Question Answering Systems." Asian Journal of Research in Computer Science 17, no. 10 (2024): 62–84. http://dx.doi.org/10.9734/ajrcos/2024/v17i10510.

Full text
Abstract:
Visual question answering (VQA) is regarded as a challenging task requiring a perfect blend of computer vision and natural language processing. As no dataset was available to train such a model for the Nepali language, a new dataset was developed during the research by translating the VQAv2 dataset. The resulting dataset, consisting of 202,577 images and 886,560 questions, was used to train an attention-based VQA model. The dataset consists of yes/no, counting, and other questions with primarily one-word answers. A Modular Co-attention Network (MCAN) was applied to visual features extracted with the Faster R-CNN framework and question embeddings extracted with a Nepali GloVe model. After co-attending the visual and language features through a few cascaded MCAN layers, the features are fused to train the whole network. During evaluation, an overall accuracy of 69.87% was obtained, with 81.09% accuracy on yes/no questions. The results surpass the performance of models developed for the Hindi and Bengali languages. Overall, this is novel work in the Nepali-language VQA domain, paving the way for further advances.
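As a rough illustration of the guided co-attention idea, the PyTorch sketch below has question word features attend over image-region features and fuses the pooled summaries; the feature counts and dimensions are invented, and this is not the cited MCAN implementation.

```python
# One guided attention step: words query image regions, then the attended
# visual summary is fused with the question summary (dimensions invented).
import torch
import torch.nn.functional as F

torch.manual_seed(0)
d = 512
regions = torch.randn(36, d)   # image-region features (e.g. from a detector)
words = torch.randn(14, d)     # question word features (e.g. from embeddings)

# Each word queries the image regions with scaled dot-product attention.
scores = words @ regions.T / d ** 0.5                 # (14, 36)
attended_visual = F.softmax(scores, dim=-1) @ regions  # (14, d)

# Simple fusion of pooled summaries, standing in for the full cascaded stack.
fused = torch.cat([attended_visual.mean(dim=0), words.mean(dim=0)])
print(fused.shape)  # torch.Size([1024])
```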
APA, Harvard, Vancouver, ISO, and other styles
41

Khot, Tushar, Peter Clark, Michal Guerquin, Peter Jansen, and Ashish Sabharwal. "QASC: A Dataset for Question Answering via Sentence Composition." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 05 (2020): 8082–90. http://dx.doi.org/10.1609/aaai.v34i05.6319.

Full text
Abstract:
Composing knowledge from multiple pieces of texts is a key challenge in multi-hop question answering. We present a multi-hop reasoning dataset, Question Answering via Sentence Composition (QASC), that requires retrieving facts from a large corpus and composing them to answer a multiple-choice question. QASC is the first dataset to offer two desirable properties: (a) the facts to be composed are annotated in a large corpus, and (b) the decomposition into these facts is not evident from the question itself. The latter makes retrieval challenging as the system must introduce new concepts or relations in order to discover potential decompositions. Further, the reasoning model must then learn to identify valid compositions of these retrieved facts using common-sense reasoning. To help address these challenges, we provide annotation for supporting facts as well as their composition. Guided by these annotations, we present a two-step approach to mitigate the retrieval challenges. We use other multiple-choice datasets as additional training data to strengthen the reasoning model. Our proposed approach improves over current state-of-the-art language models by 11% (absolute). The reasoning and retrieval problems, however, remain unsolved as this model still lags by 20% behind human performance.
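The two-step retrieval idea can be pictured with a toy lexical-overlap sketch like the one below; the corpus, question, and scoring are invented and far simpler than the retrieval actually used in the paper.

```python
# Toy two-hop retrieval: first pick a fact that overlaps with the question,
# then pick a second fact that bridges from the first fact toward an answer
# choice. Everything here is an illustrative placeholder.
corpus = [
    "differential heating of air produces wind",
    "wind is used for producing electricity by turbines",
    "plants produce oxygen by photosynthesis",
]

def overlap(a: str, b: str) -> int:
    return len(set(a.lower().split()) & set(b.lower().split()))

question = "what can differential heating of air be used to produce"
choice = "electricity"

hop1 = max(corpus, key=lambda fact: overlap(fact, question))
hop2 = max((f for f in corpus if f != hop1),
           key=lambda fact: overlap(fact, hop1) + overlap(fact, choice))
print(hop1)  # fact anchored in the question
print(hop2)  # fact composing toward the answer choice
```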
APA, Harvard, Vancouver, ISO, and other styles
42

Ismail, Walaa Saber, and Masun Nabhan Homsi. "DAWQAS: A Dataset for Arabic Why Question Answering System." Procedia Computer Science 142 (2018): 123–31. http://dx.doi.org/10.1016/j.procs.2018.10.467.

Full text
APA, Harvard, Vancouver, ISO, and other styles
43

Das, Anubrata, Samreen Anjum, and Danna Gurari. "Dataset bias: A case study for visual question answering." Proceedings of the Association for Information Science and Technology 56, no. 1 (2019): 58–67. http://dx.doi.org/10.1002/pra2.7.

Full text
APA, Harvard, Vancouver, ISO, and other styles
44

Lee, Seongyun, Hyunjae Kim, and Jaewoo Kang. "LIQUID: A Framework for List Question Answering Dataset Generation." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 11 (2023): 13014–24. http://dx.doi.org/10.1609/aaai.v37i11.26529.

Full text
Abstract:
Question answering (QA) models often rely on large-scale training datasets, which necessitates the development of a data generation framework to reduce the cost of manual annotations. Although several recent studies have aimed to generate synthetic questions with single-span answers, no study has been conducted on the creation of list questions with multiple, non-contiguous spans as answers. To address this gap, we propose LIQUID, an automated framework for generating list QA datasets from unlabeled corpora. We first convert a passage from Wikipedia or PubMed into a summary and extract named entities from the summarized text as candidate answers. This allows us to select answers that are semantically correlated in context and is, therefore, suitable for constructing list questions. We then create questions using an off-the-shelf question generator with the extracted entities and original passage. Finally, iterative filtering and answer expansion are performed to ensure the accuracy and completeness of the answers. Using our synthetic data, we significantly improve the performance of the previous best list QA models by exact-match F1 scores of 5.0 on MultiSpanQA, 1.9 on Quoref, and 2.8 averaged across three BioASQ benchmarks.
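A skeleton of the described generation pipeline is sketched below, with the summarizer, entity extractor, question generator, and filter all replaced by trivial placeholders; none of these stand-ins correspond to the models used in the paper.

```python
# Pipeline skeleton: summarize -> extract candidate answers -> generate a
# list question -> filter answers. Every stage is a toy placeholder.
def summarize(passage: str) -> str:
    return passage.split(". ")[0]  # placeholder: keep the first sentence

def extract_entities(text: str) -> list[str]:
    return [tok for tok in text.split() if tok.istitle()]  # placeholder NER

def generate_question(answers: list[str], passage: str) -> str:
    return f"Which people are mentioned alongside {answers[0]}?"  # placeholder QG

def filter_and_expand(candidates: list[str], passage: str) -> list[str]:
    return [c for c in candidates if c in passage]  # placeholder filtering

passage = ("Marie Curie worked with Pierre Curie in Paris. "
           "She later collaborated with Irene Joliot-Curie.")
answers = filter_and_expand(extract_entities(summarize(passage)), passage)
question = generate_question(answers, passage)
print(question, answers)
```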
APA, Harvard, Vancouver, ISO, and other styles
45

Sapitri, Ade Iriani, Said Al-Faraby, and Adiwijaya Adiwijaya. "Analisis Metode Pattern Based Approach Question Answering System Pada Dataset Hukum Islam Berbasis Bahasa Indonesia." JURNAL MEDIA INFORMATIKA BUDIDARMA 2, no. 4 (2018): 159. http://dx.doi.org/10.30865/mib.v2i4.950.

Full text
Abstract:
Islamic law comprises provisions of the commands of Allah SWT and encompasses many different rulings. Searching for such information manually takes a long time given the many types of Islamic law, and a question answering system can help solve this problem. The purpose of this study is to assist users in finding the required information from questions in five answer categories: What (OBJECT), Who (PERSON), Where (LOCATION), When (TIME), and How much (COUNT). The question answering system is implemented with a pattern-based approach built on pattern classification. The experiments yield an overall answer accuracy of 64.5%, with per-category accuracies for "What", "When", "How much", "Who", and "Where" of 63.3%, 65%, 73.3%, 65%, and 40%, respectively. These results indicate that the pattern-based approach can be implemented in a question answering system to address the problem above.
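The pattern-based classification step might be sketched as a handful of regular-expression rules, as below; the Indonesian question-word patterns shown are a small illustrative subset and not the authors' actual rule set.

```python
# Minimal pattern-based classifier for the five answer categories listed
# above, using a few illustrative Indonesian question-word patterns.
import re

PATTERNS = {
    "OBJECT":   r"\bapa\b",        # "what"
    "PERSON":   r"\bsiapa\b",      # "who"
    "LOCATION": r"\bdi ?mana\b",   # "where"
    "TIME":     r"\bkapan\b",      # "when"
    "COUNT":    r"\bberapa\b",     # "how much / how many"
}

def classify(question: str) -> str:
    for category, pattern in PATTERNS.items():
        if re.search(pattern, question.lower()):
            return category
    return "UNKNOWN"

print(classify("Siapa yang wajib membayar zakat?"))  # -> PERSON
```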
APA, Harvard, Vancouver, ISO, and other styles
46

Kim, Incheol. "Visual Experience-Based Question Answering with Complex Multimodal Environments." Mathematical Problems in Engineering 2020 (November 19, 2020): 1–18. http://dx.doi.org/10.1155/2020/8567271.

Full text
Abstract:
This paper proposes a novel visual experience-based question answering problem (VEQA) and the corresponding dataset for embodied intelligence research, which requires an agent to take actions, understand 3D scenes from successive partial input images, and answer natural language questions about its visual experiences in real time. Unlike conventional visual question answering (VQA), the VEQA problem assumes both partial observability and dynamics of a complex multimodal environment. To address this VEQA problem, we propose a hybrid visual question answering system, VQAS, integrating a deep neural network-based scene graph generation model and a rule-based knowledge reasoning system. The proposed system can generate more accurate scene graphs for dynamic environments with some uncertainty. Moreover, it can answer complex questions through knowledge reasoning with rich background knowledge. Results of experiments using a photo-realistic 3D simulated environment, AI2-THOR, and the VEQA benchmark dataset demonstrate the high performance of the proposed system.
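To give a flavor of the rule-based reasoning half of such a hybrid system, the toy sketch below applies a single hand-written rule to a miniature scene graph; the triples, rule, and query are invented and unrelated to the actual VQAS knowledge base.

```python
# Rule-based reasoning over a tiny scene graph of (subject, relation, object)
# triples. The single rule composes "on" and "in" relations.
scene_graph = {
    ("apple", "on", "table"),
    ("table", "in", "kitchen"),
}

def infer_room(obj: str) -> set[str]:
    """Rule: if X is on Y and Y is in Z, then X is in Z."""
    rooms = set()
    for subj, rel, target in scene_graph:
        if subj == obj and rel == "on":
            for subj2, rel2, room in scene_graph:
                if subj2 == target and rel2 == "in":
                    rooms.add(room)
    return rooms

print(infer_room("apple"))  # {'kitchen'}
```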
APA, Harvard, Vancouver, ISO, and other styles
47

Yu, Zhou, Dejing Xu, Jun Yu, et al. "ActivityNet-QA: A Dataset for Understanding Complex Web Videos via Question Answering." Proceedings of the AAAI Conference on Artificial Intelligence 33 (July 17, 2019): 9127–34. http://dx.doi.org/10.1609/aaai.v33i01.33019127.

Full text
Abstract:
Recent developments in modeling language and vision have been successfully applied to image question answering. It is both crucial and natural to extend this research direction to the video domain for video question answering (VideoQA). Compared to the image domain, where large-scale and fully annotated benchmark datasets exist, VideoQA datasets are small in scale and often automatically generated, which restricts their applicability in practice. Here we introduce ActivityNet-QA, a fully annotated and large-scale VideoQA dataset. The dataset consists of 58,000 QA pairs on 5,800 complex web videos derived from the popular ActivityNet dataset. We present a statistical analysis of our ActivityNet-QA dataset and conduct extensive experiments on it by comparing existing VideoQA baselines. Moreover, we explore various video representation strategies to improve VideoQA performance, especially for long videos.
APA, Harvard, Vancouver, ISO, and other styles
48

Zhang, Pufen, Hong Lan, and Muhammad Asim Khan. "Multiple Context Learning Networks for Visual Question Answering." Scientific Programming 2022 (February 9, 2022): 1–11. http://dx.doi.org/10.1155/2022/4378553.

Full text
Abstract:
A novel Multiple Context Learning Network (MCLN) is proposed to model multiple contexts for visual question answering (VQA), aiming to learn comprehensive contexts. Three kinds of contexts are discussed, and the corresponding three context learning modules are proposed based on a uniform context learning strategy. Specifically, the proposed modules are a visual context learning module (VCL), a textual context learning module (TCL), and a visual-textual context learning module (VTCL). The VCL and TCL learn, respectively, the context of objects in an image and the context of words in a question, endowing object and word features with intra-modal context information. The VTCL operates on the concatenated visual-textual features, endowing the output features with synergistic visual-textual context information. These modules work together to form a multiple context learning layer (MCL), and MCLs can be stacked in depth for deep context learning. Furthermore, a contextualized text encoder based on pretrained BERT is introduced and fine-tuned, which enhances textual context learning at the text feature extraction stage. The approach is evaluated on two benchmark datasets: the VQA v2.0 dataset and the GQA dataset. The MCLN achieves 71.05% and 71.48% overall accuracy on the test-dev and test-std sets of VQA v2.0, respectively, and an accuracy of 57.0% on the test-standard split of the GQA dataset. The MCLN outperforms previous state-of-the-art models, and extensive ablation studies examine the effectiveness of the proposed method.
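The intra-modal context learning idea can be sketched with a small self-attention block; the PyTorch code below uses invented layer sizes and a naive concatenation for the VTCL step, and it is not the paper's architecture.

```python
# Self-attention "context learning" block: each object (or word) feature
# absorbs context from the others, with a residual connection and layer norm.
import torch
import torch.nn as nn

class ContextLearning(nn.Module):
    def __init__(self, dim: int = 512, heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        context, _ = self.attn(feats, feats, feats)  # intra-modal self-attention
        return self.norm(feats + context)            # residual + normalization

objects = torch.randn(1, 36, 512)   # image object features
words = torch.randn(1, 14, 512)     # question word features

vcl, tcl, vtcl = ContextLearning(), ContextLearning(), ContextLearning()
joint = torch.cat([vcl(objects), tcl(words)], dim=1)  # concatenate for VTCL
print(vtcl(joint).shape)  # torch.Size([1, 50, 512])
```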
APA, Harvard, Vancouver, ISO, and other styles
49

Wang, Huan, Jian Li, and Jiapeng Wang. "Retrieving Chinese Questions and Answers Based on Deep-Learning Algorithm." Mathematics 11, no. 18 (2023): 3843. http://dx.doi.org/10.3390/math11183843.

Full text
Abstract:
Chinese open-domain reading comprehension question answering is a task in the field of natural language processing. Traditional neural network-based methods lack interpretability in answer reasoning when addressing open-domain reading comprehension questions. This research is grounded in cognitive science’s dual-process theory, where System One performs question reading and System Two handles reasoning, resulting in a novel Chinese open-domain question-answering retrieval algorithm. The experiment employs the publicly available WebQA dataset and is compared against other reading comprehension methods, with the F1-score reaching 78.66%, confirming the effectiveness of the proposed approach. Therefore, adopting a reading comprehension question-answering model based on cognitive graphs can effectively address Chinese reading comprehension questions.
APA, Harvard, Vancouver, ISO, and other styles
50

Zhao, Tian. "A Question Answering System for Situation Puzzle with SPQA." Frontiers in Computing and Intelligent Systems 4, no. 2 (2023): 63–67. http://dx.doi.org/10.54097/fcis.v4i2.10203.

Full text
Abstract:
Many question answering (QA) systems have been built for solving QA tasks. In 2020 and 2022, the Allen Institute and the University of Washington proposed UnifiedQA and UnifiedQA-v2. Their core idea is that the semantic understanding and reasoning capabilities required by models are shared, so format-specific models may not be needed even though QA tasks take different forms. Building on this idea, I build a new QA model named SPQA, aiming to answer situation puzzle questions by adding a new situation-puzzle dataset (SpQ). In addition, I evaluate the performance of SPQA and UnifiedQA-v2 under fine-tuning and prompt-tuning. The fine-tuning results indicate that the SpQ dataset is important for both fine-tuning and prompt-tuning to answer situation puzzle questions well, but it also degrades the ability to answer normal yes/no questions. The prompt-tuning results indicate that, at the same data scale, the effect of SpQ is larger and more significant on both situation puzzle questions and normal yes/no questions. Future work should consider further research such as building a larger SpQ dataset.
APA, Harvard, Vancouver, ISO, and other styles