Academic literature on the topic 'Multimodal NLP'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles

Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Multimodal NLP.'

Next to every source in the list of references there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever these are available in the metadata.

Journal articles on the topic "Multimodal NLP"

1

Tiwari, Manisha, Pragati Khare, Ishani Saha, and Mahesh Mali. "Multimodal NLP for image captioning: Fusing text and image modalities for accurate and informative descriptions." Journal of Information and Optimization Sciences 45, no. 4 (2024): 1041–49. http://dx.doi.org/10.47974/jios-1626.

Full text
Abstract:
Multimodal Natural Language Processing (NLP) offers significant potential for improving the understanding and generation of content that combines various modalities, including text and images. Image captioning, automatically generating textual descriptions of images, represents a crucial application of multimodal NLP. While existing methods primarily rely on image features, we propose a novel multimodal NLP model that leverages the power of both text and image modalities for generating informative and accurate image captions. Our model incorporates information from text descriptions associated with images, enabling it to capture contextual cues and generate richer captions than traditional unimodal models. We validate our method using the industry standard Flickr8K dataset and obtain cutting-edge outcomes, proving the potency of our multimodal fusion technique. Furthermore, we discuss the challenges and opportunities of multimodal NLP for image captioning and highlight its potential to revolutionise how we interact with computers and interpret visual information.
APA, Harvard, Vancouver, ISO, and other styles
2

Zhang, Yingjie. "The current status and prospects of transformer in multimodality." Applied and Computational Engineering 11, no. 1 (2023): 224–30. http://dx.doi.org/10.54254/2755-2721/11/20230240.

Full text
Abstract:
At present, the attention mechanism, as represented by the transformer, has greatly promoted the development of natural language processing (NLP) and computer vision (CV). However, in the multimodal field, the application of the attention mechanism still focuses mainly on extracting features from different types of data and then fusing those features (such as text and image). With the increasing scale of models and the instability of Internet data, feature fusion has struggled to solve the growing variety of multimodal problems, and the multimodal field has long lacked a model that can uniformly handle all types of data. In this paper, we first take the CV and NLP fields as examples to review various derived models of the transformer. Then, based on the mechanisms of word embedding and image embedding, we discuss how embeddings of different granularity are handled uniformly under the attention mechanism in multimodal scenes. Further, we argue that this mechanism will not be limited to CV and NLP: a truly unified model will be able to handle tasks across data types through pre-training and fine-tuning. Finally, on the specific implementation of the unified model, this paper lists several cases and analyzes valuable research directions in related fields.
APA, Harvard, Vancouver, ISO, and other styles
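The "uniform handling" this abstract describes can be made concrete with a toy sketch (sizes and values are invented for illustration; this is not code from the paper): word embeddings and image-patch embeddings share one width, are concatenated into a single sequence, and a single self-attention layer lets every position attend across both modalities.

```python
import math
import random

# Toy illustration of unified multimodal attention: text tokens and image
# patches are embedded to the same width and processed as one sequence.
random.seed(0)
d = 8  # shared embedding width
text_tokens = [[random.gauss(0, 1) for _ in range(d)] for _ in range(5)]
image_patches = [[random.gauss(0, 1) for _ in range(d)] for _ in range(4)]
seq = text_tokens + image_patches  # one mixed sequence of 9 positions

def self_attention(x):
    """Single-head scaled dot-product self-attention (no learned projections)."""
    n, dim = len(x), len(x[0])
    scores = [[sum(a * b for a, b in zip(x[i], x[j])) / math.sqrt(dim)
               for j in range(n)] for i in range(n)]
    out = []
    for row in scores:
        m = max(row)                      # subtract max for stable softmax
        w = [math.exp(s - m) for s in row]
        z = sum(w)
        w = [v / z for v in w]
        out.append([sum(w[j] * x[j][k] for j in range(n)) for k in range(dim)])
    return out

out = self_attention(seq)
print(len(out), len(out[0]))  # 9 8
```

Because every row of attention weights spans all nine positions, each text token can attend to image patches and vice versa; real models add learned Q/K/V projections and modality-type embeddings on top of this skeleton.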
3

Keshri, Manish Kumar. "The Integration of NLP and Computer Vision: Advanced Frameworks for Multi-Modal Content Understanding." International Journal of Scientific Research in Computer Science, Engineering and Information Technology 11, no. 2 (2025): 2788–98. https://doi.org/10.32628/cseit25112708.

Full text
Abstract:
Content understanding systems leveraging Natural Language Processing (NLP) and Computer Vision (CV) have revolutionized how machines interpret and analyze multimodal information across diverse applications. This article explores the technologies driving advancements in content analysis, from text embedding techniques such as BERT to image and video representation methods including CNN-based approaches and Vision Transformers. It examines the challenges of processing diverse languages and regional contexts in a multimodal framework, alongside methodologies for collecting and preparing high-quality training data. The discussion covers various fusion architectures for integrating information across modalities, training approaches for multimodal classifiers, and evaluation frameworks to ensure model effectiveness. As these technologies continue to evolve, the integration of NLP and CV promises to unlock new possibilities for intelligent content understanding in an increasingly complex digital landscape.
APA, Harvard, Vancouver, ISO, and other styles
4

Hu, Qinrui. "Sentiment Analysis and Facial Expression Recognition in Customer Service Interactions." Frontiers in Business, Economics and Management 16, no. 3 (2024): 72–75. http://dx.doi.org/10.54097/tx980862.

Full text
Abstract:
In the evolving landscape of digital customer service, the need for advanced methods to accurately understand and respond to customer emotions has become critical. Traditional systems often rely solely on textual data, missing non-verbal cues that significantly contribute to the customer's emotional state. This study proposes a combined approach integrating Facial Expression Recognition (FER) and Natural Language Processing (NLP) to enhance emotion detection accuracy in customer service interactions. The FER component employs Convolutional Neural Networks (CNNs) to analyze facial expressions, while the NLP component uses Long Short-Term Memory (LSTM) networks to process textual data. This multimodal system aims to provide a comprehensive understanding of customer emotions by capturing both verbal and non-verbal cues. Experiments demonstrate that the integrated FER and NLP model significantly outperforms standalone models, achieving an accuracy of 92.3%, compared to 85.2% for FER-only and 87.4% for NLP-only models. The results highlight the benefits of a multimodal approach, showing substantial improvements in both training and validation performance. This study also compares the proposed model with other state-of-the-art models such as the Deep Learning Assisted Semantic Text Analysis (DLSTA) and Multimodal Emotion Recognition using Deep Belief Networks (DBN). While DLSTA achieves higher accuracy in text-based emotion detection, and DBNs provide robust emotion classification by integrating various modalities, our model effectively balances the strengths of both visual and textual data. The findings suggest that integrating FER and NLP can significantly enhance the quality of customer service by enabling more empathetic and effective interactions. Future work will focus on optimizing computational efficiency, addressing data variability, and ensuring adaptability across diverse customer service scenarios.
APA, Harvard, Vancouver, ISO, and other styles
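The multimodal gain this abstract reports comes from combining the two models' outputs. A minimal late-fusion sketch (class labels, weights, and scores below are invented for illustration, not taken from the paper) averages the per-emotion probabilities of a face model and a text model:

```python
# Hypothetical late-fusion sketch: combine per-emotion probabilities from a
# facial-expression model (FER) and a text model (NLP) by weighted averaging.
EMOTIONS = ["angry", "happy", "neutral", "sad"]

def fuse_predictions(fer_probs, nlp_probs, fer_weight=0.5):
    """Weighted average of two per-class probability vectors, renormalised."""
    fused = [fer_weight * f + (1.0 - fer_weight) * n
             for f, n in zip(fer_probs, nlp_probs)]
    total = sum(fused)
    return [p / total for p in fused]

fer_out = [0.10, 0.60, 0.20, 0.10]   # face model: leans "happy"
nlp_out = [0.05, 0.80, 0.10, 0.05]   # text model: strongly "happy"
fused = fuse_predictions(fer_out, nlp_out)
print(EMOTIONS[max(range(len(fused)), key=fused.__getitem__)])  # happy
```

When the two modalities disagree, the fusion weight decides which signal dominates; systems like the one described typically learn this weighting (or a joint feature representation) rather than fixing it by hand.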
5

Researcher. "UNDERSTANDING NATURAL LANGUAGE PROCESSING (NLP) TECHNIQUES." International Journal of Computer Engineering and Technology (IJCET) 15, no. 4 (2024): 527–36. https://doi.org/10.5281/zenodo.13311223.

Full text
Abstract:
Natural Language Processing (NLP) is a rapidly evolving field at the intersection of artificial intelligence, linguistics, and cognitive psychology. This article provides a comprehensive overview of NLP, exploring its core techniques, wide-ranging applications, and future directions. We delve into key NLP methods such as sentiment analysis, language generation, and named entity recognition, examining their underlying mechanisms and diverse applications. The impact of NLP across various sectors, including virtual assistants, translation services, healthcare, finance, and education, is thoroughly discussed. Despite significant advancements, NLP faces challenges in handling language ambiguity, multilingual processing, and ethical considerations. Looking ahead, the field is poised for further innovation in model efficiency, interpretability, multimodal integration, and commonsense reasoning. This review underscores NLP's transformative potential in reshaping human-computer interaction and information processing in the digital age.
APA, Harvard, Vancouver, ISO, and other styles
6

Researcher. "UNDERSTANDING NATURAL LANGUAGE PROCESSING (NLP) TECHNIQUES." International Journal of Research In Computer Applications and Information Technology (IJRCAIT) 15, no. 6 (2024): 1221–31. https://doi.org/10.5281/zenodo.14359554.

Full text
Abstract:
Artificial intelligence, linguistics, and cognitive psychology all come together in Natural Language Processing (NLP), a rapidly changing field. This article discusses NLP's main methods, practical uses, and possible future developments. Key NLP techniques such as sentiment analysis, language generation, and named entity recognition are studied in depth, along with their various uses. NLP's significant effects are discussed in many areas, such as virtual assistants, translation services, healthcare, finance, and education. NLP has come a long way, but it still faces issues with ambiguous language, multilingual understanding, and ethics. Future improvements are expected in model speed, interpretability, multimodal integration, and commonsense reasoning.
APA, Harvard, Vancouver, ISO, and other styles
7

Fan, Yuhan. "Research progress and challenges of deep learning in Natural Language Processing." Advances in Engineering Innovation 16, no. 6 (2025). https://doi.org/10.54254/2977-3903/2025.24550.

Full text
Abstract:
With the rapid development of artificial intelligence, Natural Language Processing (NLP) has emerged as a critical area for enabling intelligent human-computer interaction. This paper reviews key deep learning technologies and their applications in NLP. It first examines foundational techniques such as word embeddings and pre-trained models, and analyzes the structures and use cases of core models including Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs) and their variants, as well as Transformers. It then explores the application of these models in tasks such as sentiment analysis, machine translation, and question-answering systems. The study highlights how pre-trained models like BERT and GPT significantly enhance semantic understanding through large-scale unsupervised learning. However, challenges remain, including limited interpretability, weak performance in low-resource languages, and inadequate multimodal integration. The paper concludes by discussing future directions such as lightweight model design, cross-lingual transfer learning, and deep multimodal fusion. This research aims to provide theoretical references for advancing NLP technology and enhancing its practicality across various domains.
APA, Harvard, Vancouver, ISO, and other styles
8

Wang, Bin, Chunyu Xie, Dawei Leng, and Yuhui Yin. "IAA: Inner-Adaptor Architecture Empowers Frozen Large Language Model with Multimodal Capabilities." Proceedings of the AAAI Conference on Artificial Intelligence 39, no. 20 (2025): 21035–43. https://doi.org/10.1609/aaai.v39i20.35400.

Full text
Abstract:
In the field of multimodal large language models (MLLMs), common methods typically involve unfreezing the language model during training to foster profound visual understanding. However, the fine-tuning of such models with vision-language data often leads to a diminution of their natural language processing (NLP) capabilities. To avoid this performance degradation, a straightforward solution is to freeze the language model while developing multimodal competencies. Unfortunately, previous works have not attained satisfactory outcomes. Building on the strategy of freezing the language model, we conduct thorough structural exploration and introduce the Inner-Adaptor Architecture (IAA). Specifically, the architecture incorporates multiple multimodal adaptors at varying depths within the large language model to facilitate direct interaction with the inherently text-oriented transformer layers, thereby enabling the frozen language model to acquire multimodal capabilities. Unlike previous approaches of freezing language models that require large-scale aligned data, our proposed architecture is able to achieve superior performance on small-scale datasets. We conduct extensive experiments to improve the general multimodal capabilities and visual grounding abilities of the MLLM. Our approach remarkably outperforms previous state-of-the-art methods across various vision-language benchmarks without sacrificing performance on NLP tasks. Code and models will be released.
APA, Harvard, Vancouver, ISO, and other styles
9

Nagarathnamma, S. M. "The Future of Natural Language Processing: A Survey of Recent Advances and Emerging Trends." Journal of Scholastic Engineering Science and Management 2, no. 6 (2023): 26–35. https://doi.org/10.5281/zenodo.8243058.

Full text
Abstract:
Natural language processing (NLP) is a rapidly growing field with a wide range of applications, such as machine translation, speech recognition, and text analysis. In recent years, there have been significant advances in NLP, driven by the development of new machine learning algorithms and the availability of large datasets. This paper surveys the latest advances in NLP and discusses some of the emerging trends in the field. We focus on the following topics. Machine learning for NLP: we review the latest machine learning algorithms that have been used for NLP, such as deep learning, reinforcement learning, and transfer learning. Large datasets for NLP: we discuss the importance of large datasets for training NLP models and the challenges of collecting and curating these datasets. Emerging trends in NLP: we discuss some of the emerging trends in NLP, such as multimodal NLP, zero-shot learning, and adversarial NLP. We conclude by discussing the future of NLP and the challenges that the field faces. We believe that NLP has the potential to revolutionize the way we interact with computers and the way we process information. However, there are also some challenges that need to be addressed, such as the lack of interpretability of NLP models and the need for more data.
APA, Harvard, Vancouver, ISO, and other styles
10

Singh, Ankit Kumar. "Desktop Assistant Based on NLP." International Journal of Scientific Research in Engineering and Management 08, no. 05 (2024): 1–5. http://dx.doi.org/10.55041/ijsrem34539.

Full text
Abstract:
Natural Language Processing (NLP) has emerged as a critical component of artificial intelligence, enabling machines to comprehend and interact with human language. This research paper explores the current state of the art in NLP, highlighting recent innovations, trends, and ongoing challenges. It delves into various applications of NLP, discusses the datasets and models that drive advancements, and examines the evaluation metrics used to assess NLP systems. Key innovations such as transformers, pre-trained language models, and transfer learning have revolutionized the field, leading to significant improvements in performance across a variety of tasks. Additionally, the paper addresses the growing emphasis on ethical AI and bias mitigation, as well as the integration of NLP with other AI technologies to create multimodal systems. Applications of NLP in text classification, sentiment analysis, machine translation, conversational agents, and information retrieval are thoroughly examined. The discussion extends to the critical role of benchmark datasets and pre-trained models in driving progress. Furthermore, the paper evaluates the effectiveness of various metrics used to measure the performance of NLP systems. Finally, the future prospects and potential research directions are considered, highlighting the ongoing efforts to push the boundaries of what NLP can achieve in an increasingly interconnected and data-driven world. Keywords: natural language processing, natural language understanding, natural language generation.
APA, Harvard, Vancouver, ISO, and other styles
Dissertations / Theses on the topic "Multimodal NLP"

1

Nouri Golmaei, Sara. "Improving the Performance of Clinical Prediction Tasks by using Structured and Unstructured Data combined with a Patient Network." Thesis, 2021. http://dx.doi.org/10.7912/C2/41.

Full text
Abstract:
Indiana University–Purdue University Indianapolis (IUPUI). With the increasing availability of Electronic Health Records (EHRs) and advances in deep learning techniques, developing deep predictive models that use EHR data to solve healthcare problems has gained momentum in recent years. The majority of clinical predictive models benefit from structured data in EHR (e.g., lab measurements and medications). Still, learning clinical outcomes from all possible information sources is one of the main challenges when building predictive models. This work focuses mainly on two sources of information that have been underused by researchers: unstructured data (e.g., clinical notes) and a patient network. We propose a novel hybrid deep learning model, DeepNote-GNN, that integrates clinical notes information and patient network topological structure to improve 30-day hospital readmission prediction. DeepNote-GNN is a robust deep learning framework consisting of two modules: DeepNote and patient network. DeepNote extracts deep representations of clinical notes using a feature aggregation unit on top of a state-of-the-art Natural Language Processing (NLP) technique, BERT. By exploiting these deep representations, a patient network is built, and a Graph Neural Network (GNN) is used to train the network for hospital readmission predictions. Performance evaluation on the MIMIC-III dataset demonstrates that DeepNote-GNN achieves superior results compared to the state-of-the-art baselines on the 30-day hospital readmission task. We extensively analyze the DeepNote-GNN model to illustrate the effectiveness and contribution of each of its components. The model analysis shows that the patient network makes a significant contribution to the overall performance, and that DeepNote-GNN is robust and can consistently perform well on the 30-day readmission prediction task.
To evaluate the generalization of DeepNote and patient network modules on new prediction tasks, we create a multimodal model and train it on structured and unstructured data of MIMIC-III dataset to predict patient mortality and Length of Stay (LOS). Our proposed multimodal model consists of four components: DeepNote, patient network, DeepTemporal, and score aggregation. While DeepNote keeps its functionality and extracts representations of clinical notes, we build a DeepTemporal module using a fully connected layer stacked on top of a one-layer Gated Recurrent Unit (GRU) to extract the deep representations of temporal signals. Independent to DeepTemporal, we extract feature vectors of temporal signals and use them to build a patient network. Finally, the DeepNote, DeepTemporal, and patient network scores are linearly aggregated to fit the multimodal model on downstream prediction tasks. Our results are very competitive to the baseline model. The multimodal model analysis reveals that unstructured text data better help to estimate predictions than temporal signals. Moreover, there is no limitation in applying a patient network on structured data. In comparison to other modules, the patient network makes a more significant contribution to prediction tasks. We believe that our efforts in this work have opened up a new study area that can be used to enhance the performance of clinical predictive models.
APA, Harvard, Vancouver, ISO, and other styles
2

Baria, Enrico. "Multimodal imaging for tissue diagnostics by combined two-photon and Raman microscopy." Doctoral thesis, 2018. http://hdl.handle.net/2158/1129455.

Full text
Abstract:
During my PhD, I designed and developed a custom-made multimodal microscope for combining Fluorescence Lifetime Imaging Microscopy (FLIM), Second Harmonic Generation (SHG), Two-Photon Excited Fluorescence (TPEF) and Raman microscopy for studying both tumour and non-tumour diseases. Then, I conducted three research studies based on the biomedical applications of these optical techniques. I used two-photon microscopy for examining ex vivo human carotids affected by atherosclerosis, and Raman spectroscopy for discriminating three in vitro melanoma cell lines. Moreover, I combined TPEF and Raman microscopy for studying two ex vivo “bulk” tissue types: human bladder affected by urothelial carcinoma (UC) and atherosclerotic aorta obtained from an animal model. Both two-photon and Raman microscopy proved to be valuable tools for analysing biological samples.
APA, Harvard, Vancouver, ISO, and other styles

Book chapters on the topic "Multimodal NLP"

1

Kubsch, Marcus, Daniela Caballero, and Pablo Uribe. "Once More with Feeling: Emotions in Multimodal Learning Analytics." In The Multimodal Learning Analytics Handbook. Springer International Publishing, 2022. http://dx.doi.org/10.1007/978-3-031-08076-0_11.

Full text
Abstract:
The emotions that students experience when engaging in tasks critically influence their performance, and many models of learning and competence include assumptions about affective variables and respective emotions. However, while researchers agree about the importance of emotions for learning, it remains challenging to connect momentary affect, i.e., emotions, to learning processes. Advances in automated speech recognition and natural language processing (NLP) allow real-time detection of emotions in recorded language. We use NLP and machine learning techniques to automatically extract information about students' motivational states while engaging in the construction of explanations and investigate how this information can help more accurately predict students' learning over the course of a 10-week energy unit. Our results show how NLP and ML techniques allow the use of different modalities of the same data in order to better understand individual differences in students' performances. However, in realistic settings, this task remains far from trivial: it requires extensive preprocessing of the data, and the results need to be interpreted with care and caution. Thus, future research is needed before these methods can be deployed at scale.
APA, Harvard, Vancouver, ISO, and other styles
2

Johnson, David, Nick Dragojlovic, Nicola Kopac, et al. "EXPECT-NLP: An Integrated Pipeline and User Interface for Exploring Patient Preferences Directly from Patient-Generated Text." In Multimodal AI in Healthcare. Springer International Publishing, 2022. http://dx.doi.org/10.1007/978-3-031-14771-5_6.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Liu, Yicheng. "Multimodal NLP and Artificial Intelligence: Cross-Media Information Understanding and Generation." In Advances in Social Science, Education and Humanities Research. Atlantis Press SARL, 2024. https://doi.org/10.2991/978-2-38476-327-6_24.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Modi, Sangita S., and Sudhir B. Jagtap. "Multimodal Web Content Mining to Filter Non-learning Sites Using NLP." In Lecture Notes on Data Engineering and Communications Technologies. Springer International Publishing, 2019. http://dx.doi.org/10.1007/978-3-030-24643-3_3.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Saba, N. S., Kumari Anjali, Akansha Tanu, Aryan Porwal, Ahan Tejaswi, and R. Sindhu Rajendran. "NLP in Social Media Data Processing." In Advances in Computational Intelligence and Robotics. IGI Global, 2025. https://doi.org/10.4018/979-8-3693-2935-1.ch007.

Full text
Abstract:
Natural Language Processing (NLP) represents a cross-disciplinary domain that merges computer science with linguistics, empowering machines to comprehend, interpret, and generate human language. Its significance extends to transformative applications like chatbots, language translation, and sentiment analysis, thereby reshaping the landscape of human-computer interaction. Within the context of social media data analysis, NLP is indispensable for gleaning valuable insights from unstructured text data present on platforms such as Twitter, Facebook and Instagram. NLP serves as a linchpin in sentiment analysis, unraveling emotions in posts and comments. It aids businesses in understanding public opinion, customer satisfaction, and brand perception. Additionally, NLP facilitates trend analysis, user engagement, and personalized marketing by dissecting user-generated content. A notable trend is the fusion of NLP with image and video processing, ushering in multimodal analysis. Explainable AI (XAI) enhances transparency in NLP models, contributing to accountable decision-making. NLP also plays an important role in combating misinformation on social media and assisting companies in refining their engagement strategies. The chapter explores the future of NLP in social media data analysis. It also involves advancements in contextual understanding, accuracy, and adaptability. It also includes incorporation of multimodal analysis and real-time data processing promises deeper insights and informed decision-making. The evolving landscape of NLP in social media analysis ensures refined analyses, responsible implementation, and informed decisions across diverse industries and societal contexts.
APA, Harvard, Vancouver, ISO, and other styles
6

Paolozzi, Stefano, Fernando Ferri, and Patrizia Grifoni. "Improving Multimedia Digital Libraries Usability Applying NLP Sentence Similarity to Multimodal Sentences." In Handbook of Research on Digital Libraries. IGI Global, 2009. http://dx.doi.org/10.4018/978-1-59904-879-6.ch022.

Full text
Abstract:
This chapter describes multimodality as a means of augmenting information retrieval activities in multimedia digital libraries. Multimodal interaction systems combine visual information with voice, gestures, and other modalities to provide flexible and powerful dialogue approaches. The use of integrated multiple input modes enables users to benefit from the natural approach used in human communication, improving usability of the systems. However, natural interaction approaches may introduce interpretation problems, as a system's usability is directly proportional to users' satisfaction. To improve multimedia digital library usability, users can express their queries by means of a multimodal sentence. The authors propose a new approach that matches a multimodal sentence against a template stored in a knowledge base in order to interpret the multimodal sentence and define multimodal template similarity.
APA, Harvard, Vancouver, ISO, and other styles
7

de Hond, Anne, Marieke van Buchem, Claudio Fanconi, et al. "Predicting Depression Risk in Patients with Cancer Using Multimodal Data." In Caring is Sharing – Exploiting the Value in Data for Health and Innovation. IOS Press, 2023. http://dx.doi.org/10.3233/shti230274.

Full text
Abstract:
When patients with cancer develop depression, it is often left untreated. We developed a prediction model for depression risk within the first month after starting cancer treatment using machine learning and Natural Language Processing (NLP) models. The LASSO logistic regression model based on structured data performed well, whereas the NLP model based on only clinician notes did poorly. After further validation, prediction models for depression risk could lead to earlier identification and treatment of vulnerable patients, ultimately improving cancer care and treatment adherence.
APA, Harvard, Vancouver, ISO, and other styles
8

Samuthira Pandi, V., and Shobana D. "Incorporation of NLP techniques to facilitate intuitive user interactions with prosthetic devices." In The Role of Artificial Intelligence in Advanced Prosthetics and Implantable Devices. RADemics Research Institute, 2025. https://doi.org/10.71443/9789349552975-06.

Full text
Abstract:
The advancement of prosthetic technology has increasingly emphasized the integration of intelligent control systems to enhance user experience and functional adaptability. Traditional prosthetic devices rely on limited input modalities, often restricting their ability to provide intuitive and context-aware interactions. The incorporation of Natural Language Processing (NLP) techniques, combined with multimodal sensor fusion, offers a transformative approach to improving user interactions with prosthetic systems. By leveraging speech recognition, gesture control, biometric signals, and environmental data, prosthetic devices can achieve real-time adaptive behavior, enabling a seamless and natural communication framework. Multimodal NLP-driven systems enhance proprioceptive feedback, improve control accuracy, and facilitate adaptive learning based on user preferences and contextual variations. The integration of haptic, auditory, and visual feedback further optimizes the interaction loop, reducing cognitive load while improving precision and response efficiency. Additionally, emotion-aware prosthetics utilizing biometric data and sentiment analysis enable more personalized and human-like interactions, advancing the field of assistive technology. Despite the significant progress, challenges remain in real-time data processing, computational efficiency, and the seamless fusion of multimodal inputs. Future research must focus on optimizing deep learning architectures, developing low-latency processing techniques, and enhancing the energy efficiency of embedded systems to ensure practical deployment. The convergence of NLP, artificial intelligence, and multimodal feedback will define the next generation of intelligent prosthetic systems, significantly improving mobility, autonomy, and quality of life for individuals with limb loss.
APA, Harvard, Vancouver, ISO, and other styles
9

Jain, Raghav, Tulika Saha, and Sriparna Saha. "T-VAKS: A Tutoring-Based Multimodal Dialog System via Knowledge Selection." In Frontiers in Artificial Intelligence and Applications. IOS Press, 2023. http://dx.doi.org/10.3233/faia230388.

Full text
Abstract:
Advancements in Conversational Natural Language Processing (NLP) have the potential to address critical social challenges, particularly in achieving the United Nations’ Sustainable Development Goal of quality education. However, the application of NLP in the educational domain, especially language learning, has been limited due to the inherent complexities of the field and the scarcity of available datasets. In this paper, we introduce T-VAKS (Tutoring Virtual Agent with Knowledge Selection), a novel language tutoring multimodal Virtual Agent (VA) designed to assist students in learning a new language, thereby promoting AI for Social Good. T-VAKS aims to bridge the gap between NLP and the educational domain, enabling more effective language tutoring through intelligent virtual agents. Our approach employs an information theory-based knowledge selection module built on top of a multimodal seq2seq generative model, facilitating the generation of appropriate, informative, and contextually relevant tutor responses. The knowledge selection module in turn consists of two sub-modules: (i) knowledge relevance estimation, and (ii) knowledge focusing framework. We evaluate the performance of our proposed end-to-end dialog system against various baseline models and the most recent state-of-the-art models, using multiple evaluation metrics. The results demonstrate that T-VAKS outperforms competing models, highlighting the potential of our approach in enhancing language learning through the use of conversational NLP and virtual agents, ultimately contributing to addressing social challenges and promoting well-being.
10

Hidayatullah, Ahmad Fathan, Kassim Kalinaki, Haji Gul, Rufai Zakari Yusuf, and Wasswa Shafik. "Leveraging Natural Language Processing for Enhanced Text Analysis in Business Intelligence." In Advances in Computational Intelligence and Robotics. IGI Global, 2024. http://dx.doi.org/10.4018/979-8-3693-5288-5.ch006.

Abstract:
Business intelligence (BI) is crucial for informed decision-making, optimizing operations, and gaining a competitive edge. The rapid growth of unstructured text data has created a need for advanced text analysis techniques in BI. Natural language processing (NLP) is essential for analyzing unstructured textual data. This chapter covers foundational NLP techniques for text analysis, the role of text analysis in BI, and challenges and opportunities in this area. Real-world applications of NLP in BI demonstrate how organizations use NLP-driven text analysis to gain insights, improve customer experience, and anticipate market trends. Future directions and emerging trends are explored, including multimodal learning, contextualized embeddings, conversational AI, explainable AI, federated learning, and knowledge graph integration. These advancements enhance the scalability, interpretability, and privacy of NLP-driven BI systems, enabling organizations to derive deeper insights and drive innovation in data-driven business landscapes.

Conference papers on the topic "Multimodal NLP"

1

Bonnier, Thomas. "Error Detection for Multimodal Classification." In Proceedings of the 5th Workshop on Trustworthy NLP (TrustNLP 2025). Association for Computational Linguistics, 2025. https://doi.org/10.18653/v1/2025.trustnlp-main.6.

2

Chiguru, Aparna, and Rajchandar K. "Revolutionizing NLP: Multimodal Integration for Enhanced Image-to-Text Extraction." In 2024 3rd International Conference on Computational Modelling, Simulation and Optimization (ICCMSO). IEEE, 2024. http://dx.doi.org/10.1109/iccmso61761.2024.00093.

3

Baghalizadeh-Moghadam, Neda, Frédéric Cuppens, and Nora Boulahia-Cuppens. "An NLP-Based Framework Leveraging Email and Multimodal User Data." In 22nd International Conference on Security and Cryptography. SCITEPRESS - Science and Technology Publications, 2025. https://doi.org/10.5220/0013524000003979.

4

Wang, Jiawen, Longfei Zuo, Siyao Peng, and Barbara Plank. "MultiClimate: Multimodal Stance Detection on Climate Change Videos." In Proceedings of the Third Workshop on NLP for Positive Impact. Association for Computational Linguistics, 2024. http://dx.doi.org/10.18653/v1/2024.nlp4pi-1.27.

5

Horawalavithana, Sameera, Sai Munikoti, Ian Stewart, Henry Kvinge, and Karl Pazdernik. "SCITUNE: Aligning Large Language Models with Human-Curated Scientific Multimodal Instructions." In Proceedings of the 1st Workshop on NLP for Science (NLP4Science). Association for Computational Linguistics, 2024. http://dx.doi.org/10.18653/v1/2024.nlp4science-1.7.

6

Gupta, Tanay, Tushar Goel, and Ishan Verma. "Exploring Multimodal Language Models for Sustainability Disclosure Extraction: A Comparative Study." In The Sixth Workshop on Insights from Negative Results in NLP. Association for Computational Linguistics, 2025. https://doi.org/10.18653/v1/2025.insights-1.13.

7

Rawte, Vipula, Sarthak Jain, Aarush Sinha, et al. "ViBe: A Text-to-Video Benchmark for Evaluating Hallucination in Large Multimodal Models." In Proceedings of the 5th Workshop on Trustworthy NLP (TrustNLP 2025). Association for Computational Linguistics, 2025. https://doi.org/10.18653/v1/2025.trustnlp-main.15.

8

Razzhigaev, Anton, Maxim Kurkin, Elizaveta Goncharova, et al. "OmniDialog: A Multimodal Benchmark for Generalization Across Text, Visual, and Audio Modalities." In Proceedings of the 2nd GenBench Workshop on Generalisation (Benchmarking) in NLP. Association for Computational Linguistics, 2024. http://dx.doi.org/10.18653/v1/2024.genbench-1.12.

9

Babu, Alby, Dharshini T, Gayathry Krishnan V.S, Ummu Haiman V.P, Annie Julie Joseph, and Rajesh K.R. "Multimodal Emotion Analysis Using Integrating NLP, AI, and Facial Expression Recognition for Enhanced Emotion Detection." In 2024 IEEE International Conference on Signal Processing, Informatics, Communication and Energy Systems (SPICES). IEEE, 2024. https://doi.org/10.1109/spices62143.2024.10779754.

10

P, Priyanka, and Balachander T. "Multimodal Fusion for Coherent Description Generation: A System Integrating NLP, Computer Vision, and Speech Recognition." In 2025 International Conference on Computational Robotics, Testing and Engineering Evaluation (ICCRTEE). IEEE, 2025. https://doi.org/10.1109/iccrtee64519.2025.11053045.


Reports on the topic "Multimodal NLP"

1

Pande, Vikram. Looking to Collaborate on ML Research (NLP / Multimodal AI). ResearchHub Technologies, Inc., 2025. https://doi.org/10.55277/researchhub.tr6juazr.
