
Journal articles on the topic 'Multimodal NLP'


Consult the top 50 journal articles for your research on the topic 'Multimodal NLP.'


1

Tiwari, Manisha, Pragati Khare, Ishani Saha, and Mahesh Mali. "Multimodal NLP for image captioning: Fusing text and image modalities for accurate and informative descriptions." Journal of Information and Optimization Sciences 45, no. 4 (2024): 1041–49. http://dx.doi.org/10.47974/jios-1626.

Abstract:
Multimodal Natural Language Processing (NLP) offers significant potential for improving the understanding and generation of content that combines various modalities, including text and images. Image captioning, automatically generating textual descriptions of images, represents a crucial application of multimodal NLP. While existing methods primarily rely on image features, we propose a novel multimodal NLP model that leverages the power of both text and image modalities for generating informative and accurate image captions. Our model incorporates information from text descriptions associated with images, enabling it to capture contextual cues and generate richer captions than traditional unimodal models. We validate our method using the industry standard Flickr8K dataset and obtain cutting-edge outcomes, proving the potency of our multimodal fusion technique. Furthermore, we discuss the challenges and opportunities of multimodal NLP for image captioning and highlight its potential to revolutionise how we interact with computers and interpret visual information.
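The fusion strategy this abstract describes is not published as code here; as a rough illustration only, an early-fusion joint representation can be sketched as follows, where the encoders, dimensions, and weights are all stand-ins rather than the authors' actual model:

```python
import numpy as np

rng = np.random.default_rng(0)

def fuse_features(image_feat: np.ndarray, text_feat: np.ndarray,
                  w: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Early fusion: concatenate modality features, then project.

    image_feat: (d_img,) e.g. pooled CNN features of the image
    text_feat:  (d_txt,) e.g. mean-pooled embedding of the associated text
    w: (d_img + d_txt, d_out) projection matrix; b: (d_out,) bias
    Returns a joint representation that a caption decoder could consume.
    """
    joint = np.concatenate([image_feat, text_feat])  # (d_img + d_txt,)
    return np.tanh(joint @ w + b)                    # (d_out,), bounded by tanh

# Toy dimensions standing in for real encoders.
d_img, d_txt, d_out = 512, 300, 256
img = rng.normal(size=d_img)
txt = rng.normal(size=d_txt)
w = rng.normal(scale=0.02, size=(d_img + d_txt, d_out))
b = np.zeros(d_out)

joint = fuse_features(img, txt, w, b)
print(joint.shape)  # (256,)
```

In a real captioning model the projection would be trained end-to-end with the decoder; the point of the sketch is only that the text side adds dimensions the image-only baseline lacks.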
2

Zhang, Yingjie. "The current status and prospects of transformer in multimodality." Applied and Computational Engineering 11, no. 1 (2023): 224–30. http://dx.doi.org/10.54254/2755-2721/11/20230240.

Abstract:
At present, the attention mechanism, as represented by the transformer, has greatly promoted the development of natural language processing (NLP) and computer vision (CV). In the multimodal field, however, attention is still mainly used to extract features from different types of data and then fuse those features (such as text and image). With the increasing scale of models and the instability of Internet data, feature fusion alone can no longer address the growing variety of multimodal problems, and the field has long lacked a model that can uniformly handle all types of data. In this paper, we first take the CV and NLP fields as examples to review various derived models of the transformer. Then, based on the mechanisms of word embedding and image embedding, we discuss how embeddings of different granularity are handled uniformly under the attention mechanism in multimodal scenes. Further, we argue that this mechanism will not be limited to CV and NLP: a truly unified model will be able to handle tasks across data types through pre-training and fine-tuning. Finally, on the concrete implementation of such a unified model, this paper lists several cases and analyzes valuable research directions in related fields.
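As a toy illustration of the embedding argument above (all shapes and weights here are invented, not from the paper), word tokens and image patches can both be mapped to equal-width embedding sequences that a single attention stack could consume:

```python
import numpy as np

rng = np.random.default_rng(2)
d_model = 32

# Word embedding: a lookup table indexed by token id.
vocab_size = 100
word_table = rng.normal(size=(vocab_size, d_model))
token_ids = [5, 17, 42]
word_seq = word_table[token_ids]            # (3, d_model)

# Image embedding: flatten 8x8 RGB patches and project to the same width.
patches = rng.normal(size=(4, 8 * 8 * 3))   # 4 patches of an image
W_patch = rng.normal(scale=0.02, size=(8 * 8 * 3, d_model))
patch_seq = patches @ W_patch               # (4, d_model)

# Both modalities are now token sequences of equal width, so one
# attention stack can process their concatenation uniformly.
tokens = np.concatenate([word_seq, patch_seq], axis=0)
print(tokens.shape)  # (7, 32)
```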
3

Manish Kumar Keshri. "The Integration of NLP and Computer Vision: Advanced Frameworks for Multi-Modal Content Understanding." International Journal of Scientific Research in Computer Science, Engineering and Information Technology 11, no. 2 (2025): 2788–98. https://doi.org/10.32628/cseit25112708.

Abstract:
Content understanding systems leveraging Natural Language Processing (NLP) and Computer Vision (CV) have revolutionized how machines interpret and analyze multimodal information across diverse applications. This article explores the technologies driving advancements in content analysis, from text embedding techniques such as BERT to image and video representation methods including CNN-based approaches and Vision Transformers. It examines the challenges of processing diverse languages and regional contexts in a multimodal framework, alongside methodologies for collecting and preparing high-quality training data. The discussion covers various fusion architectures for integrating information across modalities, training approaches for multimodal classifiers, and evaluation frameworks to ensure model effectiveness. As these technologies continue to evolve, the integration of NLP and CV promises to unlock new possibilities for intelligent content understanding in an increasingly complex digital landscape.
4

Hu, Qinrui. "Sentiment Analysis and Facial Expression Recognition in Customer Service Interactions." Frontiers in Business, Economics and Management 16, no. 3 (2024): 72–75. http://dx.doi.org/10.54097/tx980862.

Abstract:
In the evolving landscape of digital customer service, the need for advanced methods to accurately understand and respond to customer emotions has become critical. Traditional systems often rely solely on textual data, missing non-verbal cues that significantly contribute to the customer's emotional state. This study proposes a combined approach integrating Facial Expression Recognition (FER) and Natural Language Processing (NLP) to enhance emotion detection accuracy in customer service interactions. The FER component employs Convolutional Neural Networks (CNNs) to analyze facial expressions, while the NLP component uses Long Short-Term Memory (LSTM) networks to process textual data. This multimodal system aims to provide a comprehensive understanding of customer emotions by capturing both verbal and non-verbal cues. Experiments demonstrate that the integrated FER and NLP model significantly outperforms standalone models, achieving an accuracy of 92.3%, compared to 85.2% for FER-only and 87.4% for NLP-only models. The results highlight the benefits of a multimodal approach, showing substantial improvements in both training and validation performance. This study also compares the proposed model with other state-of-the-art models such as the Deep Learning Assisted Semantic Text Analysis (DLSTA) and Multimodal Emotion Recognition using Deep Belief Networks (DBN). While DLSTA achieves higher accuracy in text-based emotion detection, and DBNs provide robust emotion classification by integrating various modalities, our model effectively balances the strengths of both visual and textual data. The findings suggest that integrating FER and NLP can significantly enhance the quality of customer service by enabling more empathetic and effective interactions. Future work will focus on optimizing computational efficiency, addressing data variability, and ensuring adaptability across diverse customer service scenarios.
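The paper's fusion code is not reproduced here; one common way to combine a visual and a textual branch, plausibly similar in spirit to the integrated model above, is late fusion of the per-class probabilities. The class names, probabilities, and weight below are made up for illustration:

```python
import numpy as np

def late_fusion(p_face: np.ndarray, p_text: np.ndarray, alpha: float = 0.5) -> np.ndarray:
    """Combine per-class emotion probabilities from two modalities.

    p_face: softmax output of the FER (CNN) branch
    p_text: softmax output of the NLP (LSTM) branch
    alpha:  weight on the visual branch (a tunable hyperparameter)
    """
    fused = alpha * p_face + (1.0 - alpha) * p_text
    return fused / fused.sum()  # renormalize to a probability vector

# Toy 3-class example: [negative, neutral, positive]
p_face = np.array([0.7, 0.2, 0.1])   # the face looks negative
p_text = np.array([0.2, 0.3, 0.5])   # the wording reads positive
fused = late_fusion(p_face, p_text, alpha=0.6)
label = int(np.argmax(fused))
print(fused.round(2), label)  # [0.5  0.24 0.26] 0
```

Weighting the branches lets the system lean on whichever cue is more reliable, which is one way a combined model can beat either standalone model.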
5

Researcher. "UNDERSTANDING NATURAL LANGUAGE PROCESSING (NLP) TECHNIQUES." International Journal of Computer Engineering and Technology (IJCET) 15, no. 4 (2024): 527–36. https://doi.org/10.5281/zenodo.13311223.

Abstract:
Natural Language Processing (NLP) is a rapidly evolving field at the intersection of artificial intelligence, linguistics, and cognitive psychology. This article provides a comprehensive overview of NLP, exploring its core techniques, wide-ranging applications, and future directions. We delve into key NLP methods such as sentiment analysis, language generation, and named entity recognition, examining their underlying mechanisms and diverse applications. The impact of NLP across various sectors, including virtual assistants, translation services, healthcare, finance, and education, is thoroughly discussed. Despite significant advancements, NLP faces challenges in handling language ambiguity, multilingual processing, and ethical considerations. Looking ahead, the field is poised for further innovation in model efficiency, interpretability, multimodal integration, and commonsense reasoning. This review underscores NLP's transformative potential in reshaping human-computer interaction and information processing in the digital age.
6

Researcher. "UNDERSTANDING NATURAL LANGUAGE PROCESSING (NLP) TECHNIQUES." International Journal of Research In Computer Applications and Information Technology (IJRCAIT) 15, no. 6 (2024): 1221–31. https://doi.org/10.5281/zenodo.14359554.

Abstract:
Natural Language Processing (NLP) is a rapidly changing field at the intersection of artificial intelligence, linguistics, and cognitive psychology. This article discusses NLP's main methods, practical uses, and possible future developments. Key techniques such as sentiment analysis, language generation, and named entity recognition are studied in depth, along with their various applications. NLP's significant effects are discussed across many areas, including virtual assistants, translation services, healthcare, finance, and education. NLP has come a long way, but it still faces challenges with language ambiguity, multilingual processing, and ethical concerns. Future work is expected to improve model efficiency, interpretability, multimodal integration, and commonsense reasoning.
7

Fan, Yuhan. "Research progress and challenges of deep learning in Natural Language Processing." Advances in Engineering Innovation 16, no. 6 (2025). https://doi.org/10.54254/2977-3903/2025.24550.

Abstract:
With the rapid development of artificial intelligence, Natural Language Processing (NLP) has emerged as a critical area for enabling intelligent human-computer interaction. This paper reviews key deep learning technologies and their applications in NLP. It first examines foundational techniques such as word embeddings and pre-trained models, and analyzes the structures and use cases of core models including Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs) and their variants, as well as Transformers. It then explores the application of these models in tasks such as sentiment analysis, machine translation, and question-answering systems. The study highlights how pre-trained models like BERT and GPT significantly enhance semantic understanding through large-scale unsupervised learning. However, challenges remain, including limited interpretability, weak performance in low-resource languages, and inadequate multimodal integration. The paper concludes by discussing future directions such as lightweight model design, cross-lingual transfer learning, and deep multimodal fusion. This research aims to provide theoretical references for advancing NLP technology and enhancing its practicality across various domains.
8

Wang, Bin, Chunyu Xie, Dawei Leng, and Yuhui Yin. "IAA: Inner-Adaptor Architecture Empowers Frozen Large Language Model with Multimodal Capabilities." Proceedings of the AAAI Conference on Artificial Intelligence 39, no. 20 (2025): 21035–43. https://doi.org/10.1609/aaai.v39i20.35400.

Abstract:
In the field of multimodal large language models (MLLMs), common methods typically involve unfreezing the language model during training to foster profound visual understanding. However, the fine-tuning of such models with vision-language data often leads to a diminution of their natural language processing (NLP) capabilities. To avoid this performance degradation, a straightforward solution is to freeze the language model while developing multimodal competencies. Unfortunately, previous works have not attained satisfactory outcomes. Building on the strategy of freezing the language model, we conduct thorough structural exploration and introduce the Inner-Adaptor Architecture (IAA). Specifically, the architecture incorporates multiple multimodal adaptors at varying depths within the large language model to facilitate direct interaction with the inherently text-oriented transformer layers, thereby enabling the frozen language model to acquire multimodal capabilities. Unlike previous approaches of freezing language models that require large-scale aligned data, our proposed architecture is able to achieve superior performance on small-scale datasets. We conduct extensive experiments to improve the general multimodal capabilities and visual grounding abilities of the MLLM. Our approach remarkably outperforms previous state-of-the-art methods across various vision-language benchmarks without sacrificing performance on NLP tasks. Code and models will be released.
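The IAA implementation itself is only referenced above ("code and models will be released"); a generic sketch of the adaptor idea, with illustrative dimensions and a zero-initialized up-projection so the frozen layer's behaviour is initially unchanged, might look like:

```python
import numpy as np

rng = np.random.default_rng(1)
d_model, d_bottleneck = 64, 8

# Frozen "transformer layer" weights (stand-ins; never updated in training).
W_frozen = rng.normal(scale=0.02, size=(d_model, d_model))

# Trainable adaptor: down-project, nonlinearity, up-project.
W_down = rng.normal(scale=0.02, size=(d_model, d_bottleneck))
W_up = np.zeros((d_bottleneck, d_model))  # zero-init: adaptor starts as a no-op

def layer_with_adaptor(x: np.ndarray) -> np.ndarray:
    h = x @ W_frozen                                   # frozen text-oriented layer
    adaptor_out = np.maximum(x @ W_down, 0.0) @ W_up   # multimodal adaptor branch
    return h + adaptor_out                             # residual injection

x = rng.normal(size=(4, d_model))  # 4 token embeddings entering the layer
y = layer_with_adaptor(x)
print(y.shape)  # (4, 64)
```

Because only the adaptor branch receives gradients, the language model's NLP behaviour is preserved while the new branch learns the multimodal interaction, which is the core of the freezing strategy described above.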
9

Nagarathnamma, S. M. "The Future of Natural Language Processing: A Survey of Recent Advances and Emerging Trends." Journal of Scholastic Engineering Science and Management 2, no. 6 (2023): 26–35. https://doi.org/10.5281/zenodo.8243058.

Abstract:
Natural language processing (NLP) is a rapidly growing field with a wide range of applications, such as machine translation, speech recognition, and text analysis. In recent years, there have been significant advances in NLP, driven by the development of new machine learning algorithms and the availability of large datasets. This paper surveys the latest advances in NLP and discusses some of the emerging trends in the field. We focus on the following topics. Machine learning for NLP: we review the latest machine learning algorithms that have been used for NLP, such as deep learning, reinforcement learning, and transfer learning. Large datasets for NLP: we discuss the importance of large datasets for training NLP models and the challenges of collecting and curating these datasets. Emerging trends in NLP: we discuss some of the emerging trends in NLP, such as multimodal NLP, zero-shot learning, and adversarial NLP. We conclude by discussing the future of NLP and the challenges that the field faces. We believe that NLP has the potential to revolutionize the way we interact with computers and the way we process information. However, there are also some challenges that need to be addressed, such as the lack of interpretability of NLP models and the need for more data.
10

Singh, Ankit Kumar. "Desktop Assistant Based on NLP." International Journal of Scientific Research in Engineering and Management 08, no. 05 (2024): 1–5. http://dx.doi.org/10.55041/ijsrem34539.

Abstract:
Natural Language Processing (NLP) has emerged as a critical component of artificial intelligence, enabling machines to comprehend and interact with human language. This research paper explores the current state of the art in NLP, highlighting recent innovations, trends, and ongoing challenges. It delves into various applications of NLP, discusses the datasets and models that drive advancements, and examines the evaluation metrics used to assess NLP systems. Key innovations such as transformers, pre-trained language models, and transfer learning have revolutionized the field, leading to significant improvements in performance across a variety of tasks. Additionally, the paper addresses the growing emphasis on ethical AI and bias mitigation, as well as the integration of NLP with other AI technologies to create multimodal systems. Applications of NLP in text classification, sentiment analysis, machine translation, conversational agents, and information retrieval are thoroughly examined. The discussion extends to the critical role of benchmark datasets and pre-trained models in driving progress. Furthermore, the paper evaluates the effectiveness of various metrics used to measure the performance of NLP systems. Finally, the future prospects and potential research directions are considered, highlighting the ongoing efforts to push the boundaries of what NLP can achieve in an increasingly interconnected and data-driven world. Keywords: natural language processing, natural language understanding, natural language generation.
11

Wang, Xurui. "The application of NLP in information retrieval." Applied and Computational Engineering 42, no. 1 (2024): 290–97. http://dx.doi.org/10.54254/2755-2721/42/20230795.

Abstract:
The field of Natural Language Processing (NLP) has experienced impressive advancements and has found diverse applications. This paper presents a comprehensive review of the development of NLP in the field of information retrieval. It explores different stages of NLP techniques and methods, including keyword matching, rule-based approaches, statistical methods, and the utilization of machine learning and deep learning technologies. Furthermore, the paper provides detailed insights into the specific applications of NLP in domains such as academic information retrieval, medical information retrieval, travel information retrieval, and e-commerce information retrieval. It analyzes the current state of NLP applications in these domains, highlights their advantages, and discusses their associated limitations. Finally, the paper emphasizes the continuous advancement of the NLP field, with a particular focus on semantic understanding, personalized retrieval, and multimodal information retrieval, to better adapt to diverse data types and user requirements. The paper concludes by summarizing the main points discussed and providing future directions.
12

Malik, Dr Pankaj. "The Integration of Natural Language Processing (NLP) in Human-Robot Interaction (HRI)." International Journal of Scientific Research in Engineering and Management 08, no. 06 (2024): 1–5. http://dx.doi.org/10.55041/ijsrem35803.

Abstract:
The integration of Natural Language Processing (NLP) in Human-Robot Interaction (HRI) represents a significant advancement towards achieving more natural and effective communication between humans and robots. This research explores the application of state-of-the-art NLP techniques to enhance HRI, focusing on improving robots' abilities to understand and generate human language. Key components of our approach include advanced speech recognition, natural language understanding (NLU), dialogue management, and natural language generation (NLG). We designed and implemented an HRI system that leverages models such as BERT for language understanding and GPT-3 for generating contextually appropriate responses. Our methodology involves integrating these NLP models with a robotics platform, ensuring real-time interaction capabilities while maintaining a high level of accuracy and context awareness. The system was evaluated through a series of user studies, measuring performance metrics such as accuracy, latency, and user satisfaction. Results indicate that our NLP-enhanced HRI system significantly improves the quality of interactions, demonstrating superior understanding and responsiveness compared to traditional systems. This paper discusses the implementation challenges, including computational constraints and ambiguity resolution, and provides insights into user feedback and system performance. Future work will focus on enhancing context management, exploring multimodal interaction, and addressing ethical considerations in deploying advanced HRI systems. Our findings underscore the potential of NLP to transform human-robot communication, paving the way for more intuitive and effective robotic assistants in various domains. 
Keywords: Human-Robot Interaction (HRI), Natural Language Processing (NLP), Conversational AI, Speech Recognition, Natural Language Understanding (NLU), Natural Language Generation (NLG), Multimodal Interaction, Dialogue Systems, Context Awareness, Emotion Recognition, Machine Learning in HRI, Personalized Interaction, User Experience (UX) in HRI, Human-Centered Design, Collaborative Robots (Cobots)
13

Tulsyan, Ansh. "Personality Prediction Model: Using Multimodal Data." International Journal of Scientific Research in Engineering and Management 09, no. 01 (2025): 1–9. https://doi.org/10.55041/ijsrem40458.

Abstract:
This research introduces a model for personality prediction that combines natural language processing (NLP), visual analysis, and behavioural analysis, drawing on data such as language usage and visual content. An important aspect of this model is its design, which prioritizes user privacy, data security, and compliance with privacy standards. Artificial intelligence (AI)-driven personality prediction models offer a novel method for understanding and anticipating human behaviour. These models examine a range of data sources, such as social networking sites, aptitude assessments, and speech patterns, and are grounded in psychological constructs such as the Big Five personality traits. To identify patterns and associations in large datasets and improve prediction accuracy and understanding, such models rely on machine learning techniques. Potential applications of AI-based personality prediction include workplace management, marketing, and mental health. However, ethical issues such as privacy and the risk of bias arise; individual consent and strict limits must govern the use of personal data. Personality prediction is expected to see more advanced and accurate models in the future, paving the way for more personalized interaction in various domains. Keywords: Multimodal Data Integration, Advanced Machine Learning, Ethical Design Framework, Personality Assessment Accuracy, Scalable
14

Siddiqui, Sadiya Maheen. "TruthLens: AI-Powered Fake News and Misinformation Detection Using Multimodal Analysis." International Journal of Scientific Research in Engineering and Management 09, no. 06 (2025): 1–7. https://doi.org/10.55041/ijsrem.ncft030.

Abstract:
The spread of misinformation and fake news poses a significant challenge in today's digital landscape. This research presents TruthLens, an AI-powered framework integrating Natural Language Processing (NLP), Computer Vision (CV), and fact-checking APIs to identify and mitigate misinformation. Our system leverages machine learning models for textual analysis, deep learning-based image/video forensics, and web scraping techniques for real-time verification. Credibility scores are evaluated using TF-IDF with LinearSVC, BERT, RoBERTa, CNNs for manipulated media, and the Google Fact-Check API, achieving a robust, multimodal detection pipeline. Index Terms: Misinformation Detection, Fake News, Artificial Intelligence (AI), Natural Language Processing (NLP), Computer Vision (CV), Fact-Checking APIs, Machine Learning, Deep Learning, Media Manipulation Detection, Multi-Modal Detection
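The abstract names TF-IDF with LinearSVC for the text branch. As a minimal, hand-rolled sketch of the vectorization step only (the documents and tokenization below are invented, and a trained LinearSVC would then score each resulting vector with a linear decision function):

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Hand-rolled TF-IDF with smoothed idf, standing in for a library vectorizer."""
    tokenized = [d.lower().split() for d in docs]
    vocab = sorted({t for doc in tokenized for t in doc})
    n = len(tokenized)
    df = {t: sum(1 for doc in tokenized if t in doc) for t in vocab}
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1.0 for t in vocab}
    vectors = []
    for doc in tokenized:
        counts = Counter(doc)
        vec = [counts[t] / len(doc) * idf[t] if t in counts else 0.0 for t in vocab]
        vectors.append(vec)
    return vocab, vectors

docs = [
    "shocking miracle cure doctors hate",    # sensationalist, fake-leaning style
    "study reports modest effect in trial",  # measured, credible-leaning style
]
vocab, X = tfidf_vectors(docs)
# A trained LinearSVC reduces at inference time to sign(w . x + b) over such vectors.
print(len(vocab), len(X), len(X[0]))  # 11 2 11
```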
15

Liao, Katherine P., Jiehuan Sun, Tianrun A. Cai, et al. "High-throughput multimodal automated phenotyping (MAP) with application to PheWAS." Journal of the American Medical Informatics Association 26, no. 11 (2019): 1255–62. http://dx.doi.org/10.1093/jamia/ocz066.

Abstract:
Objective: Electronic health records linked with biorepositories are a powerful platform for translational studies. A major bottleneck exists in the ability to phenotype patients accurately and efficiently. The objective of this study was to develop an automated high-throughput phenotyping method integrating International Classification of Diseases (ICD) codes and narrative data extracted using natural language processing (NLP). Materials and Methods: We developed a mapping method for automatically identifying relevant ICD and NLP concepts for a specific phenotype leveraging the Unified Medical Language System. Along with health care utilization, aggregated ICD and NLP counts were jointly analyzed by fitting an ensemble of latent mixture models. The multimodal automated phenotyping (MAP) algorithm yields a predicted probability of phenotype for each patient and a threshold for classifying participants as phenotype yes/no. The algorithm was validated using labeled data for 16 phenotypes from a biorepository and further tested in independent-cohort phenome-wide association studies (PheWAS) for 2 single nucleotide polymorphisms with known associations. Results: The MAP algorithm achieved higher or similar AUC and F-scores compared to the ICD code across all 16 phenotypes. The features assembled via the automated approach had accuracy comparable to those assembled via manual curation (AUC 0.943 for MAP vs 0.941 for manual). The PheWAS results suggest that the MAP approach detected previously validated associations with higher power when compared to the standard PheWAS method based on ICD codes. Conclusion: The MAP approach increased the accuracy of phenotype definition while maintaining scalability, thereby facilitating use in studies requiring large-scale phenotyping, such as PheWAS.
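The final thresholding step of such a pipeline, turning per-patient phenotype probabilities into yes/no calls and scoring them with an F-score, can be illustrated generically; the probabilities, labels, and 0.5 cutoff below are toy values, not from the study:

```python
def classify(probs, threshold=0.5):
    """Turn per-patient phenotype probabilities into yes/no calls."""
    return [int(p >= threshold) for p in probs]

def f_score(pred, truth):
    """Harmonic mean of precision and recall over binary calls."""
    tp = sum(p and t for p, t in zip(pred, truth))
    fp = sum(p and not t for p, t in zip(pred, truth))
    fn = sum((not p) and t for p, t in zip(pred, truth))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

probs = [0.95, 0.40, 0.70, 0.10, 0.80]  # toy predicted probabilities
truth = [1, 0, 1, 0, 0]                  # toy gold labels
pred = classify(probs, threshold=0.5)
print(pred, round(f_score(pred, truth), 3))  # [1, 0, 1, 0, 1] 0.8
```

Sweeping the threshold against labeled data is one simple way to trade precision against recall before fixing the yes/no cutoff.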
16

Warveen, Merza Eido, and Mahmood Ibrahim Ibrahim. "Analyzing Textual Data in Behavioral Science with Natural Language Processing." Engineering and Technology Journal 10, no. 04 (2025): 4365–85. https://doi.org/10.5281/zenodo.15125219.

Abstract:
Natural Language Processing (NLP) has emerged as a breakthrough technique in behavioral science, enabling researchers to examine large-scale textual data to acquire insights into human cognition, emotions, and social interactions. Traditional behavioral research methods frequently rely on manual analysis, which is time-consuming and prone to biases. NLP improves the precision and scalability of behavioral research by automating this process through sentiment analysis, topic modeling, and deep learning techniques. Its applications extend to mental health monitoring, education, social media analysis, and healthcare, with studies demonstrating its effectiveness in detecting depression, analyzing public discourse, and improving clinical decision-making. However, challenges remain, such as data bias, ethical concerns, privacy issues, and the interpretability of NLP models. Future research should focus on developing interpretable AI models, integrating multimodal data sources, and improving privacy-preserving techniques to ensure responsible and ethical application of NLP in behavioral science. Addressing these challenges will allow NLP to bridge the gap between qualitative and quantitative research, and revolutionize the way human behavior is studied and understood.
17

Agbaakin, Oluwatosin, and Verseo’ter Iyorkar. "Transforming global health through multimodal deep learning: Integrating NLP and predictive modelling for disease surveillance and prevention." World Journal of Advanced Research and Reviews 24, no. 3 (2024): 095–114. https://doi.org/10.30574/wjarr.2024.24.3.3673.

Abstract:
The integration of multimodal deep learning (DL) approaches in global health represents a transformative advancement in disease surveillance and prevention. The complexity of modern public health challenges, including emerging infectious diseases, healthcare disparities, and resource constraints, necessitates innovative tools that can analyse and interpret diverse data sources in real time. Multimodal DL combines natural language processing (NLP) and predictive modelling to bridge the gap between structured data, such as case numbers and hospital resources, and unstructured data, including epidemiological reports, social media, and clinical notes. By doing so, it provides comprehensive insights for early outbreak detection, resource allocation, and risk management. NLP enables the extraction of actionable information from diverse unstructured datasets, facilitating the identification of disease patterns and potential outbreaks from news articles, public health bulletins, and social media trends. Predictive models, on the other hand, excel in forecasting disease spread, estimating healthcare demand, and optimizing resource distribution. Together, these technologies empower decision-makers with real-time, actionable insights, enhancing public health preparedness and response capabilities. This paper explores the applications of multimodal DL in disease surveillance, focusing on its role in integrating diverse data modalities for actionable insights. It highlights case studies demonstrating the success of AI-driven tools in mitigating outbreaks and improving healthcare resource management. Additionally, the study discusses ethical, social, and technical challenges, offering recommendations for scaling these systems globally. The adoption of multimodal DL can significantly advance public health strategies, ensuring a more resilient and equitable healthcare system.
18

Agbaakin, Oluwatosin, and Verseo'ter Iyorkar. "Transforming global health through multimodal deep learning: Integrating NLP and predictive modelling for disease surveillance and prevention." World Journal of Advanced Research and Reviews 24, no. 3 (2024): 095–114. https://doi.org/10.5281/zenodo.15148671.

Full text
Abstract:
The integration of multimodal deep learning (DL) approaches in global health represents a transformative advancement in disease surveillance and prevention. The complexity of modern public health challenges, including emerging infectious diseases, healthcare disparities, and resource constraints, necessitates innovative tools that can analyse and interpret diverse data sources in real time. Multimodal DL combines natural language processing (NLP) and predictive modelling to bridge the gap between structured data, such as case numbers and hospital resources, and unstructured data, including epidemiological reports, social media, and clinical notes. By doing so, it provides comprehensive insights for early outbreak detection, resource allocation, and risk management. NLP enables the extraction of actionable information from diverse unstructured datasets, facilitating the identification of disease patterns and potential outbreaks from news articles, public health bulletins, and social media trends. Predictive models, on the other hand, excel in forecasting disease spread, estimating healthcare demand, and optimizing resource distribution. Together, these technologies empower decision-makers with real-time, actionable insights, enhancing public health preparedness and response capabilities. This paper explores the applications of multimodal DL in disease surveillance, focusing on its role in integrating diverse data modalities for actionable insights. It highlights case studies demonstrating the success of AI-driven tools in mitigating outbreaks and improving healthcare resource management. Additionally, the study discusses ethical, social, and technical challenges, offering recommendations for scaling these systems globally. The adoption of multimodal DL can significantly advance public health strategies, ensuring a more resilient and equitable healthcare system.
APA, Harvard, Vancouver, ISO, and other styles
19

Wu, Te-Lin, Shikhar Singh, Sayan Paul, Gully Burns, and Nanyun Peng. "MELINDA: A Multimodal Dataset for Biomedical Experiment Method Classification." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 16 (2021): 14076–84. http://dx.doi.org/10.1609/aaai.v35i16.17657.

Full text
Abstract:
We introduce a new dataset, MELINDA, for Multimodal biomEdicaL experImeNt methoD clAssification. The dataset is collected in a fully automated distant supervision manner, where the labels are obtained from an existing curated database, and the actual contents are extracted from papers associated with each of the records in the database. We benchmark various state-of-the-art NLP and computer vision models, including unimodal models which only take either caption texts or images as inputs, and multimodal models. Extensive experiments and analysis show that multimodal models, despite outperforming unimodal ones, still need improvements especially on a less-supervised way of grounding visual concepts with languages, and better transferability to low resource domains. We release our dataset and the benchmarks to facilitate future research in multimodal learning, especially to motivate targeted improvements for applications in scientific domains.
APA, Harvard, Vancouver, ISO, and other styles
20

Ghafoor, Abdul, Sidra Norren, Anosh Fatima, and Hoda Ezz Abdel Hakim Mahmoud. "Cross-cultural emotion recognition in AI: Enhancing multimodal NLP for empathetic interaction." Social Sciences Spectrum 4, no. 2 (2025): 575–88. https://doi.org/10.71085/sss.04.02.295.

Full text
Abstract:
This study investigates how cross-cultural understanding of emotions and empathy can improve human-computer interaction (HCI). Using NLP techniques that examine text, sound, and visuals, along with transformer models, the research enables AI to identify emotions. The system was most accurate in identifying both positive and neutral emotions but struggled slightly in detecting anger or sadness. Contextual and organized answers were generated by the empathetic response module, which achieved an average of 4.3/5 in empathy metrics. There are still difficulties in evoking strong emotions in audiences, especially when it comes to portraying complex emotions. The research emphasizes that AI systems may fail to recognize certain emotions if they are not designed to detect diverse cultural expressions of emotions. Topics related to the privacy of emotional data and problems with algorithm bias are openly discussed, highlighting the need for open and responsible work on AI. Study results contribute to building AI that understands emotions, which helps users in industries such as healthcare, education, and service, and also supports cultural understanding and ethical design in AI.
APA, Harvard, Vancouver, ISO, and other styles
21

Papti, Mr Madhu Kumar. "Multimodal Content Analysis Using Deep Learning." International Journal for Research in Applied Science and Engineering Technology 12, no. 5 (2024): 564–69. http://dx.doi.org/10.22214/ijraset.2024.61566.

Full text
Abstract:
The multimodal content analysis platform combines sentiment analysis and neural style transfer techniques to process and improve various types of digital content. The sentiment analysis module utilizes natural language processing (NLP) algorithms, such as recurrent neural networks (RNNs) or transformer models like BERT, to extract emotional signals from textual, visual, and auditory inputs. Signals are classified into predefined sentiment categories, providing granular insights into the emotional context of the content. The platform employs neural style transfer algorithms, such as style transfer networks (NSTNs) or generative adversarial networks (GANs), to transfer stylistic attributes between texts. By training on a diverse range of artistic styles, the system learns to apply these styles to input text while preserving semantic meaning. This process enhances the visual representation of textual content, making it more appealing and engaging to users.
APA, Harvard, Vancouver, ISO, and other styles
22

NISHANTH JOSEPH PAULRAJ. "Natural Language Processing on Clinical Notes: Advanced Techniques for Risk Prediction and Summarization." Journal of Computer Science and Technology Studies 7, no. 3 (2025): 494–502. https://doi.org/10.32996/jcsts.2025.7.3.56.

Full text
Abstract:
This article explores the application of Natural Language Processing (NLP) techniques to clinical notes, focusing specifically on risk prediction and automated summarization capabilities. Healthcare institutions generate vast amounts of unstructured clinical text that contains critical information not captured in structured data fields. It examines how modern NLP approaches, including named entity recognition, text classification, and clinical summarization, can extract actionable insights from narrative documentation. It discusses specialized language models like BioBERT, ClinicalBERT, and Med-PaLM that have been optimized for clinical text processing, along with implementation tools such as ScispaCy and Hugging Face Transformers. Practical applications with demonstrated efficacy include risk prediction from clinical notes and adverse drug reaction detection. It explores how the MIMIC datasets provide valuable resources for developing and evaluating these approaches. The article also addresses future directions and challenges in multimodal clinical AI integration, explainability and trust in clinical NLP systems, and privacy and security considerations when working with sensitive clinical text. Overall, this comprehensive review highlights how advanced NLP techniques offer transformative capabilities for extracting clinical intelligence from unstructured documentation.
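The named entity recognition step this review describes is normally handled by specialized models such as ScispaCy or ClinicalBERT; as a minimal illustration of the idea, the sketch below uses a tiny drug lexicon plus a dosage regex. The lexicon, the `extract_entities` helper, and the sample note are all invented for this example, not taken from the article.

```python
# Toy rule-based NER over a clinical note: match tokens against a drug
# lexicon and pull dosage mentions with a regular expression.
import re

DRUG_LEXICON = {"metformin", "lisinopril", "aspirin"}  # hypothetical mini-lexicon
DOSE_PATTERN = re.compile(r"\b\d+\s?mg\b", re.IGNORECASE)

def extract_entities(note: str) -> dict:
    """Return drug names and dosage strings found in a free-text note."""
    tokens = re.findall(r"[a-zA-Z]+", note.lower())
    drugs = sorted({t for t in tokens if t in DRUG_LEXICON})
    doses = DOSE_PATTERN.findall(note)
    return {"drugs": drugs, "doses": doses}

note = "Patient started on Metformin 500 mg daily; continue aspirin 81mg."
print(extract_entities(note))
# → {'drugs': ['aspirin', 'metformin'], 'doses': ['500 mg', '81mg']}
```

Real clinical NER must also handle negation, abbreviations, and misspellings, which is precisely why the review points to pretrained biomedical language models rather than hand-written rules.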
APA, Harvard, Vancouver, ISO, and other styles
23

J.Sravanthi, Charan Teja Karupalli, Deepthi Gullipalli, Ashika Gundlatati, and Bhargav Bandaru. "A Multimodal Translation Interface for Sign Language." International Journal for Modern Trends in Science and Technology 11, no. 04 (2025): 17–23. https://doi.org/10.5281/zenodo.15108955.

Full text
Abstract:
This project report describes the development of a "BSL Video Carousel for Sign language" application, a multimodal system designed to facilitate the translation of gestures, spoken language, and text into British Sign Language (BSL) videos. The system leverages computer vision for real-time gesture recognition using a webcam, speech-to-text transcription for voice inputs, and natural language processing (NLP) techniques to simplify and convert textual inputs into a list of essential BSL keywords. These keywords are then used to retrieve corresponding sign videos from an online repository. This report outlines both the existing approaches and the proposed solution, provides an in-depth discussion of the implementation architecture, and evaluates the system's performance and limitations.
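The NLP step the abstract describes, simplifying input text into a list of essential BSL keywords for video lookup, can be sketched as a stopword-filtering pass. This is a minimal illustration under assumed details: the stopword list and the `extract_keywords` helper are hypothetical, not from the paper.

```python
# Reduce a sentence to ordered, deduplicated content words, each of which
# would then be used to retrieve a matching BSL sign video.
import string

STOPWORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "i", "you",
             "it", "in", "on", "for", "with", "my", "your", "please"}

def extract_keywords(sentence: str) -> list:
    """Strip punctuation and stopwords, keeping first occurrences in order."""
    cleaned = sentence.lower().translate(str.maketrans("", "", string.punctuation))
    seen, keywords = set(), []
    for word in cleaned.split():
        if word not in STOPWORDS and word not in seen:
            seen.add(word)
            keywords.append(word)
    return keywords

print(extract_keywords("Where is the nearest train station, please?"))
# → ['where', 'nearest', 'train', 'station']
```

Deduplication matters here because each keyword triggers one video retrieval; repeating a sign for a repeated word would make the output carousel needlessly long.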
APA, Harvard, Vancouver, ISO, and other styles
24

Qi, Qingfu, Liyuan Lin, and Rui Zhang. "Feature Extraction Network with Attention Mechanism for Data Enhancement and Recombination Fusion for Multimodal Sentiment Analysis." Information 12, no. 9 (2021): 342. http://dx.doi.org/10.3390/info12090342.

Full text
Abstract:
Multimodal sentiment analysis and emotion recognition represent a major research direction in natural language processing (NLP). With the rapid development of online media, people often express their emotions on a topic in the form of video, and the signals it transmits are multimodal, including language, visual, and audio. Therefore, the traditional unimodal sentiment analysis method is no longer applicable, which requires the establishment of a fusion model of multimodal information to obtain sentiment understanding. In previous studies, scholars used the feature vector cascade method when fusing multimodal data at each time step in the middle layer. This method puts each modal information in the same position and does not distinguish between strong modal information and weak modal information among multiple modalities. At the same time, this method does not pay attention to the embedding characteristics of multimodal signals across the time dimension. In response to the above problems, this paper proposes a new method and model for processing multimodal signals, which takes into account the delay and hysteresis characteristics of multimodal signals across the time dimension. The purpose is to obtain a multimodal fusion feature emotion analysis representation. We evaluate our method on the multimodal sentiment analysis benchmark dataset CMU Multimodal Opinion Sentiment and Emotion Intensity Corpus (CMU-MOSEI). We compare our proposed method with the state-of-the-art model and show excellent results.
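The distinction the abstract draws, between cascading all modality vectors equally and letting strong modalities outweigh weak ones, can be illustrated with a small attention-weighted fusion sketch. This is not the paper's model; the feature vectors, query, and `attention_fuse` helper are invented for illustration.

```python
# Attention-based modality fusion: score each modality vector against a
# query, convert scores to softmax weights, and take a weighted sum, so
# a strong modality (here, text) dominates the fused representation.
import math

def softmax(scores):
    exps = [math.exp(s - max(scores)) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_fuse(modalities, query):
    """modalities: dict name -> feature vector; query: same-length vector."""
    names = list(modalities)
    scores = [sum(q * x for q, x in zip(query, modalities[n])) for n in names]
    weights = softmax(scores)
    fused = [sum(w * modalities[n][i] for w, n in zip(weights, names))
             for i in range(len(query))]
    return fused, dict(zip(names, weights))

feats = {"text": [0.9, 0.1], "audio": [0.2, 0.4], "visual": [0.1, 0.3]}
fused, weights = attention_fuse(feats, query=[1.0, 0.0])
# The text modality scores highest against this query, so it gets the
# largest fusion weight; a plain cascade would treat all three equally.
```

In a trained model the query would be learned, and the scoring would run at every time step to capture the delay and hysteresis effects the paper discusses.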
APA, Harvard, Vancouver, ISO, and other styles
25

Niu, Shuo, D. Scott McCrickard, Timothy L. Stelter, Alan Dix, and G. Don Taylor. "Reorganize Your Blogs: Supporting Blog Re-visitation with Natural Language Processing and Visualization." Multimodal Technologies and Interaction 3, no. 4 (2019): 66. http://dx.doi.org/10.3390/mti3040066.

Full text
Abstract:
Temporally-connected personal blogs contain voluminous textual content, presenting challenges in re-visiting and reflecting on experiences. Other data repositories have benefited from natural language processing (NLP) and interactive visualizations (VIS) to support exploration, but little is known about how these techniques could be used with blogs to present experiences and support multimodal interaction with blogs, particularly for authors. This paper presents the effect of reorganization—reorganizing the large blog set with NLP and presenting abstract topics with VIS—to support novel re-visitation experiences to blogs. The BlogCloud tool, a blog re-visitation tool that reorganizes blog paragraphs around user-searched keywords, implements reorganization and similarity-based content grouping. Through a public use session with bloggers who wrote about extended hikes, we observed the effect of NLP-based reorganization in delivering novel re-visitation experiences. Findings suggest that the re-presented topics provide new reflection materials and re-visitation paths, enabling interaction with symbolic items in memory.
APA, Harvard, Vancouver, ISO, and other styles
26

Chandra Sekar Nandanavanam. "The convergence of human language and computing: NLP as the bridge to intuitive interaction." World Journal of Advanced Engineering Technology and Sciences 15, no. 3 (2025): 2080–87. https://doi.org/10.30574/wjaets.2025.15.3.1081.

Full text
Abstract:
This article explores the multifaceted evolution and applications of Natural Language Processing (NLP) as the critical bridge between human language and computing systems. Beginning with foundational definitions and historical developments, the article traces NLP's progression from early rule-based systems to contemporary neural architectures. It delves into essential techniques including text preprocessing, syntactic and semantic frameworks, and machine learning methodologies that form the technical foundation of modern language processing. The article extends to the transformative impact of NLP on human-computer interfaces, chronicling the transition from command-line to graphical and now conversational paradigms, with particular attention to accessibility improvements. Contemporary applications are thoroughly assessed, including virtual assistants, customer service platforms, and multilingual communication tools that have reshaped digital interaction. The article concludes by examining future directions and challenges facing NLP development, with critical focus on ethical considerations, contextual understanding limitations, and the promising frontier of multimodal integration that will define the next generation of language technologies.
APA, Harvard, Vancouver, ISO, and other styles
27

Veluswamy, Anusha Sowbarnika, Nagamani A, SilpaRaj M, Yobu D, Ashwitha M, and Mangaiyarkaras V. "Natural Language Processing for Sentiment Analysis in Social Media: Techniques and Case Studies." ITM Web of Conferences 76 (2025): 05004. https://doi.org/10.1051/itmconf/20257605004.

Full text
Abstract:
Social media platforms have become a significant medium for expressing opinions, emotions, and sentiments, making sentiment analysis a crucial task in Natural Language Processing (NLP). While various sentiment analysis techniques have been proposed, existing studies often face challenges such as language dependency, platform-specific biases, lack of real-time processing, and limited multimodal analysis. This research explores the evolution of sentiment analysis in social media by leveraging cutting-edge NLP techniques, including transformer-based models (BERT, RoBERTa, GPT) and multimodal approaches. By addressing the limitations of previous studies, our research proposes a real-time, multilingual, and cross-platform sentiment analysis model capable of analyzing textual, audio, and visual content from diverse social media platforms (e.g., Twitter, Facebook, Instagram, and TikTok). Additionally, this study investigates the effectiveness of domain-specific sentiment analysis (e.g., political discourse, health-related discussions) to improve sentiment classification in specialized contexts. Benchmark datasets and experimental validation will be used to compare existing sentiment analysis models with our proposed approach. Our findings aim to enhance scalability, accuracy, and real-time adaptability of sentiment analysis in social media applications, ultimately contributing to improved decision-making in social monitoring, brand analysis, and crisis management.
APA, Harvard, Vancouver, ISO, and other styles
28

Chen, Yanhan, Hanxuan Wang, Kaiwen Yu, and Ruoshui Zhou. "Artificial Intelligence Methods in Natural Language Processing: A Comprehensive Review." Highlights in Science, Engineering and Technology 85 (March 13, 2024): 545–50. http://dx.doi.org/10.54097/vfwgas09.

Full text
Abstract:
The rapid evolution of Artificial Intelligence (AI) since its inception in the mid-20th century has significantly influenced the field of Natural Language Processing (NLP), transforming it from a rule-based system to a dynamic and adaptive model capable of understanding the complexities of human language. This paper aims to offer a comprehensive review of the various applications and methodologies of AI in NLP, serving as a detailed guide for future research and practical applications. In the early sections, the paper elucidates the indispensable role of AI in NLP, highlighting its transition from symbolic reasoning to a focus on machine learning and deep learning, and its extensive applications in sectors such as healthcare, transportation, and finance. It emphasizes the symbiotic relationship between AI and NLP, facilitated by platforms like AllenNLP, which aid in the development of advanced language understanding models. Further, the paper explores specific AI techniques employed in NLP, including machine learning, Naive Bayes, and Support Vector Machines, and identifies pressing challenges and avenues for future research. It delves into the applications of AI in NLP, showcasing its transformative potential in tasks such as machine translation, facilitated by deep learning methods, and the development of chatbots and virtual assistants that have revolutionized human-technology interaction. The paper also highlights other fields impacted by AI techniques, including text summarization, sentiment analysis, and named entity recognition, emphasizing the efficiency and accuracy brought about by the integration of AI in these areas. In conclusion, the paper summarizes the remarkable advancements and persistent challenges in NLP, such as language ambiguity and contextual understanding, and underscores the need for diverse and representative labeled data for training. 
Looking forward, it identifies promising research avenues including Explainable AI, Few-shot and Zero-shot Learning, and the integration of NLP with other data modalities, aiming for a holistic understanding of multimodal data. The paper calls for enhanced robustness and security in NLP systems, especially in sensitive applications like content moderation and fake news detection, to foster trust and reliability in AI technologies. It advocates for continual learning in NLP models to adapt over time without losing previously acquired knowledge, paving the way for a future where AI and NLP work synergistically to understand and generate human language more effectively and efficiently.
APA, Harvard, Vancouver, ISO, and other styles
29

Alkaabi, Hussein, Ali Kadhim Jasim, and Ali Darroudi. "From Static to Contextual: A Survey of Embedding Advances in NLP." PERFECT: Journal of Smart Algorithms 2, no. 2 (2025): 57–66. https://doi.org/10.62671/perfect.v2i2.77.

Full text
Abstract:
Embedding techniques have been a cornerstone of Natural Language Processing (NLP), enabling machines to represent textual data in a form that captures semantic and syntactic relationships. Over the years, the field has witnessed a significant evolution—from static word embeddings, such as Word2Vec and GloVe, which represent words as fixed vectors, to dynamic, contextualized embeddings like BERT and GPT, which generate word representations based on their surrounding context. This survey provides a comprehensive overview of embedding techniques, tracing their development from early methods to state-of-the-art approaches. We discuss the strengths and limitations of each paradigm, their applications across various NLP tasks, and the challenges they address, such as polysemy and out-of-vocabulary words. Furthermore, we highlight emerging trends, including multimodal embeddings, domain-specific representations, and efforts to mitigate embedding bias. By synthesizing the advancements in this rapidly evolving field, this paper aims to serve as a valuable resource for researchers and practitioners while identifying open challenges and future directions for embedding research in NLP.
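The survey's central contrast, static versus contextual embeddings, can be shown with a toy example: a static table assigns "bank" one fixed vector regardless of sentence, while a contextual encoder lets neighbouring words shift the representation, which is how contextual models resolve the polysemy the survey mentions. The vectors and the averaging "encoder" below are illustrative stand-ins, not Word2Vec or BERT.

```python
# Static lookup vs. a crude contextual stand-in that mixes a word's
# vector with its neighbours', so "bank" near "river" differs from
# "bank" near "money".
STATIC = {  # fixed, Word2Vec/GloVe-style vectors (toy values)
    "bank": [0.5, 0.5], "river": [0.0, 1.0], "money": [1.0, 0.0],
}

def contextual(word, sentence_words):
    """Average the word's static vector with the sentence's mean vector."""
    vecs = [STATIC[w] for w in sentence_words if w in STATIC]
    base = STATIC[word]
    ctx = [sum(v[i] for v in vecs) / len(vecs) for i in range(len(base))]
    return [(b + c) / 2 for b, c in zip(base, ctx)]

v1 = contextual("bank", ["river", "bank"])   # riverbank sense
v2 = contextual("bank", ["money", "bank"])   # financial sense
# Static: identical vector in both sentences; contextual: v1 != v2.
```

Real contextual models achieve this through self-attention over the whole sequence rather than simple averaging, but the behavioural difference, one vector per word versus one vector per word occurrence, is the same.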
APA, Harvard, Vancouver, ISO, and other styles
30

Sreenivasul Reddy Meegada. "Impact of customer experience from traditional IVR to virtual assistants in contact centers." Global Journal of Engineering and Technology Advances 23, no. 1 (2025): 097–102. https://doi.org/10.30574/gjeta.2025.23.1.0099.

Full text
Abstract:
The evolution of customer experience in contact centers has undergone a remarkable transformation with the transition from traditional Interactive Voice Response (IVR) systems to Natural Language Processing (NLP)-powered Virtual Assistants. This article explores the fundamental limitations of conventional IVR technologies that have led to customer frustration, including navigational complexity, lack of personalization, cognitive burden on users, and emotional disconnection. The integration of advanced NLP capabilities has revolutionized customer interactions by enabling more intuitive engagement through intent recognition, contextual processing, affective computing, and multimodal understanding. These technological advancements deliver substantial operational benefits through intelligent routing precision, predictive prioritization, dynamic capacity management, and agent augmentation. The article further examines critical implementation considerations, including data-driven design methodologies, hybrid architecture deployment strategies, continuous learning frameworks, cross-functional governance structures, and transparent design principles. By comprehensively analyzing both the challenges of traditional systems and the transformative potential of NLP technologies, this article provides valuable insights into a technological evolution that is fundamentally reshaping customer service paradigms across industries, establishing experience quality as a primary competitive differentiator in contemporary business environments.
APA, Harvard, Vancouver, ISO, and other styles
31

Sonawale, Om. "Hybrid Deep Learning Framework for Personality Prediction in E-Recruitment." INTERNATIONAL JOURNAL OF SCIENTIFIC RESEARCH IN ENGINEERING AND MANAGEMENT 09, no. 06 (2025): 1–9. https://doi.org/10.55041/ijsrem49831.

Full text
Abstract:
Personality prediction is a vital task in the domain of psychology, human-computer interaction, and user behavior analysis, with applications ranging from tailored advertisements to mental health assessments. Traditional methods rely heavily on self-report questionnaires or psychological assessments, which can be time-consuming and subjective. To overcome these limitations, researchers are exploring automated and objective methods using deep learning techniques, particularly Convolutional Neural Networks (CNNs) and Natural Language Processing (NLP) algorithms, which excel at capturing complex patterns in multimodal data. In this work, we propose a hybrid framework that combines CNN-based feature extraction and NLP algorithms for predicting personality traits using data such as facial expressions, speech patterns, and text analysis. The CNN model is leveraged due to its robust feature extraction capabilities, enabling it to learn intricate patterns directly from raw data inputs, which correlate with the Big Five personality traits. Additionally, we integrate a Support Vector Classifier (SVC) to classify personality traits based on the extracted features, offering improved prediction accuracy across diverse data sources. Keywords: Personality Prediction, Big Five Personality Traits, CNN, NLP Algorithm, SVC Classifier
APA, Harvard, Vancouver, ISO, and other styles
32

Shahid Iqbal Rai, Maida Maqsood, Bushra Hanif, et al. "Computational linguistics at the crossroads: A comprehensive review of NLP advancements." World Journal of Advanced Engineering Technology and Sciences 11, no. 2 (2024): 578–91. http://dx.doi.org/10.30574/wjaets.2024.11.2.0146.

Full text
Abstract:
New NLP breakthroughs have put Computational Linguistics at a crossroads. NLP's past, present, and future are covered. This review explains computational linguistics' creation with a brief history of linguistics and computer science. Early solutions processed and understood natural language using rule-based systems using manually constructed linguistic rules. Over time, these tactics became increasingly problematic as language became more complex and obscure. Statistical approaches transformed operations. Neural network-based machine learning methods are leading the area because they can learn complicated patterns and representations from large text collections. A data-driven model revolution in natural language processing enhanced language modelling, machine translation, and sentiment analysis. Next, NLP improvements for several tasks and applications are evaluated. Language understanding models that capture semantic nuances and contextual relationships use deep learning frameworks. Word embeddings and transformer-based architectures like GPT and BERT perform well on benchmark datasets for text classification, question answering, and named item identification. The paper also shows how NLP interacts with computer vision, voice processing, and other domains to show the merits and cons of cross-disciplinary research. Multimodal techniques that combine text, graphics, and audio may increase natural language processing and interpretation. The review discusses NLP's effects on prejudice, justice, and privacy. Responsible development and implementation are needed when NLP technology becomes widespread due to algorithmic bias and data privacy concerns. NLP research directions and concerns are reviewed. Existing models may meet standards but fail in practice.
APA, Harvard, Vancouver, ISO, and other styles
33

Vani Panguluri. "AI-powered sales quote generation: The Intersection of NLP, CRM, and Revenue Optimization." World Journal of Advanced Engineering Technology and Sciences 15, no. 3 (2025): 248–58. https://doi.org/10.30574/wjaets.2025.15.3.0849.

Full text
Abstract:
This article examines the transformative impact of Natural Language Processing (NLP) in automating and enhancing sales quote generation processes. The article shows the operational efficiencies, strategic advantages, and customer experience improvements resulting from implementing AI-driven quote generation systems. Through an analysis of implementation case studies across diverse industries, the research quantifies performance improvements in quote generation time, error reduction, sales cycle duration, and conversion rates. The findings reveal significant implications for sales organization structures, talent development strategies, and performance management frameworks. The article further identifies critical research opportunities in contextual understanding capabilities, multimodal integration, implementation methodologies, and strategic differentiation. This article provides a foundation for organizations seeking to leverage NLP technologies to achieve sustainable competitive advantage in increasingly digital sales environments while offering a roadmap for future technical and organizational research priorities.
APA, Harvard, Vancouver, ISO, and other styles
34

Md Fokrul Islam Khan, Mst Halema Begum, Md Arifur Rahman, Golam Qibria Limon, Md Ali Azam, and Abdul Kadar Muhammad Masum. "A comprehensive review of advances in transformer, GAN, and attention mechanisms: Their role in multimodal learning and applications across NLP." International Journal of Science and Research Archive 15, no. 1 (2025): 454–59. https://doi.org/10.30574/ijsra.2025.15.1.0980.

Full text
Abstract:
The emergence and subsequent development of deep learning, specifically transformer-based architectures, Generative Adversarial Networks (GANs), and attention mechanisms, have had revolutionary implications on Natural Language Processing (NLP) and multimodal learning. Transformer models are neural network architectures that change an input sequence into an output sequence. Transformer architectures like the Generative Pre-Training Transformer (GPT) and Bidirectional Encoder Representations from Transformers (BERT) leverage self-attention mechanisms to enable high-level contextual learning as well as long-range dependencies. GANs are a class of AI algorithms designed to solve generative modeling problems. Different GANs, such as StyleGAN and BigGAN, study a collection of training data and learn the distribution probabilities used to generate such datasets. Attention mechanisms, acting as the unifying thread between Transformers and GANs in multimodal learning, optimize deep learning models to attend to the most relevant parts of the input data. This paper explores the synergy between these technologies, emphasizing their combined potential in multimodal learning frameworks. In addition, the paper analyzes recent advancements, key innovations, and practical implementations that leverage Transformers, GANs, and attention mechanisms to enhance natural language understanding and generation.
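The self-attention mechanism that the abstract places at the core of GPT and BERT can be written out in a few lines. This sketch implements standard scaled dot-product attention, softmax(QKᵀ/√d)·V, on a toy two-token sequence; the matrices are illustrative, not from any trained model.

```python
# Scaled dot-product self-attention: each position's output is a
# softmax-weighted average of all value vectors, with weights derived
# from query-key similarity scaled by sqrt(d).
import math

def softmax(xs):
    exps = [math.exp(x - max(xs)) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def self_attention(Q, K, V):
    d = len(Q[0])  # key dimension, used for the 1/sqrt(d) scaling
    out = []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K]
        weights = softmax(scores)
        out.append([sum(w * v[i] for w, v in zip(weights, V))
                    for i in range(len(V[0]))])
    return out

# Two-token toy sequence with 2-dimensional queries/keys/values.
Q = K = V = [[1.0, 0.0], [0.0, 1.0]]
out = self_attention(Q, K, V)
# Each token attends most strongly to itself here, since its query
# matches its own key best.
```

Full transformer layers add learned projection matrices, multiple heads, and residual connections around this kernel, but the weighting logic is exactly this.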
APA, Harvard, Vancouver, ISO, and other styles
35

K, Sharavana, Kedarnath Bhakta, Jayanth Sai Chethan S, Jayant Chand, and Meet Joshi K. "Evolution of Natural Language Processing: A Review." Journal of Knowledge in Data Science and Information Management 1, no. 1 (2024): 30–38. http://dx.doi.org/10.46610/jokdsim.2024.v01i01.004.

Full text
Abstract:
Over the years, Natural Language Processing (NLP) has evolved dramatically, moving from early rule-based systems to the current era dominated by advanced deep learning models. An overview of the significant turning points and patterns that have influenced the development of NLP is given in this study. In the early days of Natural Language Processing (NLP), rule-based methods were the main focus. Linguists would manually create rules to analyze and comprehend human language. Although these systems were somewhat successful, they were unable to handle the complexity and unpredictability of spoken language. A major change was brought about by the introduction of probabilistic models and machine learning techniques with the emergence of statistical approaches. The development of methods like n-gram models and hidden Markov models during this time allowed computers to handle linguistic patterns. Large-scale linguistic resources like word embeddings and annotated corpora started to appear, which further accelerated the development of NLP. Machine learning algorithms have led to notable advancements in tasks such as machine translation, named entity recognition, and part-of-speech tagging. NLP has seen a revolution in recent years thanks to deep learning, which uses neural networks to learn intricate language representations. Sequential dependencies in language can now be better understood because of models like Long Short-Term Memory networks (LSTMs) and Recurrent Neural Networks (RNNs). The addition of attention mechanisms, as demonstrated by Transformer and other models, improves the model's ability to manage long-range dependencies and perform better on a variety of NLP tasks. In the future, NLP will develop in ways that go beyond performance measurements, exploring interpretability, ethical issues, and the incorporation of multimodal data.
As the area develops, it becomes increasingly important to eliminate biases, ensure ethical AI deployment, and improve user-centric experiences. This introduction lays the groundwork for an in-depth examination of the development of NLP, highlighting significant turning points, difficulties, and potential future directions in this vibrant and quickly developing subject.
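The statistical era this review describes, when n-gram models let computers handle linguistic patterns from counts alone, can be illustrated with a minimal bigram model; the tiny corpus below is invented for the example.

```python
# A bigram language model: estimate P(next word | previous word) from
# co-occurrence counts in a corpus via maximum likelihood.
from collections import Counter, defaultdict

def train_bigrams(corpus):
    counts = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts

def prob(counts, prev, nxt):
    """Maximum-likelihood P(nxt | prev); 0.0 if prev was never seen."""
    total = sum(counts[prev].values())
    return counts[prev][nxt] / total if total else 0.0

corpus = ["the cat sat", "the cat ran", "the dog sat"]
counts = train_bigrams(corpus)
print(prob(counts, "the", "cat"))  # "the" is followed by "cat" in 2 of 3 bigrams
```

The limitations the review attributes to this era are visible even here: unseen bigrams get probability zero (hence smoothing techniques), and no dependency longer than one word is captured, which is exactly what LSTMs and attention later addressed.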
APA, Harvard, Vancouver, ISO, and other styles
36

Zhang, Yao. "Analysis of the Integration Strategies of LLM and VLM Models with the Transformer Architecture." Journal of Computer Science and Artificial Intelligence 2, no. 3 (2025): 55–57. https://doi.org/10.54097/3fhs5d75.

Full text
Abstract:
With the rapid development of artificial intelligence technology, Transformer architecture has become the core framework of natural language processing (NLP) and multimodal domain. In this paper, the fusion strategies of Large Language Model (LLM) and Visual Language Model (VLM) with Transformer architecture are deeply studied. This paper first introduces the basic principles and characteristics of Transformer architecture, LLM and VLM models, and then makes a comprehensive analysis of the advantages and challenges of different fusion strategies, and demonstrates the practical application effect of these fusion strategies in multimodal tasks through application cases such as visual question answering (VQA) and image description generation. The results show that by optimizing the model structure, training strategy and data processing, the integration of LLM and VLM with Transformer architecture can significantly improve the performance of the model in language and visual tasks, which provides a new idea and method for the development of multimodal artificial intelligence.
APA, Harvard, Vancouver, ISO, and other styles
37

Suryavanshi, Pallavi. "Deep Learning for Multimodal Sentiment Analysis Integrating Text, Audio, and Video." International Journal of Recent Development in Engineering and Technology 14, no. 2 (2025): 1–5. https://doi.org/10.54380/ijrdet0225_01.

Full text
Abstract:
In the fields of artificial intelligence (AI) and natural language processing (NLP), sentiment analysis (SA) has become increasingly popular. Demand for automating user sentiment analysis of goods and services is rising. Videos, as opposed to just text, are becoming more and more common online for sharing opinions. This has made the use of various modalities in SA, known as Multimodal Sentiment Analysis (MSA), a significant field of study. MSA uses the most recent developments in deep learning and machine learning at several phases, such as sentiment polarity detection and multimodal feature extraction and fusion, with the goal of reducing error rates and enhancing performance. Multiple data sources, such as text, audio, and video, are integrated into MSA to improve sentiment classification accuracy. Using cutting-edge deep learning algorithms, this work integrates text, audio, and video features to examine multimodal sentiment analysis. After outlining a framework for feature extraction, fusion, and data pre-processing, we assess the framework's performance against industry-standard benchmarks.
APA, Harvard, Vancouver, ISO, and other styles
38

Shahwan, Younis Ali, and Ibrahim Mahmood. "The Role of NLP in Fake News Detection and Misinformation Mitigation." Engineering and Technology Journal 10, no. 05 (2025): 5087–99. https://doi.org/10.5281/zenodo.15471942.

Full text
Abstract:
In the era of rapid digital communication, the spread of fake news has emerged as a global challenge, influencing public opinion and undermining trust in information systems. This review explores the pivotal role of Natural Language Processing (NLP) in detecting fake news and mitigating misinformation. Various approaches integrate NLP techniques with machine learning (ML) and deep learning (DL) architectures to enhance detection accuracy and robustness. Commonly used text representation methods include TF-IDF, Word2Vec, GloVe, and BERT, often supported by syntactic and semantic features such as POS tagging, named entity recognition (NER), stylometry, and sentiment analysis. Advanced architectures like CNN-RNN hybrids, dual BERT models, and capsule networks have demonstrated high effectiveness, with performance metrics reaching up to 99.8% on benchmark datasets. Further strategies such as ensemble learning, stance detection, adversarial robustness, and the incorporation of external verification tools have been shown to improve credibility assessment. While many models achieve high accuracy in controlled environments, challenges persist in cross-domain generalization, multilingual adaptability, and ethical transparency. This review highlights the critical contributions of NLP in combating misinformation and recommends future systems to leverage multimodal data, real-time responsiveness, and explainable AI for more resilient and trustworthy detection frameworks.
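Of the text representations this review lists, TF-IDF is the simplest to illustrate. The sketch below is a toy implementation of the classic tf × log(N/df) weighting with whitespace tokenization, not the pipeline of any system surveyed; the two example "documents" are made up:

```python
import math
from collections import Counter

def tf_idf(docs):
    """Toy TF-IDF: term frequency times log(N / document frequency)."""
    tokenized = [doc.lower().split() for doc in docs]
    # document frequency: in how many docs does each term appear?
    df = Counter(term for toks in tokenized for term in set(toks))
    n = len(docs)
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append({t: (tf[t] / len(toks)) * math.log(n / df[t])
                        for t in tf})
    return vectors

docs = ["breaking news shocking claim", "official report confirms claim"]
vecs = tf_idf(docs)
# "claim" occurs in every document, so its IDF (and weight) is 0;
# document-specific words like "breaking" get positive weight.
```

Production systems (e.g. scikit-learn's vectorizers) add IDF smoothing and normalization on top of this core formula.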
APA, Harvard, Vancouver, ISO, and other styles
39

Pakray, Partha, Alexander Gelbukh, and Sivaji Bandyopadhyay. "Natural language processing applications for low-resource languages." Natural Language Processing 31, no. 2 (2025): 183–97. https://doi.org/10.1017/nlp.2024.33.

Full text
Abstract:
Natural language processing (NLP) has significantly advanced our ability to model and interact with human language through technology. However, these advancements have disproportionately benefited high-resource languages with abundant data for training complex models. Low-resource languages, often spoken by smaller or marginalized communities, have yet to realize the full potential of NLP applications. The primary challenges in developing NLP applications for low-resource languages stem from the lack of large, well-annotated datasets, standardized tools, and linguistic resources. This scarcity of resources hinders the performance of data-driven approaches that have excelled in high-resource settings. Further, low-resource languages frequently exhibit complex grammatical structures, diverse vocabularies, and unique social contexts, which pose additional challenges for standard NLP techniques. Innovative strategies are emerging to address these challenges. Researchers are actively collecting and curating datasets, even utilizing community engagement platforms to expand data resources. Transfer learning, where models pre-trained on high-resource languages are adapted to low-resource settings, has shown significant promise. Multilingual models like Multilingual Bidirectional Encoder Representations from Transformers (mBERT) and Cross-Lingual Models (XLM-R), trained on vast quantities of multilingual data, offer a powerful avenue for cross-lingual knowledge transfer. Additionally, researchers are exploring integrating multimodal approaches, combining textual data with images, audio, or video, to enhance NLP performance in low-resource language scenarios.
This survey covers applications like part-of-speech tagging, morphological analysis, sentiment analysis, hate speech detection, dependency parsing, language identification, discourse annotation guidelines, question answering, machine translation, information retrieval, and predictive authoring for augmentative and alternative communication systems. The review also highlights machine learning approaches, deep learning approaches, Transformers, and cross-lingual transfer learning as practical techniques. Developing practical NLP applications for low-resource languages is crucial for preserving linguistic diversity, fostering inclusion within the digital world, and expanding our understanding of human language. While challenges remain, the strategies outlined in this survey demonstrate the ongoing progress and highlight the potential for NLP to empower communities that speak low-resource languages and contribute to a more equitable landscape within language technology.
APA, Harvard, Vancouver, ISO, and other styles
40

Sun, Yu, Yihang Qin, Wenhao Chen, Xuan Li, and Chunlian Li. "Context-Aware Multimodal Fusion with Sensor-Augmented Cross-Modal Learning: The BLAF Architecture for Robust Chinese Homophone Disambiguation in Dynamic Environments." Applied Sciences 15, no. 13 (2025): 7068. https://doi.org/10.3390/app15137068.

Full text
Abstract:
Chinese, a tonal language with inherent homophonic ambiguity, poses significant challenges for semantic disambiguation in natural language processing (NLP), hindering applications like speech recognition, dialog systems, and assistive technologies. Traditional static disambiguation methods suffer from poor adaptability in dynamic environments and low-frequency scenarios, limiting their real-world utility. To address these limitations, we propose BLAF—a novel MacBERT-BiLSTM Hybrid Architecture—that synergizes global semantic understanding with local sequential dependencies through dynamic multimodal feature fusion. This framework incorporates innovative mechanisms for the principled weighting of heterogeneous features, effective alignment of representations, and sensor-augmented cross-modal learning to enhance robustness, particularly in noisy environments. Employing a staged optimization strategy, BLAF achieves state-of-the-art performance on the SIGHAN 2015 (data fine-tuning and supplementation): 93.37% accuracy and 93.25% F1 score, surpassing pure BERT by 15.74% in accuracy. Ablation studies confirm the critical contributions of the integrated components. Furthermore, the sensor-augmented module significantly improves robustness under noise (speech SNR to 18.6 dB at 75 dB noise, 12.7% reduction in word error rates). By bridging gaps among tonal phonetics, contextual semantics, and computational efficiency, BLAF establishes a scalable paradigm for robust Chinese homophone disambiguation in industrial NLP applications. This work advances cognitive intelligence in Chinese NLP and provides a blueprint for adaptive disambiguation in resource-constrained and dynamic scenarios.
APA, Harvard, Vancouver, ISO, and other styles
41

Schmidt, Thomas, Manuel Burghardt, and Christian Wolff. "Toward Multimodal Sentiment Analysis of Historic Plays." Digital Humanities in the Nordic and Baltic Countries Publications 2, no. 1 (2019): 405–14. http://dx.doi.org/10.5617/dhnbpub.11114.

Full text
Abstract:
We present a case study as part of a work-in-progress project about multimodal sentiment analysis on historic German plays, taking Emilia Galotti by G. E. Lessing as our initial use case. We analyze the textual version and an audio version (audio book). We focus on ready-to-use sentiment analysis methods: For the textual component, we implement a naive lexicon-based approach and another approach that enhances the lexicon by means of several NLP methods. For the audio analysis, we use the free version of the Vokaturi tool. We compare the results of all approaches and evaluate them against the annotations of a human expert, which serve as a gold standard. For our use case, we can show that audio and text sentiment analysis behave very differently: textual sentiment analysis tends to predict sentiment as rather negative and audio sentiment as rather positive. Compared to the gold standard, the textual sentiment analysis achieves an accuracy of 56% while the accuracy for audio sentiment analysis is only 32%. We discuss possible reasons for these mediocre results and give an outlook on further steps we want to pursue in the context of multimodal sentiment analysis on historic plays.
APA, Harvard, Vancouver, ISO, and other styles
42

Ayaz, Ahmed Faridi, and Hiwarkar Tryambak. "Multimodal Sentiment Analysis: A Systematic Review of History, Datasets, Multimodal Fusion Methods, Applications, Challenges and Future Directions." Journal of Research & Development 14, no. 20 (2022): 85–90. https://doi.org/10.5281/zenodo.7525024.

Full text
Abstract:
Sentiment analysis (SA), a buzzword in the fields of artificial intelligence (AI) and natural language processing (NLP), is gaining popularity. Due to numerous SA applications, there is an increasing need to automate the procedure of analysing the user's feelings concerning any products or services. Multimodal Sentiment Analysis (MSA), a branch of sentiment analysis that uses many modalities, is a rapidly growing topic of study as more and more opinions are expressed through videos rather than just text. Recent advances in machine learning are used by MSA to advance. At each stage of the MSA, the most recent developments in machine learning and deep learning are used, including sentiment polarity recognition, multimodal features extraction, and multimodal fusion with reduced error rates and increased speed. This research paper categorises several recent developments in MSA designs into 10 categories and focuses mostly on the primary taxonomy and recently published Multimodal Fusion architectures. The 10 categories are: early fusion, late fusion, hybrid, model-level fusion, tensor fusion, hierarchical, bi-modal, attention-based, quantum-based, and word-level fusion. The primary contribution of this manuscript is a study of the advantages and disadvantages of various architectural developments in MSA fusion. It also talks about future scope, uses in other industries, and research shortages.
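Two of the ten fusion categories this survey names, early fusion and late fusion, reduce to a one-line difference that a numeric sketch makes concrete. The feature sizes and fusion weights below are invented for illustration, not taken from any surveyed architecture:

```python
import numpy as np

def early_fusion(text_feat, audio_feat, video_feat):
    # Early fusion: concatenate raw modality features,
    # then feed ONE classifier the joint vector.
    return np.concatenate([text_feat, audio_feat, video_feat])

def late_fusion(text_score, audio_score, video_score,
                weights=(0.5, 0.25, 0.25)):
    # Late fusion: run a SEPARATE classifier per modality,
    # then combine their sentiment scores (here, a weighted average).
    scores = np.array([text_score, audio_score, video_score])
    return float(np.dot(weights, scores))

t, a, v = np.ones(4), np.zeros(3), np.ones(2)
joint = early_fusion(t, a, v)        # shape (9,): 4 + 3 + 2
final = late_fusion(0.8, 0.2, 0.6)   # 0.5*0.8 + 0.25*0.2 + 0.25*0.6
```

The richer categories in the taxonomy (tensor, attention-based, word-level fusion) replace these fixed operations with learned interactions, but the early-vs-late distinction above is the axis they all vary along.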
APA, Harvard, Vancouver, ISO, and other styles
43

Nagaraju, Regonda. "VOCAL MOOD DETECTION USING NATURAL LANGUAGE PROCESSING." INTERNATIONAL JOURNAL OF SCIENTIFIC RESEARCH IN ENGINEERING AND MANAGEMENT 09, no. 04 (2025): 1–9. https://doi.org/10.55041/ijsrem45039.

Full text
Abstract:
Vocal mood detection, a branch of Natural Language Processing (NLP), focuses on identifying emotions from spoken language using linguistic and acoustic features such as lexical content, pitch, tone, rhythm, and prosody. This process integrates advanced NLP techniques and speech signal processing to decode emotions like happiness, anger, sadness, and neutrality. Recent advancements leverage deep learning models, including recurrent and convolutional neural networks (RNNs and CNNs) and transformer-based architectures, to improve accuracy. These models capture temporal and semantic nuances, while multimodal approaches enhance performance by combining textual and acoustic data. Applications span customer service, virtual assistants, mental health monitoring, and adaptive learning systems. However, challenges remain: detecting subtle emotional shifts and managing multi-speaker dialogues add complexity. This study reviews speech emotion analyzers' methodologies, tools, and applications, highlighting current limitations and future research directions. Enhanced systems promise to transform human-computer interaction by enabling more empathetic and adaptive AI.
APA, Harvard, Vancouver, ISO, and other styles
44

Nilesh Singh. "Leveraging NLP for real-time social media analytics: trends, sentiment, and insights." World Journal of Advanced Engineering Technology and Sciences 15, no. 1 (2025): 2172–85. https://doi.org/10.30574/wjaets.2025.15.1.0465.

Full text
Abstract:
Social media platforms have revolutionized how information flows in the digital age, creating unprecedented opportunities for analyzing public opinion and tracking emerging trends in real-time. This paper explores how Natural Language Processing (NLP) techniques can effectively process and analyze the vast unstructured data generated across social media channels. We examine advancements in sentiment analysis, entity recognition, topic modeling, and trend detection that transform noisy social media content into actionable insights. Through case studies spanning brand reputation monitoring, public health surveillance, and social movement analysis, we demonstrate practical applications of these techniques. The paper also addresses challenges inherent to social media text processing—including linguistic diversity, contextual understanding, multimodal content integration, and representativeness bias—while proposing emerging directions to overcome these limitations through cross-platform analytics, privacy-preserving methods, causal relationship identification, and improved misinformation detection systems.
APA, Harvard, Vancouver, ISO, and other styles
45

Mikołajewska, Emilia, and Jolanta Masiak. "Deep Learning Approaches to Natural Language Processing for Digital Twins of Patients in Psychiatry and Neurological Rehabilitation." Electronics 14, no. 10 (2025): 2024. https://doi.org/10.3390/electronics14102024.

Full text
Abstract:
Deep learning (DL) approaches to natural language processing (NLP) offer powerful tools for creating digital twins (DTs) of patients in psychiatry and neurological rehabilitation by processing unstructured textual data such as clinical notes, therapy transcripts, and patient-reported outcomes. Techniques such as transformer models (e.g., BERT, GPT) enable the analysis of nuanced language patterns to assess mental health, cognitive impairment, and emotional states. These models can capture subtle linguistic features that correlate with symptoms of degenerative disorders (e.g., aMCI) and mental disorders such as depression or anxiety, providing valuable insights for personalized treatment. In neurological rehabilitation, NLP models help track progress by analyzing a patient’s language during therapy, such as recovery from aphasia or cognitive decline caused by neurological deficits. DL methods integrate multimodal data by combining NLP with speech, gesture, and sensor data to create holistic DTs that simulate patient behavior and health trajectories. Recurrent neural networks (RNNs) and attention mechanisms are commonly used to analyze time-series conversational data, enabling long-term tracking of a patient’s mental health. These approaches support predictive analytics and early diagnosis by predicting potential relapses or adverse events by identifying patterns in patient communication over time. However, it is important to note that ethical considerations such as ensuring data privacy, avoiding bias, and ensuring explainability are crucial when implementing NLP models in clinical settings to ensure patient trust and safety. NLP-based DTs can facilitate collaborative care by summarizing patient insights and providing actionable recommendations to medical staff in real time. By leveraging DL, these DTs offer scalable, data-driven solutions to promote personalized care and improve outcomes in psychiatry and neurological rehabilitation.
APA, Harvard, Vancouver, ISO, and other styles
46

Vandana Kalra. "Coupling NLP for Intelligent Knowledge Management in Organizations: A Framework for AI-Powered Decision Support." Journal of Information Systems Engineering and Management 10, no. 10s (2025): 23–28. https://doi.org/10.52783/jisem.v10i10s.1337.

Full text
Abstract:
Knowledge management (KM) is a crucial component of business development in modern enterprises, and this type of management is facilitated through technology. Nevertheless, conventional knowledge management systems (KMS) face problems including, but not limited to, information silos, difficulty in accessing data, and the complexity of managing unstructured data. As new advancements are made in Natural Language Processing (NLP), Artificial Intelligence (AI) technologies that allow for contextual knowledge discovery, intelligent search, automated summarization, and real-time content classification become readily available. This research analyzes the application of NLP systems concerning their integration with knowledge systems in business, information retrieval, enterprise search, and knowledge recommendation systems. For these integrations to be successful, Named Entity Recognition (NER), semantic search, Retrieval-Augmented Generation (RAG), Optical Character Recognition (OCR), and Explainable AI (XAI) technologies need to be utilized. This will assure that decision-making processes are secure and ethical. This paper also presents an NLP-Driven Knowledge Management Framework (NLP-KMF), a novel framework that helps manage knowledge. The paper discusses the real-world usage of NLP-powered knowledge management in corporate learning, customer service, and compliance, with Google, Accenture, IBM, and JPMorgan Chase serving as the centers of case studies. Strategies to counter issues such as AI bias and misinformation alongside privacy threats are discussed as well. The last section of the paper analyzes forthcoming research areas, including multimodal AI for knowledge management, AI repositories that continuously learn, and decision intelligence driven by AI. This serves as a constructive and precise plan for organizations that wish to evolve from static knowledge databases to dynamic, self-adapting AI systems.
APA, Harvard, Vancouver, ISO, and other styles
47

S, Saraswathi, Jeevithaa S, Vishwabharathy K, and Eyuvaraj D. "Deep Learning Multimodal Methods to Detect Fake News." June 2024 6, no. 2 (2024): 139–52. http://dx.doi.org/10.36548/jtcsst.2024.2.004.

Full text
Abstract:
Fake news, characterized by false information disseminated intentionally with malicious intent, has become a critical societal issue. Its impact spans political, economic, and social domains, fueled by the rapid proliferation of digital communication channels, particularly social media. To combat this menace, researchers have turned to automated mechanisms for detection, leveraging machine learning algorithms and curated datasets. This exploratory research surveys the landscape of machine learning algorithms employed in identifying fake news. Notably, the research focuses on algorithms such as Bidirectional Encoder Representations from Transformers (BERT) and Convolutional Neural Networks (CNNs). However, most of these studies rely on controlled datasets lacking real-time information from social networks—the very platforms where disinformation thrives. The findings underscore the need for research in social network environments, where fake news spreads most prolifically. Additionally, future investigations should extend beyond political news, considering hybrid methods that combine NLP and deep learning techniques. This study serves as a valuable resource for researchers, practitioners, and policymakers seeking insights into the evolving landscape of efforts to combat fake news effectively.
APA, Harvard, Vancouver, ISO, and other styles
48

Zhao, Yanxia, Yuhan Ding, and Xue Min. "Construction of a multimodal dialect corpus based on deep learning and digital twin technology: A case study on the Hangzhou dialect." Journal of Computational Methods in Sciences and Engineering 25, no. 2 (2024): 1448–60. https://doi.org/10.1177/14727978241299701.

Full text
Abstract:
Focused on the digital preservation and inheritance of dialects, this study illustrates the construction pathway of a digitised multimodal dialect corpus and the application of a dialect interactive learning model using the digital twin technology, taking the Hangzhou dialect as a representative. Initially, multimodal resources of the Hangzhou dialect were collected, and with the aid of digital techniques, these resources underwent annotation, segmentation, transcription, and synchronisation, culminating in the creation of the multimodal dialect corpus. Subsequently, features were extracted using Natural Language Processing (NLP) methodologies from deep learning, facilitating the construction of a Hangzhou dialect lexicon. With the annotated corpus as the foundation, combined with Feedforward Sequential Memory Networks (FSMN) and Long Short-Term Memory (LSTM) networks, acoustic and linguistic models for the Hangzhou dialect were developed, laying the groundwork for a Hangzhou dialect speech recognition system. Conclusively, by integrating digital twin technology, an autonomous dialect inheritance learning model was crafted. This model establishes a twin learning space and learning twin entity founded on auditory, visual, and tactile multimodal information. Utilising virtual reality technology, a dialect learning ecological model was designed to enhance learner agency, offering diverse learning modalities and personalised content, with the overarching goal of supporting the preservation and inheritance of dialects.
APA, Harvard, Vancouver, ISO, and other styles
49

Pritam, Kumar. "Advanced NLP Techniques for Sentiment Analysis and Text Summarization Using RNNs and Transformers." International Journal for Research in Applied Science and Engineering Technology 12, no. 6 (2024): 1485–94. http://dx.doi.org/10.22214/ijraset.2024.63358.

Full text
Abstract:
This research focuses on leveraging artificial intelligence and neural network architectures to enhance the capability of machines in comprehending, interpreting, and summarizing text data in human languages. The study aims to improve natural language processing (NLP) tasks, specifically sentiment classification and text summarization. Key efforts include the development of neural network architectures such as Recurrent Neural Networks (RNNs) and Transformers to model linguistic contexts and sequences. The creation of annotated datasets for sentiment analysis and summarization was essential for training and evaluating these models. Additionally, transfer learning techniques were explored to pretrain language models on large corpora, enhancing their performance. The evaluation of neural network models utilized relevant NLP metrics like accuracy, ROC curve, and F1 score for sentiment classification tasks. The research also developed end-to-end NLP pipelines leveraging trained neural networks for document summarization and sentiment detection. The results confirmed that AI and neural networks could effectively perform sentiment analysis and text summarization. Training metrics indicated robust learning and generalization capabilities, with high accuracy and improved ROUGE and BERT scores. The findings underscore the potential of deep neural networks in understanding and summarizing textual content, suggesting promising directions for future work, including deeper neural networks, attention models, and multimodal data integration.
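The classification metrics this abstract relies on (accuracy and F1) are straightforward to reproduce from their definitions. The sketch below uses a made-up label set, not data from the study:

```python
def f1_score(y_true, y_pred, positive=1):
    # F1 = harmonic mean of precision and recall for the positive class
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

y_true = [1, 0, 1, 1, 0, 1]   # 1 = positive sentiment, 0 = negative
y_pred = [1, 0, 0, 1, 1, 1]
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
f1 = f1_score(y_true, y_pred)  # precision = recall = 3/4, so F1 = 0.75
```

F1 is preferred over raw accuracy when sentiment classes are imbalanced, since accuracy can look high while the minority class is badly misclassified.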
APA, Harvard, Vancouver, ISO, and other styles
50

Yang, Zhengbang, Haotian Xia, Jingxi Li, Zezhi Chen, Zhuangdi Zhu, and Weining Shen. "Sports Intelligence: Assessing the Sports Understanding Capabilities of Language Models Through Question Answering from Text to Video." Electronics 14, no. 3 (2025): 461. https://doi.org/10.3390/electronics14030461.

Full text
Abstract:
Understanding sports presents a fascinating challenge for Natural Language Processing (NLP) due to its intricate and ever-changing nature. Current NLP technologies struggle with the advanced cognitive demands required to reason over complex sports scenarios. To explore the current boundaries of this field, we extensively evaluated mainstream and emerging large models on various sports tasks and addressed the limitations of previous benchmarks. Our study ranges from answering simple queries about basic rules and historical facts to engaging in complex, context-specific reasoning using strategies like few-shot learning and chain-of-thought techniques. Beyond text-based analysis, we also explored the sports reasoning capabilities of mainstream video language models to bridge the gap in benchmarking multimodal sports understanding. Based on a comprehensive overview of mainstream large models on diverse sports understanding tasks, we presented a new benchmark, which highlighted the critical challenges of sports understanding for NLP and the varying capabilities of state-of-the-art large models on sports understanding. We also provided an extensive set of error analyses that pointed to detailed reasoning defects of large model reasoning which model-based error analysis failed to reveal. We hope the benchmark and the error analysis set will help identify future research priorities in this field.
APA, Harvard, Vancouver, ISO, and other styles