To see the other types of publications on this topic, follow the link: Multimodal Sentiment Analysis.

Journal articles on the topic 'Multimodal Sentiment Analysis'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the top 50 journal articles for your research on the topic 'Multimodal Sentiment Analysis.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever these are available in the metadata.

Browse journal articles on a wide variety of disciplines and organise your bibliography correctly.

1

Qu, Saiying. "A Thematic Analysis of English and American Literature Works Based on Text Mining and Sentiment Analysis." Journal of Electrical Systems 20, no. 6s (2024): 1575–86. http://dx.doi.org/10.52783/jes.3076.

Full text
Abstract:
A theme analysis model integrating text mining and sentiment analysis has emerged as a powerful tool for understanding English and American literary works. By employing techniques such as topic modeling, keyword extraction, and sentiment analysis, this model can identify recurring themes, motifs, and emotional tones within texts. Through text mining, it extracts key concepts and topics, while sentiment analysis discerns the underlying emotions conveyed by the authors. By combining these approaches, researchers can uncover deeper insights into the thematic elements and cultural contexts of English and American literature. This paper applies text mining and sentiment analysis techniques to a dataset of American literary works, using computational methods such as bi-gram analysis, multimodal feature extraction, and sentiment analysis with the proposed Bi-gram Multimodal Sentimental Analysis (Bi-gramMSA) approach. With the Bi-gramMSA model, the multimodal features in the literature are examined to investigate its thematic, emotional, and multimodal aspects. Through our analysis, we uncover significant bi-grams, extract multimodal features, and assess sentiment distribution across the texts. The results highlight the effectiveness of these computational methodologies in uncovering patterns, sentiments, and features within the literary corpus. The proposed Bi-gramMSA model achieves higher scores across the different measures in the Chinese literature.
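To make the bi-gram extraction and lexicon-based sentiment scoring mentioned in this abstract concrete, here is a minimal Python sketch with a placeholder corpus and toy lexicon; it illustrates the general technique only, not the authors' Bi-gramMSA pipeline.

```python
# Minimal sketch: bi-gram extraction plus a toy lexicon-based sentiment score.
# The corpus, lexicon, and normalization are illustrative placeholders.
import re
from collections import Counter

POSITIVE = {"love", "joy", "hope", "delight"}
NEGATIVE = {"grief", "fear", "despair", "sorrow"}

def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z']+", text.lower())

def bigrams(tokens: list[str]) -> list[tuple[str, str]]:
    return list(zip(tokens, tokens[1:]))

def sentiment_score(tokens: list[str]) -> float:
    # Positive minus negative lexicon hits, normalized by token count.
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return (pos - neg) / max(len(tokens), 1)

corpus = [
    "The grief and fear of the house gave way to hope.",
    "She spoke of love and quiet joy in the garden.",
]
all_tokens = [tokenize(doc) for doc in corpus]
top_bigrams = Counter(bg for toks in all_tokens for bg in bigrams(toks)).most_common(5)
scores = [sentiment_score(toks) for toks in all_tokens]
print(top_bigrams)
print(scores)
```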
APA, Harvard, Vancouver, ISO, and other styles
2

Kaur, Ramandeep, and Sandeep Kautish. "Multimodal Sentiment Analysis." International Journal of Service Science, Management, Engineering, and Technology 10, no. 2 (2019): 38–58. http://dx.doi.org/10.4018/ijssmet.2019040103.

Full text
Abstract:
Multimodal sentiments have become a challenge for researchers and are equally difficult for machines to understand. One of the fields addressing this problem is multimodal sentiment analysis (MSA), the study of emotions, attitudes, and opinions expressed in audiovisual formats. This survey article provides a comprehensive overview of the latest developments in the field. Many recently proposed algorithms and various MSA applications are presented briefly. The surveyed articles are categorized according to their contributions to the various MSA techniques. The main purpose of this survey is to give a full picture of the opportunities and difficulties of MSA and related fields, with brief details. The main contributions of this article are the careful categorization of a large number of recent articles and the illustration of current research trends in MSA and its related areas.
APA, Harvard, Vancouver, ISO, and other styles
3

Prathi, Ms S. "Multimodal Sentiment Analysis." International Journal of Scientific Research and Engineering Trends 11, no. 2 (2025): 983–89. https://doi.org/10.61137/ijsret.vol.11.issue2.202.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Zhu, Linlin, Heli Sun, Qunshu Gao, Yuze Liu, and Liang He. "Aspect Enhancement and Text Simplification in Multimodal Aspect-Based Sentiment Analysis for Multi-Aspect and Multi-Sentiment Scenarios." Proceedings of the AAAI Conference on Artificial Intelligence 39, no. 2 (2025): 1683–91. https://doi.org/10.1609/aaai.v39i2.32161.

Full text
Abstract:
Multimodal Aspect-Based Sentiment Analysis (MABSA) plays a pivotal role in the advancement of sentiment analysis technology. Although current methods strive to integrate multimodal information to enhance the performance of sentiment analysis, they still face two critical challenges when dealing with multi-aspect and multi-sentiment data: i) the importance of aspect terms within multimodal data is often overlooked, and ii) models fail to accurately associate specific aspect terms with corresponding sentiment words in multi-aspect and multi-sentiment sentences. To tackle these problems, we propose a novel multimodal aspect-based sentiment analysis method that combines Aspect Enhancement and Text Simplification (AETS). Specifically, we develop an aspect enhancement module that boosts the model's ability to discern relevant aspect terms. Concurrently, we employ a text simplification module to simplify and restructure multi-aspect and multi-sentiment texts, accurately capturing aspects and their corresponding sentiments while reducing irrelevant information. Leveraging this method, we perform three tasks: multimodal aspect term extraction, multimodal aspect sentiment classification, and joint multimodal aspect-based sentiment analysis. Experimental results indicate that our proposed AETS model achieves state-of-the-art performance on two benchmark datasets.
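As a toy illustration of the multi-aspect, multi-sentiment setting described in this abstract (not the AETS method itself), the following Python sketch pairs each aspect term with its nearest sentiment word; the aspect list, lexicon, and proximity rule are illustrative assumptions.

```python
# Toy sketch: associate each aspect term with the closest sentiment word.
SENTIMENT_WORDS = {"great": "positive", "bland": "negative", "friendly": "positive"}
ASPECTS = {"pizza", "service", "dessert"}

def aspect_sentiments(sentence: str) -> dict[str, str]:
    tokens = sentence.lower().replace(",", " ").split()
    aspect_pos = {t: i for i, t in enumerate(tokens) if t in ASPECTS}
    senti_pos = {i: SENTIMENT_WORDS[t] for i, t in enumerate(tokens) if t in SENTIMENT_WORDS}
    result = {}
    for aspect, i in aspect_pos.items():
        # Pick the sentiment word nearest to this aspect term.
        nearest = min(senti_pos, key=lambda j: abs(j - i))
        result[aspect] = senti_pos[nearest]
    return result

print(aspect_sentiments("The pizza was great but the service was bland"))
# {'pizza': 'positive', 'service': 'negative'}
```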
APA, Harvard, Vancouver, ISO, and other styles
5

Zhang, Kang, Yushui Geng, Jing Zhao, Jianxin Liu, and Wenxiao Li. "Sentiment Analysis of Social Media via Multimodal Feature Fusion." Symmetry 12, no. 12 (2020): 2010. http://dx.doi.org/10.3390/sym12122010.

Full text
Abstract:
In recent years, with the popularity of social media, users are increasingly keen to express their feelings and opinions in the form of pictures and text, which makes multimodal data with text and pictures the content type with the most growth. Most of the information posted by users on social media has obvious sentimental aspects, and multimodal sentiment analysis has become an important research field. Previous studies on multimodal sentiment analysis have primarily focused on extracting text and image features separately and then combining them for sentiment classification. These studies often ignore the interaction between text and images. Therefore, this paper proposes a new multimodal sentiment analysis model. The model first eliminates noise interference in textual data and extracts the more important image features. Then, in the attention-based feature-fusion part, the text and images learn internal features from each other symmetrically. The fused features are then applied to sentiment classification tasks. The experimental results on two common multimodal sentiment datasets demonstrate the effectiveness of the proposed model.
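The symmetric, attention-based text-image fusion described in this abstract can be sketched generically in PyTorch as two cross-attention blocks in which each modality attends to the other; the dimensions, pooling, and classifier head below are illustrative assumptions, not the authors' architecture.

```python
# Minimal sketch of symmetric cross-attention fusion between text and image features.
import torch
import torch.nn as nn

class SymmetricCrossAttentionFusion(nn.Module):
    def __init__(self, dim: int = 256, heads: int = 4, num_classes: int = 3):
        super().__init__()
        self.text_to_image = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.image_to_text = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.classifier = nn.Linear(2 * dim, num_classes)

    def forward(self, text_feats, image_feats):
        # Each modality attends to the other ("learning from each other").
        t_attended, _ = self.text_to_image(text_feats, image_feats, image_feats)
        i_attended, _ = self.image_to_text(image_feats, text_feats, text_feats)
        fused = torch.cat([t_attended.mean(dim=1), i_attended.mean(dim=1)], dim=-1)
        return self.classifier(fused)

model = SymmetricCrossAttentionFusion()
text = torch.randn(8, 32, 256)   # batch of 8, 32 text tokens
image = torch.randn(8, 49, 256)  # batch of 8, 49 image regions
logits = model(text, image)
print(logits.shape)  # torch.Size([8, 3])
```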
APA, Harvard, Vancouver, ISO, and other styles
6

Zhang, Yifei, Zhiqing Zhang, Shi Feng, and Daling Wang. "Visual Enhancement Capsule Network for Aspect-based Multimodal Sentiment Analysis." Applied Sciences 12, no. 23 (2022): 12146. http://dx.doi.org/10.3390/app122312146.

Full text
Abstract:
Multimodal sentiment analysis, which aims to recognize the emotions expressed in multimodal data, has attracted extensive attention in both academia and industry. However, most of the current studies on user-generated reviews classify the overall sentiments of reviews and hardly consider the aspects of user expression. In addition, user-generated reviews on social media are usually dominated by short texts expressing opinions, sometimes attached with images to complement or enhance the emotion. Based on this observation, we propose a visual enhancement capsule network (VECapsNet) based on multimodal fusion for the task of aspect-based sentiment analysis. Firstly, an adaptive mask memory capsule network is designed to extract the local clustering information from opinion text. Then, an aspect-guided visual attention mechanism is constructed to obtain the image information related to the aspect phrases. Finally, a multimodal fusion module based on interactive learning is presented for multimodal sentiment classification, which takes the aspect phrases as the query vectors to continuously capture the multimodal features correlated to the affective entities in multi-round iterative learning. Moreover, due to the limited number of multimodal aspect-based sentiment review datasets at present, we build a large-scale multimodal aspect-based sentiment dataset of Chinese restaurant reviews, called MTCom. The extensive experiments on both the single-modal and multimodal datasets demonstrate that our model can better capture the local aspect-based sentiment features and is more applicable to general multimodal user reviews than existing methods. The experimental results verify the effectiveness of our proposed VECapsNet.
APA, Harvard, Vancouver, ISO, and other styles
7

Jiang, Tianyue, Sanhong Deng, Peng Wu, and Haibi Jiang. "Real-Time Human-Music Emotional Interaction Based on Deep Learning and Multimodal Sentiment Analysis." Wireless Communications and Mobile Computing 2023 (April 14, 2023): 1–12. http://dx.doi.org/10.1155/2023/4939048.

Full text
Abstract:
Music, as an integral component of culture, holds a prominent position and is widely accessible. There has been growing interest in studying the sentiment represented by music and its emotional effects on audiences; however, much of the existing literature is subjective and overlooks the impact of music on the real-time expression of emotion. In this article, two labeled datasets for music sentiment classification and multimodal sentiment classification were developed. Deep learning is used to classify music sentiment, while decision-level fusion is used to classify the multimodal sentiment of real-time listeners. We combine sentiment analysis with a conventional online music playback system and propose an innovative human-music emotional interaction system based on multimodal sentiment analysis and deep learning. Individual observation and questionnaire studies demonstrate that the interaction between human and musical sentiments has a positive impact on the negative emotions of listeners.
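Decision-level (late) fusion, as used in this abstract for the real-time listener signals, can be sketched as a weighted average of per-modality class probabilities; the modalities, weights, and probabilities below are placeholders, not the authors' system.

```python
# Minimal sketch of decision-level (late) fusion of per-modality predictions.
import numpy as np

LABELS = ["negative", "neutral", "positive"]

def late_fusion(probs_by_modality: dict[str, np.ndarray],
                weights: dict[str, float]) -> str:
    total = sum(weights.values())
    fused = sum(weights[m] * probs_by_modality[m] for m in probs_by_modality) / total
    return LABELS[int(np.argmax(fused))]

probs = {
    "facial": np.array([0.2, 0.3, 0.5]),
    "speech": np.array([0.1, 0.2, 0.7]),
    "music":  np.array([0.4, 0.4, 0.2]),
}
weights = {"facial": 0.4, "speech": 0.4, "music": 0.2}
print(late_fusion(probs, weights))  # "positive"
```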
APA, Harvard, Vancouver, ISO, and other styles
8

Peng, Heng, Xue Gu, Jian Li, Zhaodan Wang, and Hao Xu. "Text-Centric Multimodal Contrastive Learning for Sentiment Analysis." Electronics 13, no. 6 (2024): 1149. http://dx.doi.org/10.3390/electronics13061149.

Full text
Abstract:
Multimodal sentiment analysis aims to acquire and integrate sentimental cues from different modalities to identify the sentiment expressed in multimodal data. Despite the widespread adoption of pre-trained language models in recent years to enhance model performance, current research in multimodal sentiment analysis still faces several challenges. Firstly, although pre-trained language models have significantly elevated the density and quality of text features, the present models adhere to a balanced design strategy that lacks a concentrated focus on textual content. Secondly, prevalent feature fusion methods often hinge on spatial consistency assumptions, neglecting essential information about modality interactions and sample relationships within the feature space. In order to surmount these challenges, we propose a text-centric multimodal contrastive learning framework (TCMCL). This framework centers around text and augments text features separately from audio and visual perspectives. In order to effectively learn feature space information from different cross-modal augmented text features, we devised two contrastive learning tasks based on instance prediction and sentiment polarity; this promotes implicit multimodal fusion and obtains more abstract and stable sentiment representations. Our model demonstrates performance that surpasses the current state-of-the-art methods on both the CMU-MOSI and CMU-MOSEI datasets.
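The contrastive-learning component described in this abstract can be illustrated with a generic InfoNCE-style loss between two cross-modal augmented views of the text representation; this is a sketch of the general mechanism under assumed shapes, not the TCMCL objective itself.

```python
# Minimal sketch of an InfoNCE-style contrastive loss between two augmented views.
import torch
import torch.nn.functional as F

def info_nce(view_a: torch.Tensor, view_b: torch.Tensor, temperature: float = 0.07):
    # view_a, view_b: (batch, dim) embeddings of the same samples under two
    # augmentations; matching rows are positives, all other rows are negatives.
    a = F.normalize(view_a, dim=-1)
    b = F.normalize(view_b, dim=-1)
    logits = a @ b.t() / temperature      # (batch, batch) similarity matrix
    targets = torch.arange(a.size(0))     # positives lie on the diagonal
    return F.cross_entropy(logits, targets)

audio_augmented_text = torch.randn(16, 128)
visual_augmented_text = torch.randn(16, 128)
loss = info_nce(audio_augmented_text, visual_augmented_text)
print(loss.item())
```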
APA, Harvard, Vancouver, ISO, and other styles
9

Huang, Ju, Wenkang Chen, Fangyi Wang, and Haijun Zhang. "Heterogeneous Hierarchical Fusion Network for Multimodal Sentiment Analysis in Real-World Environments." Electronics 13, no. 20 (2024): 4137. http://dx.doi.org/10.3390/electronics13204137.

Full text
Abstract:
Multimodal sentiment analysis models can determine users’ sentiments by utilizing rich information from various sources (e.g., textual, visual, and audio). However, there are two key challenges when deploying the model in real-world environments: (1) the limitations of relying on the performance of automatic speech recognition (ASR) models can lead to errors in recognizing sentiment words, which may mislead the sentiment analysis of the textual modality, and (2) variations in information density across modalities complicate the development of a high-quality fusion framework. To address these challenges, this paper proposes a novel Multimodal Sentiment Word Optimization Module and a heterogeneous hierarchical fusion (MSWOHHF) framework. Specifically, the proposed Multimodal Sentiment Word Optimization Module optimizes the sentiment words extracted from the textual modality by the ASR model, thereby reducing sentiment word recognition errors. In the multimodal fusion phase, a heterogeneous hierarchical fusion network architecture is introduced, which first utilizes a Transformer Aggregation Module to fuse the visual and audio modalities, enhancing the high-level semantic features of each modality. A Cross-Attention Fusion Module then integrates the textual modality with the audiovisual fusion. Next, a Feature-Based Attention Fusion Module is proposed that enables fusion by dynamically tuning the weights of both the combined and unimodal representations. It then predicts sentiment polarity using a nonlinear neural network. Finally, the experimental results on the MOSI-SpeechBrain, MOSI-IBM, and MOSI-iFlytek datasets show that the MSWOHHF outperforms several baselines, demonstrating better performance.
APA, Harvard, Vancouver, ISO, and other styles
10

Wang, Peicheng, Shuxian Liu, and Jinyan Chen. "CCDA: A Novel Method to Explore the Cross-Correlation in Dual-Attention for Multimodal Sentiment Analysis." Applied Sciences 14, no. 5 (2024): 1934. http://dx.doi.org/10.3390/app14051934.

Full text
Abstract:
With the development of the Internet, the content that people share includes text, images, and videos, and utilizing these multimodal data for sentiment analysis has become an important area of research. Multimodal sentiment analysis aims to understand and perceive emotions or sentiments in different types of data. Currently, the realm of multimodal sentiment analysis faces various challenges, with a major emphasis on addressing two key issues: (1) inefficiency when modeling the intramodality and intermodality dynamics and (2) inability to effectively fuse multimodal features. In this paper, we propose the CCDA (cross-correlation in dual-attention) model, a novel method to explore dynamics between different modalities and fuse multimodal features efficiently. We capture dynamics at the intra- and intermodal levels by using two types of attention mechanisms simultaneously. Meanwhile, the cross-correlation loss is introduced to capture the correlation between attention mechanisms. Moreover, the relevant coefficient is proposed to integrate multimodal features effectively. Extensive experiments were conducted on three publicly available datasets: CMU-MOSI, CMU-MOSEI, and CH-SIMS. The experimental results fully confirm the effectiveness of our proposed method; compared with the current state-of-the-art (SOTA) methods, our model shows clear advantages on most of the key metrics, demonstrating its better performance in multimodal sentiment analysis.
APA, Harvard, Vancouver, ISO, and other styles
11

Wang, Weihan. "Exploration of the Application of Multimodal Model in Psychological Analysis." Applied and Computational Engineering 112, no. 1 (2024): 115–22. http://dx.doi.org/10.54254/2755-2721/2024.17918.

Full text
Abstract:
Multimodal sentiment analysis is one of the important research areas in the field of artificial intelligence today. It extracts features from various human modalities such as facial expressions, body movements, and voice, performs modal fusion, and finally classifies and predicts emotions. This technology can be used in multiple scenarios such as stock prediction, product analysis, and movie box-office prediction, and especially in psychological state analysis, giving it important research significance. This paper introduces two important datasets in multimodal sentiment analysis, CMU-MOSEI and IEMOCAP. It also introduces fusion methods such as feature-level fusion, model-level fusion, and decision-level fusion, as well as related models such as the semantic feature fusion neural network and the sentiment word perception fusion network. Finally, the application of multimodal sentiment analysis models to depression and other related mental illnesses and the future challenges of these models are introduced. We hope that this overview will be helpful for research on multimodal sentiment analysis.
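The fusion strategies named in this abstract differ mainly in where information is combined; a minimal sketch (with assumed feature shapes and a toy linear classifier, not any specific published model) contrasts feature-level (early) fusion with decision-level (late) fusion.

```python
# Minimal sketch: early (feature-level) vs. late (decision-level) fusion.
import numpy as np

rng = np.random.default_rng(0)
face, voice, gesture = rng.normal(size=64), rng.normal(size=32), rng.normal(size=16)

# Feature-level (early) fusion: one classifier over the concatenated features.
early_features = np.concatenate([face, voice, gesture])   # shape (112,)
W_early = rng.normal(size=(3, early_features.size))
early_logits = W_early @ early_features                   # 3 class scores

# Decision-level (late) fusion: average the scores of per-modality classifiers.
def modality_logits(x):
    return rng.normal(size=(3, x.size)) @ x

late_logits = np.mean([modality_logits(m) for m in (face, voice, gesture)], axis=0)

print(early_logits.argmax(), late_logits.argmax())
```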
APA, Harvard, Vancouver, ISO, and other styles
12

Deng, Lujuan, Boyi Liu, Zuhe Li, Jiangtao Ma, and Hanbing Li. "Context-Dependent Multimodal Sentiment Analysis Based on a Complex Attention Mechanism." Electronics 12, no. 16 (2023): 3516. http://dx.doi.org/10.3390/electronics12163516.

Full text
Abstract:
Multimodal sentiment analysis aims to understand people’s attitudes and opinions from different data forms. Traditional modality fusion methods for multimodal sentiment analysis concatenate or multiply the various modalities without fully utilizing context information and the correlation between modalities. To solve this problem, this article proposes a new multimodal sentiment analysis framework based on a recurrent neural network with a complex attention mechanism. First, after the raw data are preprocessed, numerical feature representations are obtained using feature extraction. Next, the numerical features are input into the recurrent neural network, and the output results are multimodally fused using a complex attention mechanism layer. The objective of the complex attention mechanism is to leverage enhanced non-linearity to more effectively capture the inter-modal correlations, thereby improving the performance of multimodal sentiment analysis. Finally, the processed results are fed into the classification layer to obtain the sentiment output. This process can effectively capture the semantic information and contextual relationships of the input sequence and fuse the different pieces of modal information. Our model was tested on the CMU-MOSEI dataset, achieving an accuracy of 82.04%.
APA, Harvard, Vancouver, ISO, and other styles
13

Li, Mengyao, Yonghua Zhu, Wenjing Gao, Meng Cao, and Shaoxiu Wang. "Joint Sentiment Part Topic Regression Model for Multimodal Analysis." Information 11, no. 10 (2020): 486. http://dx.doi.org/10.3390/info11100486.

Full text
Abstract:
The development of multimodal media compensates for the lack of information expression in a single modality and thus gradually becomes the main carrier of sentiment. In this situation, automatic assessment for sentiment information in multimodal contents is of increasing importance for many applications. To achieve this, we propose a joint sentiment part topic regression model (JSP) based on latent Dirichlet allocation (LDA), with a sentiment part, which effectively utilizes the complementary information between the modalities and strengthens the relationship between the sentiment layer and multimodal content. Specifically, a linear regression module is developed to share implicit variables between image–text pairs, so that one modality can predict the other. Moreover, a sentiment label layer is added to model the relationship between sentiment distribution parameters and multimodal contents. Experimental results on several datasets verify the feasibility of our proposed approach for multimodal sentiment analysis.
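The general pattern in this abstract, topic modeling combined with a regression that lets one modality predict the other, can be sketched with off-the-shelf scikit-learn components; the toy captions and the stand-in image topic proportions are illustrative assumptions and this is not the JSP model.

```python
# Minimal sketch: LDA topics for text plus a regression onto paired image topics.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.linear_model import LinearRegression

captions = [
    "sunny beach vacation happy family",
    "rainy traffic jam angry commute",
    "beach waves sunny holiday smile",
    "storm flood damage sad street",
]
counts = CountVectorizer().fit_transform(captions)
text_topics = LatentDirichletAllocation(n_components=2, random_state=0).fit_transform(counts)

# Stand-in for topic proportions inferred from the paired images.
image_topics = np.array([[0.9, 0.1], [0.2, 0.8], [0.85, 0.15], [0.1, 0.9]])

# The regression shares latent structure across the image-text pair,
# so one modality can predict the other.
reg = LinearRegression().fit(text_topics, image_topics)
print(reg.predict(text_topics[:1]))
```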
APA, Harvard, Vancouver, ISO, and other styles
14

Huang, Ju, Pengtao Lu, Shuifa Sun, and Fangyi Wang. "Multimodal Sentiment Analysis in Realistic Environments Based on Cross-Modal Hierarchical Fusion Network." Electronics 12, no. 16 (2023): 3504. http://dx.doi.org/10.3390/electronics12163504.

Full text
Abstract:
In the real world, multimodal sentiment analysis (MSA) enables the capture and analysis of sentiments by fusing multimodal information, thereby enhancing the understanding of real-world environments. The key challenges lie in handling the noise in the acquired data and achieving effective multimodal fusion. When processing the noise in data, existing methods utilize the combination of multimodal features to mitigate errors in sentiment word recognition caused by the performance limitations of automatic speech recognition (ASR) models. However, there still remains the problem of how to more efficiently utilize and combine different modalities to address the data noise. In multimodal fusion, most existing fusion methods have limited adaptability to the feature differences between modalities, making it difficult to capture the potential complex nonlinear interactions that may exist between modalities. To overcome the aforementioned issues, this paper proposes a new framework named multimodal-word-refinement and cross-modal-hierarchy (MWRCMH) fusion. Specifically, we utilized a multimodal word correction module to reduce sentiment word recognition errors caused by ASR. During multimodal fusion, we designed a cross-modal hierarchical fusion module that employed cross-modal attention mechanisms to fuse features between pairs of modalities, resulting in fused bimodal-feature information. Then, the obtained bimodal information and the unimodal information were fused through the nonlinear layer to obtain the final multimodal sentiment feature information. Experimental results on the MOSI-SpeechBrain, MOSI-IBM, and MOSI-iFlytek datasets demonstrated that the proposed approach outperformed other comparative methods, achieving Has0-F1 scores of 76.43%, 80.15%, and 81.93%, respectively. Our approach exhibited better performance, as compared to multiple baselines.
APA, Harvard, Vancouver, ISO, and other styles
15

Narkhede, Mohini, Prof. Pallavi P. Rane, and Prof. Nilesh N. Shingne. "A Systematic Research on Emotion Recognition from Facial Expressions Using Machine Learning Techniques." International Journal of Ingenious Research, Invention and Development (IJIRID) 3, no. 6 (2025): 631–38. https://doi.org/10.5281/zenodo.14790039.

Full text
Abstract:
In recent years, with the popularity of social media, users are increasingly keen to express their feelings and opinions in the form of pictures and text, which makes multimodal data with text and pictures the content type with the most growth. Most of the information posted by users on social media has obvious sentimental aspects, and multimodal sentiment analysis has become an important research field. Previous studies on multimodal sentiment analysis have primarily focused on extracting text and image features separately and then combining them for sentiment classification. These studies often ignore the interaction between text and images. Therefore, this project proposes a new multimodal sentiment analysis model. The model first eliminates noise interference in textual data and extracts the more important image features. Then, in the attention-based feature-fusion part, the text and images learn internal features from each other symmetrically. The fused features are then applied to sentiment classification tasks. The experimental results on emotion recognition sentiment datasets demonstrate the effectiveness of the proposed model.
APA, Harvard, Vancouver, ISO, and other styles
16

Luo, Haozhi. "Research and Application of Key Techniques of Multimodal Sentiment Analysis." Theoretical and Natural Science 105, no. 1 (2025): 89–95. https://doi.org/10.54254/2753-8818/2025.22725.

Full text
Abstract:
Multimodal sentiment analysis is an important research direction in the fields of natural language processing and artificial intelligence, aiming to improve the accuracy of emotion recognition by using multimodal data such as text, speech, and images. In today's big-data era, multimodal sentiment analysis technology is becoming increasingly important and is widely used in many aspects of people's lives. This paper first introduces the basic concepts and research background of multimodal sentiment analysis. Then, the current state of a series of key technologies, such as feature extraction, feature fusion, sentiment analysis, data fusion, and deep learning, is discussed. Next, it analyzes typical applications of multimodal sentiment analysis in the fields of social media and human-computer interaction. Finally, given the limitations of the technology's current applications, it analyzes and evaluates the future development directions of multimodal sentiment analysis and provides ideas for its future development.
APA, Harvard, Vancouver, ISO, and other styles
17

Truong, Quoc-Tuan, and Hady W. Lauw. "VistaNet: Visual Aspect Attention Network for Multimodal Sentiment Analysis." Proceedings of the AAAI Conference on Artificial Intelligence 33 (July 17, 2019): 305–12. http://dx.doi.org/10.1609/aaai.v33i01.3301305.

Full text
Abstract:
Detecting the sentiment expressed by a document is a key task for many applications, e.g., modeling user preferences, monitoring consumer behaviors, assessing product quality. Traditionally, the sentiment analysis task primarily relies on textual content. Fueled by the rise of mobile phones that are often the only cameras on hand, documents on the Web (e.g., reviews, blog posts, tweets) are increasingly multimodal in nature, with photos in addition to textual content. A question arises whether the visual component could be useful for sentiment analysis as well. In this work, we propose Visual Aspect Attention Network or VistaNet, leveraging both textual and visual components. We observe that in many cases, with respect to sentiment detection, images play a supporting role to text, highlighting the salient aspects of an entity, rather than expressing sentiments independently of the text. Therefore, instead of using visual information as features, VistaNet relies on visual information as alignment for pointing out the important sentences of a document using attention. Experiments on restaurant reviews showcase the effectiveness of visual aspect attention, vis-à-vis visual features or textual attention.
APA, Harvard, Vancouver, ISO, and other styles
18

Mingyu, Ji, Zhou Jiawei, and Wei Ning. "AFR-BERT: Attention-based mechanism feature relevance fusion multimodal sentiment analysis model." PLOS ONE 17, no. 9 (2022): e0273936. http://dx.doi.org/10.1371/journal.pone.0273936.

Full text
Abstract:
Multimodal sentiment analysis is an essential task in natural language processing in which machines analyze and recognize emotions through logical reasoning and mathematical operations after learning multimodal emotional features. To address the problems of effectively fusing multimodal data and modeling the relevance between modalities in multimodal sentiment analysis, we propose an attention-based mechanism feature relevance fusion multimodal sentiment analysis model (AFR-BERT). In the data pre-processing stage, text features are extracted using the pre-trained language model BERT (Bidirectional Encoder Representations from Transformers), and a BiLSTM (Bidirectional Long Short-Term Memory) network is used to obtain the internal information of the audio. In the data fusion phase, the multimodal data fusion network effectively fuses multimodal features through the interaction of text and audio information. During the data analysis phase, the multimodal data association network analyzes the data by exploring the correlation of the fused information between text and audio. In the data output phase, the model outputs the results of multimodal sentiment analysis. We conducted extensive comparative experiments on the publicly available sentiment analysis datasets CMU-MOSI and CMU-MOSEI. The experimental results show that AFR-BERT improves on classical multimodal sentiment analysis models in terms of the relevant performance metrics. In addition, ablation experiments and example analysis show that the multimodal data analysis network in AFR-BERT can effectively capture and analyze the sentiment features in text and audio.
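The feature extraction step described in this abstract, BERT for text and a bidirectional LSTM over acoustic frames, can be sketched as follows; the model checkpoint, acoustic feature size, and fusion head are illustrative assumptions, not the AFR-BERT architecture.

```python
# Minimal sketch: BERT text features + BiLSTM audio features, concatenated.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = AutoModel.from_pretrained("bert-base-uncased")
audio_lstm = nn.LSTM(input_size=74, hidden_size=64, batch_first=True, bidirectional=True)
head = nn.Linear(768 + 2 * 64, 1)   # regression-style sentiment score

text = ["the movie was surprisingly moving"]
tokens = tokenizer(text, return_tensors="pt", padding=True)
with torch.no_grad():
    text_vec = bert(**tokens).last_hidden_state[:, 0]    # [CLS] embedding, (1, 768)
    audio_frames = torch.randn(1, 50, 74)                # 50 frames of acoustic features
    _, (h_n, _) = audio_lstm(audio_frames)
    audio_vec = torch.cat([h_n[0], h_n[1]], dim=-1)      # forward + backward final states
    score = head(torch.cat([text_vec, audio_vec], dim=-1))
print(score)
```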
APA, Harvard, Vancouver, ISO, and other styles
19

Veluswamy, Anusha Sowbarnika, Nagamani A, SilpaRaj M, Yobu D, Ashwitha M, and Mangaiyarkaras V. "Natural Language Processing for Sentiment Analysis in Socialmedia Techniques and Case Studies." ITM Web of Conferences 76 (2025): 05004. https://doi.org/10.1051/itmconf/20257605004.

Full text
Abstract:
Social media platforms have become a significant medium for expressing opinions, emotions, and sentiments, making sentiment analysis a crucial task in Natural Language Processing (NLP). While various sentiment analysis techniques have been proposed, existing studies often face challenges such as language dependency, platform-specific biases, lack of real-time processing, and limited multimodal analysis. This research explores the evolution of sentiment analysis in social media by leveraging cutting-edge NLP techniques, including transformer-based models (BERT, RoBERTa, GPT) and multimodal approaches. By addressing the limitations of previous studies, our research proposes a real-time, multilingual, and cross-platform sentiment analysis model capable of analyzing textual, audio, and visual content from diverse social media platforms (e.g., Twitter, Facebook, Instagram, and TikTok). Additionally, this study investigates the effectiveness of domain-specific sentiment analysis (e.g., political discourse, health-related discussions) to improve sentiment classification in specialized contexts. Benchmark datasets and experimental validation will be used to compare existing sentiment analysis models with our proposed approach. Our findings aim to enhance scalability, accuracy, and real-time adaptability of sentiment analysis in social media applications, ultimately contributing to improved decision-making in social monitoring, brand analysis, and crisis management.
APA, Harvard, Vancouver, ISO, and other styles
20

Sun, Lin. "Multimodal Emotion Recognition and Fluctuation: A Study on Sentiment Analysis of Online Public Opinion." Frontiers in Computing and Intelligent Systems 3, no. 1 (2023): 38–41. http://dx.doi.org/10.54097/fcis.v3i1.6021.

Full text
Abstract:
Emotion is an important way for individuals to express their views on the Internet and an important variable that shapes public opinion. Considering the multimodality of data, such as text, pictures, and video, and the subtlety of emotional expression, a multimodal sentiment analysis model that addresses content involving different senses, such as sight, hearing, and touch, at the same time is very necessary. This study outlines the basic steps, classification strategies, and research methods of sentiment analysis and acknowledges the differences between sentiment analyses of text, pictures, and video. As multimodal sentiment recognition is still in its initial stage, there is still room for improvement in cross-disciplinary research on multimodal text, picture, audio, and video data in terms of weighted scoring, complex emotion, and intensity recognition. It is concluded that future studies should focus on the intensity of different emotions, multimodal data fusion, and how weighted scoring influences an emotion recognition model, and should explore application possibilities.
APA, Harvard, Vancouver, ISO, and other styles
21

Silva, Nelson, Pedro J. S. Cardoso, and João M. F. Rodrigues. "Multimodal Sentiment Classifier Framework for Different Scene Contexts." Applied Sciences 14, no. 16 (2024): 7065. http://dx.doi.org/10.3390/app14167065.

Full text
Abstract:
Sentiment analysis (SA) is an effective method for determining public opinion. Social media posts have been the subject of much research, due to the platforms’ enormous and diversified user bases that regularly share thoughts on nearly any subject. However, on posts composed by a text–image pair, the written description may or may not convey the same sentiment as the image. The present study uses machine learning models for the automatic sentiment evaluation of pairs of text and image(s). The sentiments derived from the image and text are evaluated independently and merged (or not) to form the overall sentiment, returning the sentiment of the post and the discrepancy between the sentiments represented by the text–image pair. The image sentiment classification is divided into four categories—“indoor” (IND), “man-made outdoors” (OMM), “non-man-made outdoors” (ONMM), and “indoor/outdoor with persons in the background” (IOwPB)—and then ensembled into an image sentiment classification model (ISC), that can be compared with a holistic image sentiment classifier (HISC), showing that the ISC achieves better results than the HISC. For the Flickr sub-data set, the sentiment classification of images achieved an accuracy of 68.50% for IND, 83.20% for OMM, 84.50% for ONMM, 84.80% for IOwPB, and 76.45% for ISC, compared to 65.97% for the HISC. For the text sentiment classification, in a sub-data set of B-T4SA, an accuracy of 92.10% was achieved. Finally, the text–image combination, in the authors’ private data set, achieved an accuracy of 78.84%.
APA, Harvard, Vancouver, ISO, and other styles
22

Xu, Nan, Wenji Mao, and Guandan Chen. "Multi-Interactive Memory Network for Aspect Based Multimodal Sentiment Analysis." Proceedings of the AAAI Conference on Artificial Intelligence 33 (July 17, 2019): 371–78. http://dx.doi.org/10.1609/aaai.v33i01.3301371.

Full text
Abstract:
As a fundamental task of sentiment analysis, aspect-level sentiment analysis aims to identify the sentiment polarity of a specific aspect in the context. Previous work on aspect-level sentiment analysis is text-based. With the prevalence of multimodal user-generated content (e.g. text and image) on the Internet, multimodal sentiment analysis has attracted increasing research attention in recent years. In the context of aspect-level sentiment analysis, multimodal data are often more important than text-only data, and have various correlations including impacts that aspect brings to text and image as well as the interactions associated with text and image. However, there has not been any related work carried out so far at the intersection of aspect-level and multimodal sentiment analysis. To fill this gap, we are among the first to put forward the new task, aspect based multimodal sentiment analysis, and propose a novel Multi-Interactive Memory Network (MIMN) model for this task. Our model includes two interactive memory networks to supervise the textual and visual information with the given aspect, and learns not only the interactive influences between cross-modality data but also the self influences in single-modality data. We provide a new publicly available multimodal aspect-level sentiment dataset to evaluate our model, and the experimental results demonstrate the effectiveness of our proposed model for this new task.
APA, Harvard, Vancouver, ISO, and other styles
23

Schmidt, Thomas, Manuel Burghardt, and Christian Wolff. "Toward Multimodal Sentiment Analysis of Historic Plays." Digital Humanities in the Nordic and Baltic Countries Publications 2, no. 1 (2019): 405–14. http://dx.doi.org/10.5617/dhnbpub.11114.

Full text
Abstract:
We present a case study as part of a work-in-progress project about multimodal sentiment analysis on historic German plays, taking Emilia Galotti by G. E. Lessing as our initial use case. We analyze the textual version and an audio version (audio book). We focus on ready-to-use sentiment analysis methods: For the textual component, we implement a naive lexicon-based approach and another approach that enhances the lexicon by means of several NLP methods. For the audio analysis, we use the free version of the Vokaturi tool. We compare the results of all approaches and evaluate them against the annotations of a human expert, which serves as a gold standard. For our use case, we can show that audio and text sentiment analysis behave very differently: textual sentiment analysis tends to predict sentiment as rather negative and audio sentiment as rather positive. Compared to the gold standard, the textual sentiment analysis achieves accuracies of 56% while the accuracy for audio sentiment analysis is only 32%. We discuss possible reasons for these mediocre results and give an outlook on further steps we want to pursue in the context of multimodal sentiment analysis on historic plays.
APA, Harvard, Vancouver, ISO, and other styles
24

Wang, Ziyue, and Junjun Guo. "Self-adaptive attention fusion for multimodal aspect-based sentiment analysis." Mathematical Biosciences and Engineering 21, no. 1 (2023): 1305–20. http://dx.doi.org/10.3934/mbe.2024056.

Full text
Abstract:
Multimodal aspect term extraction (MATE) and multimodal aspect-oriented sentiment classification (MASC) are two crucial subtasks in multimodal sentiment analysis. The use of pretrained generative models has attracted increasing attention in aspect-based sentiment analysis (ABSA). However, the inherent semantic gap between the textual and visual modalities poses a challenge in transferring text-based generative pretraining models to image-text multimodal sentiment analysis tasks. To tackle this issue, this paper proposes a self-adaptive cross-modal attention fusion architecture for joint multimodal aspect-based sentiment analysis (JMABSA), a generative model based on an image-text selective fusion mechanism that aims to bridge the semantic gap between text and image representations and adaptively transfer a text-based pretraining model to the multimodal JMASA task. We conducted extensive experiments on two benchmark datasets, and the experimental results show that our model outperforms other state-of-the-art approaches by a significant margin.
APA, Harvard, Vancouver, ISO, and other styles
25

Xiang, Yanxiong. "Analyzing sentiment and its application in deep learning: Consistent behavior across multiple occasions." Applied and Computational Engineering 33, no. 1 (2024): 18–27. http://dx.doi.org/10.54254/2755-2721/33/20230226.

Full text
Abstract:
This article offers a systematic review of the evolution in sentiment analysis techniques, moving from unimodal to multimodal to multi-occasion methodologies, with an emphasis on the integration and application of deep learning in sentiment analysis. Firstly, the paper presents the theoretical foundation of sentiment analysis, including the definition and classification of affect and emotion. It then delves into the pivotal technologies used in unimodal sentiment analysis, specifically within the domains of text, speech, and image analysis, examining feature extraction, representation, and classification models. Subsequently, the focus shifts to multimodal sentiment analysis. The paper offers a survey of widely utilized multimodal sentiment datasets, feature representation and fusion techniques, as well as deep learning-based multimodal sentiment analysis models such as attention networks and graph neural networks. It further addresses the application of these multimodal sentiment analysis techniques in social media, product reviews, and public opinion monitoring. Lastly, the paper underscores that challenges persist in the area of multimodal sentiment fusion, including data imbalance and disparities in feature expression. It calls for further research into cross-modal feature expression, dataset augmentation, and explainable modeling to enhance the performance of complex sentiment analysis across multiple occasions.
APA, Harvard, Vancouver, ISO, and other styles
26

Tang, Zixuan. "Review of Multimodal Sentiment Analysis Techniques." Applied and Computational Engineering 120, no. 1 (2024): 88–97. https://doi.org/10.54254/2755-2721/2025.18747.

Full text
Abstract:
By integrating multimodal data such as text, images, audio, and video, multimodal emotion analysis technology has significantly improved the accuracy and robustness of emotion recognition and has become a research hotspot in the field of artificial intelligence. This paper summarizes the research background, main research content, and applications of multimodal emotion analysis, and discusses future development trends. It emphasizes the importance of emotion analysis in understanding user intention, improving user experience, and optimizing decision-support systems, points out the limitations of traditional unimodal emotion analysis, and thereby motivates the necessity of multimodal emotion analysis. It systematically reviews the key technologies and methods of multimodal emotion analysis, including data fusion strategies, deep learning models, attention mechanisms, and cross-modal association learning, and analyzes the challenges in current research, such as dataset imbalance, inter-modal inconsistency, and real-time requirements. This paper provides a comprehensive research roadmap for subsequent researchers, including constructing high-quality datasets, optimizing algorithms, expanding application scenarios, strengthening interdisciplinary cooperation, and emphasizing ethics and privacy protection, to promote the further development of multimodal sentiment analysis technology in the field of artificial intelligence. With improvements in computing power and algorithmic innovation, multimodal emotion analysis technology will become even more important in the field of artificial intelligence in the future.
APA, Harvard, Vancouver, ISO, and other styles
27

Vinitha V. "Transforming Multimodal Sentiment Analysis and Classification with Fusion-Centric Deep Learning Techniques." Journal of Information Systems Engineering and Management 10, no. 10s (2025): 556–65. https://doi.org/10.52783/jisem.v10i10s.1419.

Full text
Abstract:
Multimodal Sentiment Analysis (MSA) has become an important field of research, integrating information from text, visual, video, and speech modalities to derive thorough physiological insights. Despite substantial advancements, current methodologies frequently treat the various modalities uniformly, neglecting the preeminent impact of text in sentiment analysis and failing to address the redundant and irrelevant data generated during multimodal fusion. This study proposes the Enhanced Multi-modal Spatiotemporal Attention Network (EMSAN) to integrate key features across modalities, designed to improve the robustness and generalization of sentiment and emotion prediction from video data. It consists of various phases, such as multimodal feature extraction, fusion, and detection of sentiment polarity, to integrate key features across modalities. Extensive experiments carried out on the publicly available Multimodal EmotionLines Dataset (MELD) show that the suggested method achieves an accuracy of 92.28% in capturing complicated sentiment and emotion. The comparison showed that the suggested method outperformed other baseline models, enabling the development of sentiment analysis in a number of different multimodal frameworks.
APA, Harvard, Vancouver, ISO, and other styles
28

KHOLMATOV, SOBITBEK. "Multimodal Sentiment Analysis: A Study on Emotion Understanding and Classification by Integrating Text and Images." Academic Journal of Natural Science 1, no. 1 (2024): 51–56. https://doi.org/10.5281/zenodo.13909963.

Full text
Abstract:
The advent of social media and the proliferation of multimodal content have led to the growing importance of understanding sentiment in both text and images. Traditional sentiment analysis relies heavily on textual data, but recent trends indicate that integrating visual information can significantly improve sentiment prediction accuracy. This paper explores multimodal sentiment analysis, specifically focusing on emotion understanding and classification by integrating textual and image-based features. We review existing approaches, develop a hybrid deep learning model utilizing attention mechanisms and transformer architectures for multimodal sentiment classification, and evaluate its performance on benchmark datasets, including Twitter and Instagram data. Our findings suggest that multimodal approaches outperform text-only models, especially in more nuanced sentiment cases such as sarcasm, irony, or mixed emotions. Moreover, we address key challenges like feature fusion, domain adaptation, and the contextual alignment of visual and textual information. The results provide insights into optimizing multimodal fusion techniques to enhance real-world application performance.
APA, Harvard, Vancouver, ISO, and other styles
29

Li, Yutong. "Literature Review of Text and Multimodal Sentiment Analysis." Applied and Computational Engineering 150, no. 1 (2025): None. https://doi.org/10.54254/2755-2721/2025.22525.

Full text
Abstract:
Sentiment analysis, also known as opinion mining, is a crucial branch of natural language processing which focuses on recognizing, extracting, and quantifying sentiment tendencies, emotional intensity, and specific emotion types in textual data. With the rapid development of the internet and communication, analyzing the sentiment contained in textual data has become important for understanding public opinion, consumer behavior, and emotional trends. This paper provides a comprehensive review of sentiment analysis in terms of its applications, evolution, task types, methodology, and future development by analyzing the literature in this field. Sentiment analysis has developed from traditional lexicon-based methods to modern deep learning methods such as CNNs, RNNs, and transformer models, which have significantly improved accuracy and robustness. This paper also discusses challenges in sentiment analysis, such as sarcasm detection and cross-lingual analysis, and proposes potential solutions. The findings aim to provide comprehensive insight for researchers and contribute to innovations in sentiment analysis.
APA, Harvard, Vancouver, ISO, and other styles
30

Najadat, Hassan, and Ftoon Abushaqra. "Multimodal Sentiment Analysis of Arabic Videos." Journal of Image and Graphics 6, no. 1 (2018): 39–43. http://dx.doi.org/10.18178/joig.6.1.39-43.

Full text
APA, Harvard, Vancouver, ISO, and other styles
31

Jazyah, Yahia Hasan, and Intisar O. Hussien. "Multimodal Sentiment Analysis: A Comparison Study." Journal of Computer Science 14, no. 6 (2018): 804–18. http://dx.doi.org/10.3844/jcssp.2018.804.818.

Full text
APA, Harvard, Vancouver, ISO, and other styles
32

Kumar, Akshi, and Geetanjali Garg. "Sentiment analysis of multimodal twitter data." Multimedia Tools and Applications 78, no. 17 (2019): 24103–19. http://dx.doi.org/10.1007/s11042-019-7390-1.

Full text
APA, Harvard, Vancouver, ISO, and other styles
33

Soleymani, Mohammad, David Garcia, Brendan Jou, Björn Schuller, Shih-Fu Chang, and Maja Pantic. "A survey of multimodal sentiment analysis." Image and Vision Computing 65 (September 2017): 3–14. http://dx.doi.org/10.1016/j.imavis.2017.08.003.

Full text
APA, Harvard, Vancouver, ISO, and other styles
34

Shan, Qishang, Xiangsen Wei, and Ziyun Cai. "Modality-Invariant and -Specific Representations with Crossmodal Transformer for Multimodal Sentiment Analysis." Journal of Physics: Conference Series 2224, no. 1 (2022): 012024. http://dx.doi.org/10.1088/1742-6596/2224/1/012024.

Full text
Abstract:
Human emotion judgments usually receive information from multiple modalities such as language and audio, as well as facial expressions and gestures. Because different modalities are represented differently, multimodal data exhibit redundancy and complementarity, so a reasonable multimodal fusion approach is essential to improve the accuracy of sentiment analysis. Inspired by the Crossmodal Transformer for multimodal data fusion in the MulT (Multimodal Transformer) model, this paper adds the Crossmodal Transformer for modal enhancement of the different modal data in the fusion part of the MISA (Modality-Invariant and -Specific Representations for Multimodal Sentiment Analysis) model and proposes three MISA-CT models. Tested on two publicly available multimodal sentiment analysis datasets, MOSI and MOSEI, the models outperformed the original MISA model.
APA, Harvard, Vancouver, ISO, and other styles
35

Qi, Qingfu, Liyuan Lin, and Rui Zhang. "Feature Extraction Network with Attention Mechanism for Data Enhancement and Recombination Fusion for Multimodal Sentiment Analysis." Information 12, no. 9 (2021): 342. http://dx.doi.org/10.3390/info12090342.

Full text
Abstract:
Multimodal sentiment analysis and emotion recognition represent a major research direction in natural language processing (NLP). With the rapid development of online media, people often express their emotions on a topic in the form of video, and the signals it transmits are multimodal, including language, visual, and audio. Therefore, the traditional unimodal sentiment analysis method is no longer applicable, which requires the establishment of a fusion model of multimodal information to obtain sentiment understanding. In previous studies, scholars used the feature vector cascade method when fusing multimodal data at each time step in the middle layer. This method puts each modal information in the same position and does not distinguish between strong modal information and weak modal information among multiple modalities. At the same time, this method does not pay attention to the embedding characteristics of multimodal signals across the time dimension. In response to the above problems, this paper proposes a new method and model for processing multimodal signals, which takes into account the delay and hysteresis characteristics of multimodal signals across the time dimension. The purpose is to obtain a multimodal fusion feature emotion analysis representation. We evaluate our method on the multimodal sentiment analysis benchmark dataset CMU Multimodal Opinion Sentiment and Emotion Intensity Corpus (CMU-MOSEI). We compare our proposed method with the state-of-the-art model and show excellent results.
APA, Harvard, Vancouver, ISO, and other styles
36

Liang, Yi, Turdi Tohti, Wenpeng Hu, et al. "Dynamic Tuning and Multi-Task Learning-Based Model for Multimodal Sentiment Analysis." Applied Sciences 15, no. 11 (2025): 6342. https://doi.org/10.3390/app15116342.

Full text
Abstract:
Multimodal sentiment analysis aims to uncover human affective states by integrating data from multiple sensory sources. However, previous studies have focused on optimizing model architecture, neglecting the impact of objective function settings on model performance. Given this, this study introduces a new framework, DMMSA, which utilizes the intrinsic correlation of sentiment signals and enhances the model's understanding of complex sentiments. DMMSA incorporates coarse-grained sentiment analysis to reduce task complexity. Meanwhile, it embeds a contrastive learning mechanism within each modality, which decomposes unimodal features into similar and dissimilar ones, thus allowing for the simultaneous consideration of both unimodal and multimodal emotions. We tested DMMSA on the CH-SIMS, MOSI, and MOSEI datasets. When only changing the optimization objectives, DMMSA achieved accuracy gains of 3.2%, 1.57%, and 1.95% over the baseline in five-class and seven-class classification tasks. In regression tasks, DMMSA reduced the Mean Absolute Error (MAE) by 1.46%, 1.5%, and 2.8% compared to the baseline.
APA, Harvard, Vancouver, ISO, and other styles
37

Das, Mala. "Exploring Sentiment Analysis across Text, Audio, and Video: A Comprehensive Approach and Future Directions." International Journal for Research in Applied Science and Engineering Technology 12, no. 5 (2024): 3129–34. http://dx.doi.org/10.22214/ijraset.2024.62244.

Full text
Abstract:
This study presents a comprehensive exploration of sentiment analysis techniques across the text, audio, and video modalities. Leveraging natural language processing (NLP), speech recognition, and computer vision algorithms, the research demonstrates the versatility and adaptability of sentiment analysis across diverse data sources. The necessity of such an approach lies in its ability to provide deeper insights into user emotions and opinions expressed in various mediums, including written text, spoken language, and visual content. Moreover, the study highlights the importance of sentiment analysis in understanding customer feedback, market trends, social media sentiments, and sentiment-aware recommendation systems. Future directions include advancing algorithmic accuracy and efficiency, integrating multimodal fusion techniques, and exploring applications in diverse domains, thereby paving the way for enhanced sentiment analysis capabilities and broader real-world applications.
APA, Harvard, Vancouver, ISO, and other styles
38

Chen, Tianang. "A review of multimodal aspect-based sentiment analysis." Advances in Engineering Innovation 16, no. 6 (2025): None. https://doi.org/10.54254/2977-3903/2025.23984.

Full text
Abstract:
In the era of digital communication, the exponential growth of user-generated content across social media and online platforms has intensified the demand for effective emotion analysis tools. Traditional text-based sentiment analysis methods, however, often fall short in accurately capturing the nuances of human emotions due to their reliance on a single modality. Motivated by the need for more comprehensive and context-aware emotion recognition, this study systematically reviews the literature on both unimodal and multimodal aspect-level sentiment analysis. By comparing different approaches within the multimodal domain, we identify existing challenges and emerging trends in this research area. Our findings highlight the potential of integrating multiple modalities, such as text, images, and audio, to enhance the precision of sentiment detection and suggest future directions for advancing multimodal sentiment analysis.
APA, Harvard, Vancouver, ISO, and other styles
39

Papti, Mr Madhu Kumar. "Multimodal Content Analysis Using Deep Learning." International Journal for Research in Applied Science and Engineering Technology 12, no. 5 (2024): 564–69. http://dx.doi.org/10.22214/ijraset.2024.61566.

Full text
Abstract:
The multimodal content analysis platform combines sentiment analysis and neural style transfer techniques to process and improve various types of digital content. The sentiment analysis module utilizes natural language processing (NLP) algorithms, such as recurrent neural networks (RNNs) or transformer models like BERT, to extract emotional signals from textual, visual, and auditory inputs. Signals are classified into predefined sentiment categories, providing granular insights into the emotional context of the content. The platform employs neural style transfer algorithms, such as style transfer networks (NSTNs) or generative adversarial networks (GANs), to transfer stylistic attributes between texts. By training on a diverse range of artistic styles, the system learns to apply these styles to input text while preserving semantic meaning. This process enhances the visual representation of textual content, making it more appealing and engaging to users.
APA, Harvard, Vancouver, ISO, and other styles
40

He, Xilin, Haijian Liang, Boyi Peng, et al. "MSAmba: Exploring Multimodal Sentiment Analysis with State Space Models." Proceedings of the AAAI Conference on Artificial Intelligence 39, no. 2 (2025): 1309–17. https://doi.org/10.1609/aaai.v39i2.32120.

Full text
Abstract:
Multimodal sentiment analysis, which learns a model to process multiple modalities simultaneously and predict a sentiment value, is an important area of affective computing. Modeling sequential intra-modal information and enhancing cross-modal interactions are crucial to multimodal sentiment analysis. In this paper, we propose MSAmba, a novel hybrid Mamba-based architecture for multimodal sentiment analysis, consisting of two core blocks: Intra-Modal Sequential Mamba (ISM) block and Cross-Modal Hybrid Mamba (CHM) block, to comprehensively address the above-mentioned challenges with hybrid state space models. Firstly, the ISM block models the sequential information within each modality in a bi-directional manner with the assistance of global information. Subsequently, the CHM blocks explicitly model centralized cross-modal interaction with a hybrid combination of Mamba and attention mechanism to facilitate information fusion across modalities. Finally, joint learning of the intra-modal tokens and cross-modal tokens is utilized to predict the sentiment values. This paper serves as one of the pioneering works to unravel the outstanding performances and great research potential of Mamba-based methods in the task of multimodal sentiment analysis. Experiments on CMU-MOSI, CMU-MOSEI and CH-SIMS demonstrate the superior performance of the proposed MSAmba over prior Transformer-based and CNN-based methods.
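For readers unfamiliar with state space models, the toy sketch below runs a plain bidirectional diagonal state-space recurrence over one modality's feature sequence. It is not the authors' ISM or CHM block and omits Mamba's selective scan entirely; all dimensions and parameterizations are assumptions for illustration only.

```python
# Toy bidirectional diagonal state-space scan over one modality's features,
# illustrating the kind of sequential intra-modal modeling SSM-style blocks
# perform. Not the MSAmba architecture; sizes and parameterization are assumed.
import torch
import torch.nn as nn

class ToyBiSSM(nn.Module):
    def __init__(self, dim, state):
        super().__init__()
        self.logA = nn.Parameter(-torch.rand(state))   # non-positive, so decay <= 1
        self.B = nn.Linear(dim, state, bias=False)
        self.C = nn.Linear(state, dim, bias=False)

    def scan(self, x):                       # x: (batch, time, dim)
        h = torch.zeros(x.size(0), self.logA.numel(), device=x.device)
        decay = torch.exp(self.logA)         # per-state decay factor
        outs = []
        for t in range(x.size(1)):           # recurrence: h_t = decay * h_{t-1} + B x_t
            h = decay * h + self.B(x[:, t])
            outs.append(self.C(h))
        return torch.stack(outs, dim=1)

    def forward(self, x):
        fwd = self.scan(x)
        bwd = self.scan(x.flip(1)).flip(1)   # backward pass gives bi-directionality
        return fwd + bwd

if __name__ == "__main__":
    text_seq = torch.randn(2, 50, 64)        # e.g. 50 word-level text features
    print(ToyBiSSM(dim=64, state=16)(text_seq).shape)   # torch.Size([2, 50, 64])
```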
APA, Harvard, Vancouver, ISO, and other styles
41

Suryavanshi, Pallavi. "Deep Learning for Multimodal Sentiment Analysis Integrating Text, Audio, and Video." International Journal of Recent Development in Engineering and Technology 14, no. 2 (2025): 1–5. https://doi.org/10.54380/ijrdet0225_01.

Full text
Abstract:
In the fields of artificial intelligence (AI) and natural language processing (NLP), sentiment analysis (SA) has become increasingly popular. Demand for automating user sentiment analysis of goods and services is rising. Videos, as opposed to just text, are becoming more and more common online for sharing opinions. This has made the use of various modalities in SA, known as Multimodal Sentiment Analysis (MSA), a significant field of study. MSA uses the most recent developments in deep learning and machine learning at several phases, such as sentiment polarity detection and multimodal feature extraction and fusion, with the goal of reducing error rates and enhancing performance. Multiple data sources, such as text, audio, and video, are integrated into MSA to improve sentiment classification accuracy. Using cutting-edge deep learning algorithms, this work integrates text, audio, and video characteristics to examine multimodal sentiment analysis. After outlining a framework for feature extraction, fusion, and data pre-processing, we assess the framework's performance against industry-standard benchmarks.
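As a rough illustration of the extract-fuse-classify framework this abstract outlines, the sketch below pools per-modality features and fuses them by simple concatenation before classification. The feature dimensions and the concatenation-based fusion are assumptions, not the paper's reported configuration.

```python
# Minimal extract-fuse-classify sketch: pooled per-modality features are
# concatenated and passed to a small classifier. Dimensions are assumptions.
import torch
import torch.nn as nn

class EarlyFusionMSA(nn.Module):
    def __init__(self, d_text=768, d_audio=74, d_video=35, n_classes=3):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Linear(d_text + d_audio + d_video, 256), nn.ReLU(), nn.Dropout(0.2),
            nn.Linear(256, n_classes),
        )

    def forward(self, text_feat, audio_feat, video_feat):
        # Utterance-level features are assumed already pooled per modality.
        return self.fuse(torch.cat([text_feat, audio_feat, video_feat], dim=-1))

if __name__ == "__main__":
    model = EarlyFusionMSA()
    logits = model(torch.randn(4, 768), torch.randn(4, 74), torch.randn(4, 35))
    print(logits.shape)   # torch.Size([4, 3]) -> negative / neutral / positive
```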
APA, Harvard, Vancouver, ISO, and other styles
42

Xu, Dan, and Chi Zhang. "Exploring Feature Interactions for Multimodal Sentiment Analysis." International Journal of Cognitive Informatics and Natural Intelligence 19, no. 1 (2025): 1–19. https://doi.org/10.4018/ijcini.383754.

Full text
Abstract:
The study presents a sentiment analysis model to tackle two key challenges in multimodal sentiment analysis. The first challenge focuses on effectively capturing both modality-specific and modality-invariant features, which demands deep interactions across various modalities. The second challenge is to minimize interference among modalities, as such interference can degrade predictive accuracy. To address these issues, the modal feature interaction model utilizes RoBERTa and long short-term memory for feature extraction and analysis across text, audio, and video data. For the first challenge, the model employs self-attention and crossmodal attention mechanisms to facilitate modal feature interaction, enriching both intramodal and intermodal representations. To overcome the second challenge, the model reduces the L2 distance between multimodal representations during fusion, enabling seamless integration of intra- and intermodal features while capturing sentiment-related information for precise emotion prediction. Experimental results on two datasets reveal that the modal feature interaction model surpasses existing baseline models in sentiment analysis tasks.
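The sketch below is a hedged illustration of two ideas named in this abstract: a single cross-modal attention step and an L2-style term that pulls modality representations together during fusion. The specific modality pair, dimensions, and loss form are assumptions rather than the paper's model.

```python
# One cross-modal attention step (text attends to audio) plus an L2 penalty
# that aligns the two modality representations during fusion. Illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossModalStep(nn.Module):
    def __init__(self, dim=128, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, text, audio):
        # Query = text, key/value = audio -> audio-enriched text representation.
        enriched, _ = self.attn(text, audio, audio)
        t_vec, a_vec = enriched.mean(dim=1), audio.mean(dim=1)
        align_loss = F.mse_loss(t_vec, a_vec)        # L2-style alignment term
        return torch.cat([t_vec, a_vec], dim=-1), align_loss

if __name__ == "__main__":
    step = CrossModalStep()
    fused, loss = step(torch.randn(2, 30, 128), torch.randn(2, 120, 128))
    print(fused.shape, float(loss))                  # torch.Size([2, 256]) ...
```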
APA, Harvard, Vancouver, ISO, and other styles
43

Wu, Jun, Xinli Zheng, Jiangpeng Wang, Junwei Wu, and Ji Wang. "AB-GRU: An attention-based bidirectional GRU model for multimodal sentiment fusion and analysis." Mathematical Biosciences and Engineering 20, no. 10 (2023): 18523–44. http://dx.doi.org/10.3934/mbe.2023822.

Full text
Abstract:
Multimodal sentiment analysis is an important area of artificial intelligence. It integrates multiple modalities such as text, audio, video and image into a compact multimodal representation and obtains sentiment information from them. In this paper, we improve two modules, i.e., feature extraction and feature fusion, to enhance multimodal sentiment analysis and finally propose an attention-based two-layer bidirectional GRU (AB-GRU, gated recurrent unit) multimodal sentiment analysis method. For the feature extraction module, we use a two-layer bidirectional GRU network and connect two layers of attention mechanisms to enhance the extraction of important information. The feature fusion part uses low-rank multimodal fusion, which can reduce the multimodal data dimensionality and improve the computational rate and accuracy. The experimental results demonstrate that the AB-GRU model can achieve 80.9% accuracy on the CMU-MOSI dataset, which exceeds the same model type by at least 2.5%. The AB-GRU model also possesses a strong generalization capability and solid robustness.
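Low-rank multimodal fusion, the fusion scheme named in this abstract, can be sketched compactly. The snippet below is an illustrative rank-factorized fusion of three modality vectors, not the full AB-GRU model; the rank, output size, and modality dimensions are assumptions.

```python
# Low-rank multimodal fusion sketch: each modality vector is projected through
# rank-R factors, the projections are multiplied elementwise across modalities,
# and the rank dimension is summed out. Sizes are illustrative assumptions.
import torch
import torch.nn as nn

class LowRankFusion(nn.Module):
    def __init__(self, dims=(64, 16, 32), rank=4, out_dim=32):
        super().__init__()
        # One factor per modality; the +1 handles the constant-augmented input.
        self.factors = nn.ParameterList(
            [nn.Parameter(torch.randn(rank, d + 1, out_dim) * 0.1) for d in dims]
        )

    def forward(self, feats):                         # list of (batch, d_m) tensors
        batch = feats[0].size(0)
        fused = None
        for x, w in zip(feats, self.factors):
            x1 = torch.cat([x, torch.ones(batch, 1, device=x.device)], dim=-1)
            proj = torch.einsum("bd,rdo->bro", x1, w)  # (batch, rank, out_dim)
            fused = proj if fused is None else fused * proj
        return fused.sum(dim=1)                        # (batch, out_dim)

if __name__ == "__main__":
    f = LowRankFusion()
    out = f([torch.randn(8, 64), torch.randn(8, 16), torch.randn(8, 32)])
    print(out.shape)                                   # torch.Size([8, 32])
```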
APA, Harvard, Vancouver, ISO, and other styles
44

Pyate, Prapullakumar Gowtham, and Balaji Srinivasan. "Advancing Automotive Business Strategy through Multimodal Aspect-Based Sentiment Analysis Using SSLU-GRU and YOLO." Involvement International Journal of Business 2, no. 3 (2025): 176–91. https://doi.org/10.62569/iijb.v2i3.137.

Full text
Abstract:
Sentiment analysis (SA) has become a key tool in understanding consumer feedback in the automotive industry. However, most existing models are limited to unimodal data and fail to capture fine-grained, aspect-level sentiments from multimodal sources such as text, images, and video. Additionally, privacy concerns related to user-generated content remain under-addressed. This study proposes a novel Multimodal Aspect-Based Sentiment Analysis (MASA) framework that integrates textual, visual, and video data for business decision-making in the automotive sector. The framework includes a BERT-based aspect dictionary for extracting domain-specific features, SCV-YOLOv5 for object segmentation in images and videos, and a GRU model enhanced with the Sinu-Sigmoidal Linear Unit (SSLU) activation function for sentiment classification. A K-Anonymity method augmented by Kendall's Tau and Spearman's Rank Correlation is employed to protect user privacy in sentiment data. The framework was evaluated using the MuSe Car dataset, encompassing over 60 car brands and 10,000 data samples per brand. The proposed model achieved 98.94% classification accuracy, outperforming baseline models such as BiLSTM and CNN in terms of Mean Absolute Error (0.14), RMSE (1.01), and F1-score (98.15%). Privacy-preservation tests also showed superior performance, with a 98% privacy-preserving rate and lower information loss than traditional methods. The results demonstrate that integrating multimodal input with deep learning and privacy-aware techniques significantly enhances the accuracy and reliability of sentiment analysis in automotive business contexts. The framework enables better alignment of consumer feedback with strategic decisions such as product development and targeted marketing.
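The snippet below is only a small, hypothetical illustration of how Kendall's Tau and Spearman's rank correlation can check whether an anonymization step preserved the ordering of sentiment scores. It is not the paper's K-Anonymity pipeline, and the score values are invented for the example.

```python
# Rank-correlation utility check on assumed data: do anonymized per-brand
# sentiment scores keep the same ordering as the originals?
from scipy.stats import kendalltau, spearmanr

original   = [0.82, 0.47, 0.91, 0.15, 0.63]   # hypothetical raw sentiment scores
anonymized = [0.80, 0.50, 0.90, 0.20, 0.60]   # hypothetical generalized scores

tau, tau_p = kendalltau(original, anonymized)
rho, rho_p = spearmanr(original, anonymized)
print(f"Kendall tau = {tau:.3f} (p={tau_p:.3f}), Spearman rho = {rho:.3f} (p={rho_p:.3f})")
```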
APA, Harvard, Vancouver, ISO, and other styles
45

Zhou, Bainan, and Xu Li. "Multimodal Emotion Analysis Model based on Interactive Attention Mechanism." Frontiers in Computing and Intelligent Systems 3, no. 2 (2023): 67–73. http://dx.doi.org/10.54097/fcis.v3i2.7512.

Full text
Abstract:
In traditional multimodal sentiment analysis, feature fusion is usually achieved by simple concatenation, and multimodal sentiment analysis is trained as a single task, without considering the contribution of inter-modal information interaction to sentiment analysis or the correlation and constraint relationships between the multimodal and single-modal (text, video, and audio) tasks. Therefore, this paper proposes a multi-task model based on an interactive attention mechanism, which uses an inter-modal attention mechanism and a single-modal self-attention mechanism to train multimodal and single-modal sentiment analysis jointly, so as to make full use of inter-modal and inter-task information sharing and complementarity and to reduce noise, thereby improving overall recognition performance. Experiments show that the proposed model performs well on the widely used MOSI and MOSEI datasets for multimodal sentiment analysis.
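To make the joint-training idea tangible, the sketch below combines one multimodal regression head with three unimodal heads in a weighted multi-task loss. The encoders are omitted, and the dimensions, heads, and loss weights are assumptions rather than the paper's design.

```python
# Schematic multi-task setup: a multimodal head and three unimodal heads share
# a weighted sum of losses. Dimensions and weights are illustrative assumptions.
import torch
import torch.nn as nn

class MultiTaskMSA(nn.Module):
    def __init__(self, dim=128):
        super().__init__()
        self.heads = nn.ModuleDict({m: nn.Linear(dim, 1) for m in ("text", "audio", "video")})
        self.fusion_head = nn.Linear(3 * dim, 1)

    def forward(self, feats):                      # dict of (batch, dim) tensors
        uni = {m: self.heads[m](feats[m]).squeeze(-1) for m in self.heads}
        fused = torch.cat([feats["text"], feats["audio"], feats["video"]], dim=-1)
        return self.fusion_head(fused).squeeze(-1), uni

def joint_loss(multi_pred, uni_preds, target, uni_weight=0.3):
    mse = nn.MSELoss()
    loss = mse(multi_pred, target)                 # main multimodal task
    for pred in uni_preds.values():                # auxiliary unimodal tasks
        loss = loss + uni_weight * mse(pred, target)
    return loss

if __name__ == "__main__":
    feats = {m: torch.randn(4, 128) for m in ("text", "audio", "video")}
    multi_pred, uni_preds = MultiTaskMSA()(feats)
    print(float(joint_loss(multi_pred, uni_preds, torch.randn(4))))
```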
APA, Harvard, Vancouver, ISO, and other styles
46

Liu, Yong, and Shiqiu Yu. "Web Semantic-Enhanced Multimodal Sentiment Analysis Using Multilayer Cross-Attention Fusion." International Journal on Semantic Web and Information Systems 20, no. 1 (2024): 1–29. http://dx.doi.org/10.4018/ijswis.360653.

Full text
Abstract:
To address the shortcomings of existing multimodal sentiment analysis approaches, namely inadequate extraction of unimodal features, redundancy among independent modal features, insufficient analysis of the semantic correlation between data, and insufficient fusion, a Web Semantic-Enhanced Multimodal Sentiment Analysis model using Multilayer Cross-Attention Fusion (MCFMSA) is proposed. The model utilizes deep learning (including XLNet, ResNeSt, and convolutional neural networks) to extract high-level features from the text, audio, and visual modalities through self-attention mechanisms, and improves the accuracy of emotion classification through multimodal fusion. Experimental results demonstrate that the proposed MCFMSA achieves Acc-2, Acc-3, F1, and MAE values of 89.7%, 85.2%, 89.3%, and 0.466 on the CMU-MOSI dataset, respectively, and 88.7%, 82.5%, 86.5%, and 0.475 on the CMU-MOSEI dataset. All of these results are significant improvements over several other advanced multimodal sentiment analysis methods, enhancing the accuracy of sentiment classification.
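A minimal sketch of stacked ("multilayer") cross-attention fusion follows, with a text stream refined layer by layer against a visual stream. The layer count, the single modality pair, and the pooling are assumptions, and the XLNet/ResNeSt feature extractors mentioned above are not reproduced.

```python
# Stacked cross-attention fusion sketch: the text stream repeatedly attends to
# the visual stream, with residual connections and layer norm. Illustrative only.
import torch
import torch.nn as nn

class CrossAttnLayer(nn.Module):
    def __init__(self, dim=128, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, query_seq, context_seq):
        attended, _ = self.attn(query_seq, context_seq, context_seq)
        return self.norm(query_seq + attended)     # residual + norm

class MultilayerCrossAttnFusion(nn.Module):
    def __init__(self, dim=128, layers=3):
        super().__init__()
        self.layers = nn.ModuleList([CrossAttnLayer(dim) for _ in range(layers)])

    def forward(self, text_seq, visual_seq):
        for layer in self.layers:
            text_seq = layer(text_seq, visual_seq)
        return text_seq.mean(dim=1)                # pooled fused representation

if __name__ == "__main__":
    fusion = MultilayerCrossAttnFusion()
    print(fusion(torch.randn(2, 40, 128), torch.randn(2, 36, 128)).shape)
```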
APA, Harvard, Vancouver, ISO, and other styles
47

Zhang, Tianzhi, Gang Zhou, Jicang Lu, Zhibo Li, Hao Wu, and Shuo Liu. "Text-image semantic relevance identification for aspect-based multimodal sentiment analysis." PeerJ Computer Science 10 (April 12, 2024): e1904. http://dx.doi.org/10.7717/peerj-cs.1904.

Full text
Abstract:
Aspect-based multimodal sentiment analysis (ABMSA) is an emerging task in the research of multimodal sentiment analysis, which aims to identify the sentiment of each aspect mentioned in a multimodal sample. Although recent research on ABMSA has achieved some success, most existing models only adopt attention mechanisms to interact the aspect with the text and image separately and obtain the sentiment output through multimodal concatenation; they often neglect that some samples may have no semantic relevance between text and image. In this article, we propose a Text-Image Semantic Relevance Identification (TISRI) model for ABMSA to address this problem. Specifically, we introduce a multimodal feature relevance identification module to calculate the semantic similarity between text and image, and then construct an image gate to dynamically control the input image information. On this basis, auxiliary image information is provided to enhance the semantic expressiveness of the visual feature representation and generate a more intuitive image representation. Furthermore, we employ an attention mechanism during multimodal feature fusion to obtain a text-aware image representation through text-image interaction, preventing irrelevant image information from interfering with our model. Experiments demonstrate that TISRI achieves competitive results on two ABMSA Twitter datasets, which validates the effectiveness of our methods.
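The gating idea can be illustrated in a few lines: a text-image cosine similarity is mapped to a gate that scales the image contribution before fusion. This is a loose sketch of the relevance-identification concept, not the TISRI model, and the feature sizes and gate mapping are assumptions.

```python
# Relevance-gated fusion sketch: cosine similarity between pooled text and
# image features scales how much image information enters the fused vector.
import torch
import torch.nn.functional as F

def relevance_gated_fusion(text_vec, image_vec):
    # Similarity in [-1, 1] mapped to a gate in [0, 1].
    sim = F.cosine_similarity(text_vec, image_vec, dim=-1, eps=1e-8)
    gate = (sim + 1.0) / 2.0
    gated_image = gate.unsqueeze(-1) * image_vec       # suppress irrelevant images
    return torch.cat([text_vec, gated_image], dim=-1), gate

if __name__ == "__main__":
    fused, gate = relevance_gated_fusion(torch.randn(4, 256), torch.randn(4, 256))
    print(fused.shape, gate.shape)                      # (4, 512) (4,)
```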
APA, Harvard, Vancouver, ISO, and other styles
48

Hu, Xiaoran, and Masayuki Yamamura. "Global Local Fusion Neural Network for Multimodal Sentiment Analysis." Applied Sciences 12, no. 17 (2022): 8453. http://dx.doi.org/10.3390/app12178453.

Full text
Abstract:
With the popularity of social networking services, people are increasingly inclined to share their opinions and feelings on social networks, leading to the rapid increase in multimodal posts on various platforms. Therefore, multimodal sentiment analysis has become a crucial research field for exploring users’ emotions. The complex and complementary interactions between images and text greatly heighten the difficulty of sentiment analysis. Previous works performed only coarse fusion operations and overlooked fine-grained fusion features for the sentiment task, and therefore did not obtain sufficient interactive information for sentiment analysis. This paper proposes a global local fusion neural network (GLFN), which comprehensively considers global and local fusion features, aggregating these features to analyze user sentiment. The model first extracts overall fusion features by attention modules as modality-based global features. Then, coarse-to-fine fusion learning is applied to build local fusion features effectively. Specifically, the cross-modal module is used for rough fusion, and fine-grained fusion is applied to capture the interaction information between objects and words. Finally, we integrate all features to achieve a more reliable prediction. Extensive experimental results, comparisons, and visualizations on public datasets demonstrate the effectiveness of the proposed model for multimodal sentiment classification.
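As a rough, assumption-laden sketch of combining a global fused vector with a local word-to-object alignment signal, the snippet below pairs mean-pooled global features with a similarity-weighted local term. It is not the GLFN architecture; the alignment scheme and dimensions are illustrative.

```python
# Global + local fusion sketch: a global vector from mean-pooled word and
# object features, and a local vector weighting each word by its best-matching
# detected object. Purely illustrative of the global/local split.
import torch
import torch.nn.functional as F

def global_local_fusion(word_feats, object_feats):
    # word_feats: (batch, n_words, dim); object_feats: (batch, n_objects, dim)
    global_vec = torch.cat([word_feats.mean(1), object_feats.mean(1)], dim=-1)

    # Local: each word is matched to its most similar detected object.
    sim = torch.einsum("bwd,bod->bwo",
                       F.normalize(word_feats, dim=-1),
                       F.normalize(object_feats, dim=-1))
    best = sim.max(dim=-1).values                 # (batch, n_words)
    local_vec = (best.unsqueeze(-1) * word_feats).mean(1)

    return torch.cat([global_vec, local_vec], dim=-1)

if __name__ == "__main__":
    out = global_local_fusion(torch.randn(2, 20, 128), torch.randn(2, 5, 128))
    print(out.shape)                              # torch.Size([2, 384])
```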
APA, Harvard, Vancouver, ISO, and other styles
49

Pokharel, Bishwo Prakash, and Roshan Koju. "Sentiment analysis using Hierarchical Multimodal Fusion (HMF)." World Journal of Advanced Research and Reviews 14, no. 3 (2022): 296–303. http://dx.doi.org/10.30574/wjarr.2022.14.3.0549.

Full text
Abstract:
The rapid rise of platforms like YouTube and Facebook is due to the spread of tablets, smartphones, and other electronic devices. Massive volumes of data are collected every second on such a platform, demanding large-scale data processing. Because these data come in a variety of modalities, including text, audio, and video, sentiment categorization in various modalities and emotional computing are the most researched fields in today's scenario. Companies are striving to make use of this information by developing automated systems for a variety of purposes, such as automated customer feedback collection from user assessments, where the underlying challenge is to mine user sentiment connected to a specific product or service. The use of efficient and effective sentiment analysis tools is required to solve such a complex problem with such a big volume of data. The sentiment analysis of videos is investigated in this study, with data available in three modalities: audio, video, and text. In today's world, modality fusion is a major problem. This study introduces a novel approach to speaker-independent fusion: utilizing deep learning to fuse in a hierarchical fashion. The work tried to obtain improvement over simple concatenation-based fusion.
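A compact sketch of the hierarchical idea, fusing modality pairs first and then fusing the pairwise results, is given below for contrast with flat concatenation. The pairing order, layer sizes, and class count are assumptions, not the paper's reported settings.

```python
# Hierarchical (pairwise-then-joint) fusion sketch, as opposed to one flat
# concatenation of all modalities. Layer sizes are illustrative assumptions.
import torch
import torch.nn as nn

class HierarchicalFusion(nn.Module):
    def __init__(self, d=64, hidden=64, n_classes=3):
        super().__init__()
        self.pair = nn.ModuleDict({
            name: nn.Sequential(nn.Linear(2 * d, hidden), nn.ReLU())
            for name in ("ta", "tv", "av")
        })
        self.top = nn.Linear(3 * hidden, n_classes)

    def forward(self, t, a, v):
        ta = self.pair["ta"](torch.cat([t, a], dim=-1))   # level 1: pairwise fusion
        tv = self.pair["tv"](torch.cat([t, v], dim=-1))
        av = self.pair["av"](torch.cat([a, v], dim=-1))
        return self.top(torch.cat([ta, tv, av], dim=-1))  # level 2: joint fusion

if __name__ == "__main__":
    m = HierarchicalFusion()
    print(m(torch.randn(4, 64), torch.randn(4, 64), torch.randn(4, 64)).shape)
```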
APA, Harvard, Vancouver, ISO, and other styles
50

Pokharel, Bishwo Prakash, and Roshan Koju. "Sentiment analysis using Hierarchical Multimodal Fusion (HMF)." World Journal of Advanced Research and Reviews 14, no. 3 (2022): 296–303. https://doi.org/10.5281/zenodo.7731548.

Full text
Abstract:
The rapid rise of platforms like YouTube and Facebook is due to the spread of tablets, smartphones, and other electronic devices. Massive volumes of data are collected every second on such a platform, demanding large-scale data processing. Because these data come in a variety of modalities, including text, audio, and video, sentiment categorization in various modalities and emotional computing are the most researched fields in today's scenario. Companies are striving to make use of this information by developing automated systems for a variety of purposes, such as automated customer feedback collection from user assessments, where the underlying challenge is to mine user sentiment connected to a specific product or service. The use of efficient and effective sentiment analysis tools is required to solve such a complex problem with such a big volume of data. The sentiment analysis of videos is investigated in this study, with data available in three modalities: audio, video, and text. In today's world, modality fusion is a major problem. This study introduces a novel approach to speaker-independent fusion: utilizing deep learning to fuse in a hierarchical fashion. The work tried to obtain improvement over simple concatenation-based fusion.
APA, Harvard, Vancouver, ISO, and other styles