Academic literature on the topic 'Video based modality'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Video based modality.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate a bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Journal articles on the topic "Video based modality"

1

Oh, Changhyeon, and Yuseok Ban. "Cross-Modality Interaction-Based Traffic Accident Classification." Applied Sciences 14, no. 5 (2024): 1958. http://dx.doi.org/10.3390/app14051958.

Full text
Abstract:
Traffic accidents on the road lead to serious personal and material damage. Furthermore, preventing secondary accidents caused by traffic accidents is crucial. As various technologies for detecting traffic accidents in videos using deep learning are being researched, this paper proposes a method to classify accident videos based on a video highlight detection network. To utilize video highlight detection for traffic accident classification, we generate information using the existing traffic accident videos. Moreover, we introduce the Car Crash Highlights Dataset (CCHD). This dataset contains a variety of weather conditions, such as snow, rain, and clear skies, as well as multiple types of traffic accidents. We compare and analyze the performance of various video highlight detection networks in traffic accident detection, thereby presenting an efficient video feature extraction method according to the accident and the optimal video highlight detection network. For the first time, we have applied video highlight detection networks to the task of traffic accident classification. In this task, the best-performing video highlight detection network achieves a classification performance of up to 79.26% when using video, audio, and text as inputs, compared to using video and text alone. Moreover, we elaborate on the analysis of our approach in terms of cross-modality interaction, self-attention and cross-attention, feature extraction, and negative loss.
APA, Harvard, Vancouver, ISO, and other styles
2

Wang, Xingrun, Xiushan Nie, Xingbo Liu, Binze Wang, and Yilong Yin. "Modality correlation-based video summarization." Multimedia Tools and Applications 79, no. 45-46 (2020): 33875–90. http://dx.doi.org/10.1007/s11042-020-08690-3.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Jang, Jaeyoung, Yuseok Ban, and Kyungjae Lee. "Dual-Modality Cross-Interaction-Based Hybrid Full-Frame Video Stabilization." Applied Sciences 14, no. 10 (2024): 4290. http://dx.doi.org/10.3390/app14104290.

Full text
Abstract:
This study aims to generate visually useful imagery by preventing cropping while maintaining resolution and minimizing the degradation of stability and distortion to enhance the stability of a video for Augmented Reality applications. The focus is placed on conducting research that balances maintaining execution speed with performance improvements. By processing Inertial Measurement Unit (IMU) sensor data using the Versatile Quaternion-based Filter algorithm and optical flow, our research first applies motion compensation to frames of input video. To address cropping, PCA-flow-based video stabilization is then performed. Furthermore, to mitigate distortion occurring during the full-frame video creation process, neural rendering is applied, resulting in the output of stabilized frames. The anticipated effect of using an IMU sensor is the production of full-frame videos that maintain visual quality while increasing the stability of a video. Our technique contributes to correcting video shakes and has the advantage of generating visually useful imagery at low cost. Thus, we propose a novel hybrid full-frame video stabilization algorithm that produces full-frame videos after motion compensation with an IMU sensor. Evaluating our method against three metrics, the Stability score, Distortion value, and Cropping ratio, results indicated that stabilization was more effectively achieved with robustness to flow inaccuracy when effectively using an IMU sensor. In particular, among the evaluation outcomes, within the “Turn” category, our method exhibited an 18% enhancement in the Stability score and a 3% improvement in the Distortion value compared to the average results of previously proposed full-frame video stabilization-based methods, including PCA flow, neural rendering, and DIFRINT.
APA, Harvard, Vancouver, ISO, and other styles
4

Rahmad, Nur Azmina, Muhammad Amir As'ari, Nurul Fathiah Ghazali, Norazman Shahar, and Nur Anis Jasmin Sufri. "A Survey of Video Based Action Recognition in Sports." Indonesian Journal of Electrical Engineering and Computer Science 11, no. 3 (2018): 987–93. https://doi.org/10.11591/ijeecs.v11.i3.pp987-993.

Full text
Abstract:
Sport performance analysis, which is crucial in sport practice, is used to improve the performance of athletes during games. Many studies have investigated detecting different player movements for notational analysis using either a sensor-based or a video-based modality. Recently, the vision-based modality has become a research interest due to the rapid development of online video transmission. Tremendous experimental studies have been carried out using the vision-based modality in sport, but only a few review studies have been done previously. Hence, we provide a review of video-based techniques for recognizing sport actions, toward establishing an automated notational analysis system. The paper is organized into four parts. Firstly, we provide an overview of the existing technologies of video-based sports intelligence systems. Secondly, we review the framework of action recognition in all fields before discussing the implementation of deep learning in the vision-based modality for sport actions. Finally, the paper summarizes further trends and research directions in action recognition for sports using the video approach. We believe that this review will be very beneficial in providing a complete overview of video-based action recognition in sports.
APA, Harvard, Vancouver, ISO, and other styles
5

Zhang, Beibei, Tongwei Ren, and Gangshan Wu. "Text-Guided Nonverbal Enhancement Based on Modality-Invariant and -Specific Representations for Video Speaking Style Recognition." Proceedings of the AAAI Conference on Artificial Intelligence 39, no. 21 (2025): 22354–62. https://doi.org/10.1609/aaai.v39i21.34391.

Full text
Abstract:
Video speaking style recognition (VSSR) aims to classify different types of conversations in videos, contributing significantly to understanding human interactions. A significant challenge in VSSR is the inherent similarity among conversation videos, which makes it difficult to distinguish between different speaking styles. Existing VSSR methods commit to providing available multimodal information to enhance the differentiation of conversation videos. Nevertheless, treating each modality equally leads to a suboptimal result for these methods because text is inherently more aligned with conversation understanding compared to nonverbal modalities. To address this issue, we propose a text-guided nonverbal enhancement method, TNvE, which is composed of two core modules: 1) a text-guided nonverbal representation selection module employs cross-modal attention based on modality-invariant representations, picking out critical nonverbal information via textual guide; and 2) a modality-invariant and -specific representation decoupling module incorporates modality-specific representations and decouples them from modality-invariant representations, enabling a more comprehensive understanding of multimodal data. The former module encourages multimodal representations close to each other, while the latter module provides unique characteristics of each modality as a supplement. Extensive experiments are conducted on long-form video understanding datasets to demonstrate that TNvE is highly effective for VSSR, achieving a new state-of-the-art.
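For readers unfamiliar with the mechanism named in this abstract, the following is a minimal PyTorch sketch of text-guided cross-modal attention in general; it is not the authors' TNvE implementation, and the tensor shapes and dimension sizes are assumptions made for illustration.

```python
# Minimal sketch of text-guided cross-modal attention (illustrative only,
# not the authors' TNvE implementation). Text features act as the query;
# nonverbal (audio/visual) features act as keys and values, so the text
# decides which nonverbal time steps to emphasize.
import torch
import torch.nn as nn

class TextGuidedCrossAttention(nn.Module):
    def __init__(self, dim=256, num_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, text_feats, nonverbal_feats):
        # text_feats: (batch, text_len, dim); nonverbal_feats: (batch, seq_len, dim)
        attended, _ = self.attn(query=text_feats,
                                key=nonverbal_feats,
                                value=nonverbal_feats)
        # Residual connection keeps the original textual signal.
        return self.norm(text_feats + attended)

if __name__ == "__main__":
    fusion = TextGuidedCrossAttention()
    text = torch.randn(2, 20, 256)        # e.g. token-level text features
    nonverbal = torch.randn(2, 50, 256)   # e.g. frame-level audio/visual features
    print(fusion(text, nonverbal).shape)  # torch.Size([2, 20, 256])
```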
APA, Harvard, Vancouver, ISO, and other styles
6

Zong, Linlin, Wenmin Lin, Jiahui Zhou, et al. "Text-Guided Fine-grained Counterfactual Inference for Short Video Fake News Detection." Proceedings of the AAAI Conference on Artificial Intelligence 39, no. 1 (2025): 1237–45. https://doi.org/10.1609/aaai.v39i1.32112.

Full text
Abstract:
Detecting fake news in short videos is crucial for combating misinformation. Existing methods utilize topic modeling and co-attention mechanism, overlooking the modality heterogeneity and resulting in suboptimal performance. To address this issue, we introduce Text-Guided Fine-grained Counterfactual Inference for Short Video Fake News detection (TGFC-SVFN). TGFC-SVFN leverages modality bias removal and teacher-model-enhanced inter-modal knowledge distillation to integrate the heterogeneous modalities in short videos. Specifically, we use causality-based reasoning prompts guided text as teacher model, which then transfers knowledge to the video and audio student models. Subsequently, a multi-head attention mechanism is employed to fuse information from different modalities. In each module, we utilize fine-grained counterfactual inference based on a diffusion model to eliminate modality bias. Experimental results on publicly available fake short video news datasets demonstrate that our method outperforms state-of-the-art techniques.
APA, Harvard, Vancouver, ISO, and other styles
7

Li, Yun, Su Wang, Jiawei Mo, and Xin Wei. "An Underwater Multi-Label Classification Algorithm Based on a Bilayer Graph Convolution Learning Network with Constrained Codec." Electronics 13, no. 16 (2024): 3134. http://dx.doi.org/10.3390/electronics13163134.

Full text
Abstract:
Within the domain of multi-label classification for micro-videos, utilizing terrestrial datasets as a foundation, researchers have embarked on profound endeavors yielding extraordinary accomplishments. The research into multi-label classification based on underwater micro-video datasets is still in the preliminary stage. There are some challenges: the severe color distortion and visual blurring in underwater visual imaging due to water molecular scattering and absorption, the difficulty in acquiring underwater short video datasets, the sparsity of underwater short video modality features, and the formidable task of achieving high-precision underwater multi-label classification. To address these issues, a bilayer graph convolution learning network based on constrained codec (BGCLN) is established in this paper. Specifically, modality-common representation is constructed to complete the representation of common information and specific information based on the constrained codec network. Then, the attention-driven double-layer graph convolutional network module is designed to mine the correlation information between labels and enhance the modality representation. Finally, the combined modality representation fusion and multi-label classification module are used to obtain the category classifier prediction. In the underwater video multi-label classification dataset (UVMCD), the effectiveness and high classification accuracy of the proposed BGCLN have been proved by numerous experiments.
APA, Harvard, Vancouver, ISO, and other styles
8

Rahmad, Nur Azmina, Muhammad Amir As'ari, Nurul Fathiah Ghazali, Norazman Shahar, and Nur Anis Jasmin Sufri. "A Survey of Video Based Action Recognition in Sports." Indonesian Journal of Electrical Engineering and Computer Science 11, no. 3 (2018): 987–93. http://dx.doi.org/10.11591/ijeecs.v11.i3.pp987-993.

Full text
Abstract:
Sport performance analysis, which is crucial in sport practice, is used to improve the performance of athletes during games. Many studies have investigated detecting different player movements for notational analysis using either a sensor-based or a video-based modality. Recently, the vision-based modality has become a research interest due to the rapid development of online video transmission. Tremendous experimental studies have been carried out using the vision-based modality in sport, but only a few review studies have been done previously. Hence, we provide a review of video-based techniques for recognizing sport actions, toward establishing an automated notational analysis system. The paper is organized into four parts. Firstly, we provide an overview of the existing technologies of video-based sports intelligence systems. Secondly, we review the framework of action recognition in all fields before discussing the implementation of deep learning in the vision-based modality for sport actions. Finally, the paper summarizes further trends and research directions in action recognition for sports using the video approach. We believe that this review will be very beneficial in providing a complete overview of video-based action recognition in sports.
APA, Harvard, Vancouver, ISO, and other styles
9

Zawali, Bako, Richard A. Ikuesan, Victor R. Kebande, Steven Furnell, and Arafat A-Dhaqm. "Realising a Push Button Modality for Video-Based Forensics." Infrastructures 6, no. 4 (2021): 54. http://dx.doi.org/10.3390/infrastructures6040054.

Full text
Abstract:
Complexity and sophistication among multimedia-based tools have made it easy for perpetrators to conduct digital crimes such as counterfeiting, modification, and alteration without being detected. It may not be easy to verify the integrity of video content that, for example, has been manipulated digitally. To address this perennial investigative challenge, this paper proposes the integration of a forensically sound push button forensic modality (PBFM) model for the investigation of the MP4 video file format as a step towards automated video forensic investigation. An open-source multimedia forensic tool was developed based on the proposed PBFM model. A comprehensive evaluation of the efficiency of the tool against file alteration showed that the tool was capable of identifying falsified files, which satisfied the underlying assertion of the PBFM model. Furthermore, the outcome can be used as a complementary process for enhancing the evidence admissibility of MP4 video for forensic investigation.
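As a small, concrete illustration of one basic building block of video-file forensics mentioned in this abstract, the sketch below checks whether an MP4 matches a previously recorded digest. It is not the PBFM model itself (whose box-level MP4 analysis goes well beyond a whole-file hash), and the file path and reference digest are hypothetical placeholders.

```python
# Illustrative sketch: flag possible alteration of an MP4 by comparing its
# SHA-256 digest against a known-good reference value. Not the PBFM tool.
import hashlib
from pathlib import Path

def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        # Read in chunks so large video files do not have to fit in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def is_unaltered(path: Path, reference_hex: str) -> bool:
    return sha256_of_file(path) == reference_hex

if __name__ == "__main__":
    # "evidence.mp4" and the reference digest are hypothetical placeholders.
    video = Path("evidence.mp4")
    reference = "0" * 64
    print("unaltered" if is_unaltered(video, reference) else "possibly altered")
```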
APA, Harvard, Vancouver, ISO, and other styles
10

Waykar, Sanjay B., and C. R. Bharathi. "Multimodal Features and Probability Extended Nearest Neighbor Classification for Content-Based Lecture Video Retrieval." Journal of Intelligent Systems 26, no. 3 (2017): 585–99. http://dx.doi.org/10.1515/jisys-2016-0041.

Full text
Abstract:
Due to the ever-increasing number of digital lecture libraries and lecture video portals, the challenge of retrieving lecture videos has become a very significant and demanding task in recent years. Accordingly, the literature presents different techniques for video retrieval by considering video contents as well as signal data. Here, we propose a lecture video retrieval system using multimodal features and probability extended nearest neighbor (PENN) classification. There are two modalities utilized for feature extraction. One is textual information, which is determined from the lecture video using optical character recognition. The second modality utilized to preserve video content is local vector pattern. These two modal features are extracted, and the retrieval of videos is performed using the proposed PENN classifier, which is the extension of the extended nearest neighbor classifier, by considering the different weightages for the first-level and second-level neighbors. The performance of the proposed video retrieval is evaluated using precision, recall, and F-measure, which are computed by matching the retrieved videos and the manually classified videos. From the experimentation, we proved that the average precision of the proposed PENN+VQ is 78.3%, which is higher than that of the existing methods.
APA, Harvard, Vancouver, ISO, and other styles
More sources

Dissertations / Theses on the topic "Video based modality"

1

Lepareur, Céline. "L’évaluation dans les enseignements scientifiques fondés sur l’investigation : effets de différentes modalités d'évaluation formative sur l’autorégulation des apprentissages." Thesis, Université Grenoble Alpes (ComUE), 2016. http://www.theses.fr/2016GREAH019/document.

Full text
Abstract:
In recent years, many European countries have introduced Inquiry-Based Science Education (IBSE) into their science curricula. Two goals are at stake: to provide an image of scientific activity more consistent with the actual work of scientists, and to arouse students' interest by emphasizing their active role in the learning process. Alongside the introduction of these approaches, the implementation of formative assessment opens a promising way to meet these goals. Formative assessments are likely to develop students' self-regulation and to provide relevant feedback for teachers to regulate their teaching (Allal & Mottier Lopez, 2007; Clark, 2012; Wiliam, 2010). This doctoral research examines the impacts of different modalities of formative assessment on students' self-regulation in the specific context of IBSE. Two issues guided this work. The first, empirical, concerned the analysis of teachers' assessment practices and their effects on self-regulatory processes. The second aimed at developing a methodology for analyzing the variables under study. To do this, we video-recorded class sessions and constructed indicator grids that allowed us to analyze the processes in situ. Different teaching situations were compared. The first corresponded to formative assessment as teachers implement it in their daily practice. The second concerned the assessment practices implemented by the same teachers the following year, after a reflective review of their sessions had been conducted. Our results show a better balance in the use of the different formative assessment modalities in the second situation, notably a greater empowerment of students and a stronger use of peers as a resource. Students also demonstrate more effective self-regulation of their behavior, reflected in more time spent producing solving strategies and greater engagement in the task. Gaps remain, however, in how to formally integrate the self-assessment tool into students' activity. Courses of action for effectively combining assessment with different learning tasks are then proposed.
APA, Harvard, Vancouver, ISO, and other styles
2

"Comparison of Video and Audio Rating Modalities for Assessment of Provider Fidelity to a Family-Centered, Evidence-Based Program." Doctoral diss., 2019. http://hdl.handle.net/2286/R.I.62639.

Full text
Abstract:
The current study assessed whether the interrater reliability and predictive validity of fidelity ratings differed significantly across the modalities of audio and video recordings. As empirically supported programs are moving to scale, attention to fidelity, the extent to which a program is delivered as intended, is essential because high fidelity is needed for positive program effects. Consequently, an important issue for prevention science is the development of feasible and acceptable methods for assessing fidelity. Currently, fidelity monitoring is rarely practiced, as the typical way of measuring fidelity, which uses video of sessions, is expensive, time-consuming, and intrusive. Audio recording has multiple advantages over video recording: 1) it is less intrusive; 2) equipment is less expensive; 3) recording procedures are simpler; 4) files are smaller so it takes less time to upload data and storage is less expensive; 5) recordings contain less identifying information; and 6) both clients and providers may be more willing to have sensitive interactions recorded with audio only. For these reasons, the use of audio recording may facilitate the monitoring of fidelity and increase the acceptability of both the intervention and implementation models, which may serve to broaden the scope of the families reached and improve the quality of the services provided. The current study compared the reliability and validity of fidelity ratings across audio and video rating modalities using 77 feedback sessions drawn from a larger randomized controlled trial of the Family Check-Up (FCU). Coders rated fidelity and caregiver in-session engagement at the age 2 feedback session. The composite fidelity and caregiver engagement scores were tested using path analysis to examine whether they predicted parenting behavior at age 3. Twenty percent of the sessions were double coded to assess interrater reliability. The interrater reliability and predictive validity of fidelity scores and caregiver engagement did not significantly differ across rating modality. However, caution must be used in interpreting these results because the interrater reliabilities in both conditions were low. Possible explanations for the low reliability, limitations of the current study, and directions for future research are discussed.
APA, Harvard, Vancouver, ISO, and other styles

Book chapters on the topic "Video based modality"

1

Chang, Ting-Hsun, and Shaogang Gong. "Bayesian Modality Fusion for Tracking Multiple People with a Multi-Camera System." In Video-Based Surveillance Systems. Springer US, 2002. http://dx.doi.org/10.1007/978-1-4615-0913-4_6.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Farooq, Sehar Shahzad, Abdullah Aziz, Hammad Mukhtar, et al. "Multi-modality Based Affective Video Summarization for Game Players." In Communications in Computer and Information Science. Springer International Publishing, 2021. http://dx.doi.org/10.1007/978-3-030-81638-4_5.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Nguyen, Vuong D., Pranav Mantini, and Shishir K. Shah. "Cross-Modality Complementary Learning for Video-Based Cloth-Changing Person Re-identification." In Lecture Notes in Computer Science. Springer Nature Singapore, 2024. https://doi.org/10.1007/978-981-96-0885-0_1.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Zhu, Guangyu, Qingming Huang, and Yihong Gong. "Highlight Ranking for Broadcast Tennis Video Based on Multi-modality Analysis and Relevance Feedback." In Advances in Multimedia Information Processing - PCM 2008. Springer Berlin Heidelberg, 2008. http://dx.doi.org/10.1007/978-3-540-89796-5_69.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Lerose, Luigi. "Identifying the Phonological Errors of Second-Modality, Second-Language (M2-L2) Novice Signers Through Video-Based Mock Tests." In Educational Linguistics. Springer International Publishing, 2023. http://dx.doi.org/10.1007/978-3-031-33541-9_10.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

Megala, G., P. Swarnalatha, S. Prabu, R. Venkatesan, and Anantharajah Kaneswaran. "Content-Based Video Retrieval With Temporal Localization Using a Deep Bimodal Fusion Approach." In Advances in Computational Intelligence and Robotics. IGI Global, 2023. http://dx.doi.org/10.4018/978-1-6684-8098-4.ch002.

Full text
Abstract:
Content-based video retrieval is a research field that aims to develop advanced techniques for automatically analyzing and retrieving video content. This process involves identifying and localizing specific moments in a video and retrieving videos with similar content. Deep bimodal fusion (DBF) is proposed that uses modified convolution neural networks (CNNs) to achieve considerable visual modality. This deep bimodal fusion approach relies on the integration of information from both visual and audio modalities. By combining information from both modalities, a more accurate model is developed for analyzing and retrieving video content. The main objective of this research is to improve the efficiency and effectiveness of video retrieval systems. By accurately identifying and localizing specific moments in videos, the proposed method has higher precision, recall, F1-score, and accuracy in precise searching that retrieves relevant videos more quickly and effectively.
APA, Harvard, Vancouver, ISO, and other styles
7

Xiao Zheng, Zhang Huimin, Wang Le, and Du Jiayi. "Multimodal Sentiment Analysis Based on Feature Selection and Recurrent Neural Network." In Frontiers in Artificial Intelligence and Applications. IOS Press, 2018. https://doi.org/10.3233/978-1-61499-927-0-325.

Full text
Abstract:
Sentiment analysis is significant for social media. Although many achievements have been made, most focus on either the text modality alone or the audio modality alone. In this paper, we propose an architecture for multimodal sentiment analysis based on an RNN and feature selection. It fully uses a joint representation of textual, audio, and video features to perform multimodal sentiment analysis. By designing a feature selection component, we can select the informative features from the redundant and heterogeneous unimodal features to improve the performance of the sentiment analysis model. At the same time, the additional RNN architecture can capture the dependency and information flow among the utterances of a video in a single modality and perform modality fusion at every timestep at the feature level. The proposed method achieves better performance in sentiment prediction and shows improvement over the baseline.
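As a rough illustration of the general pattern this abstract describes (per-utterance modality features concatenated, then an RNN over the utterance sequence), the following is a minimal PyTorch sketch. It is not the paper's architecture: the feature-selection component is reduced to a learned linear projection, and all feature dimensions are assumptions.

```python
# Minimal sketch of utterance-level multimodal fusion with an RNN (illustrative
# only; not the paper's exact model). Feature dimensions are assumed.
import torch
import torch.nn as nn

class UtteranceRNNFusion(nn.Module):
    def __init__(self, text_dim=300, audio_dim=74, video_dim=35,
                 hidden=128, num_classes=3):
        super().__init__()
        fused = text_dim + audio_dim + video_dim
        self.select = nn.Sequential(nn.Linear(fused, hidden), nn.ReLU())
        self.rnn = nn.GRU(hidden, hidden, batch_first=True, bidirectional=True)
        self.classify = nn.Linear(2 * hidden, num_classes)

    def forward(self, text, audio, video):
        # Each input: (batch, num_utterances, modality_dim).
        x = torch.cat([text, audio, video], dim=-1)  # feature-level fusion
        x = self.select(x)                           # crude stand-in for feature selection
        context, _ = self.rnn(x)                     # inter-utterance dependencies
        return self.classify(context)                # per-utterance sentiment logits

if __name__ == "__main__":
    model = UtteranceRNNFusion()
    t, a, v = torch.randn(4, 10, 300), torch.randn(4, 10, 74), torch.randn(4, 10, 35)
    print(model(t, a, v).shape)  # torch.Size([4, 10, 3])
```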
APA, Harvard, Vancouver, ISO, and other styles
8

Verma, Gyanendra K. "Spontaneous Emotion Recognition From Audio-Visual Signals." In Multimodal Affective Computing: Affective Information Representation, Modelling, and Analysis. BENTHAM SCIENCE PUBLISHERS, 2023. http://dx.doi.org/10.2174/9789815124453123010011.

Full text
Abstract:
This chapter introduces an emotion recognition system based on audio and video cues. For audio-based emotion recognition, we have explored various aspects of feature extraction and classification strategy and found that wavelet analysis is sound. We have shown comparative results for the discriminating capabilities of various combinations of features using Fisher Discriminant Analysis (FDA). Finally, we have combined the audio and video features using a feature-level fusion approach. All the experiments are performed with the eNTERFACE and RML databases. Though we have applied multiple classifiers, SVM shows significantly improved performance with a single modality and with fusion. The results obtained using fusion outperformed those based on a single modality of audio or video. We can conclude that fusion approaches are best, as they use complementary information from multiple modalities.
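For orientation, here is a generic scikit-learn sketch of feature-level (early) fusion followed by an SVM, in the spirit described above. The features are random placeholders rather than the chapter's wavelet or FDA-selected features, and the dimensions are assumptions.

```python
# Generic sketch of feature-level fusion of audio and video features with an SVM.
# Placeholder random features stand in for real extracted descriptors.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_samples, n_audio, n_video, n_classes = 300, 40, 60, 6

audio_feats = rng.normal(size=(n_samples, n_audio))   # e.g. wavelet/prosodic features
video_feats = rng.normal(size=(n_samples, n_video))   # e.g. facial appearance features
labels = rng.integers(0, n_classes, size=n_samples)

# Feature-level fusion: concatenate the two modality vectors for each sample.
fused = np.hstack([audio_feats, video_feats])

X_train, X_test, y_train, y_test = train_test_split(
    fused, labels, test_size=0.25, random_state=0)

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
clf.fit(X_train, y_train)
print("accuracy on placeholder data:", clf.score(X_test, y_test))
```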
APA, Harvard, Vancouver, ISO, and other styles
9

Liu, Yang, Huanqin Ping, Dong Zhang, Qingying Sun, Shoushan Li, and Guodong Zhou. "Comment-Aware Multi-Modal Heterogeneous Pre-Training for Humor Detection in Short-Form Videos." In Frontiers in Artificial Intelligence and Applications. IOS Press, 2023. http://dx.doi.org/10.3233/faia230438.

Full text
Abstract:
Conventional humor analysis normally focuses on text, text-image pair, and even long video (e.g., monologue) scenarios. However, with the recent rise of short-form video sharing, humor detection in this scenario has not yet gained much exploration. To the best of our knowledge, there are two primary issues associated with short-form video humor detection (SVHD): 1) At present, there are no ready-made humor annotation samples in this scenario, and it takes a lot of manpower and material resources to obtain a large number of annotation samples; 2) Unlike the more typical audio and visual modalities, the titles (as opposed to simultaneous transcription in the lengthy film) and associated interactive comments in short-form videos may convey apparent humorous clues. Therefore, in this paper, we first collect and annotate a video dataset from DouYin (aka. TikTok in the world), namely DY24h, with hierarchical comments. Then, we also design a novel approach with comment-aided multi-modal heterogeneous pre-training (CMHP) to introduce comment modality in SVHD. Extensive experiments and analysis demonstrate that our CMHP beats several existing video-based approaches on DY24h, and that the comments modality further aids a better comprehension of humor. Our dataset, code and pre-trained models are available at https://github.com/yliu-cs/CMHP.
APA, Harvard, Vancouver, ISO, and other styles
10

Nair, S. Anu H., and P. Aruna. "Fingerprint Iris Palmprint Multimodal Biometric Watermarking System Using Genetic Algorithm-Based Bacterial Foraging Optimization Algorithm." In Emerging Technologies in Intelligent Applications for Image and Video Processing. IGI Global, 2016. http://dx.doi.org/10.4018/978-1-4666-9685-3.ch014.

Full text
Abstract:
With the widespread utilization of biometric identification systems, establishing the authenticity of biometric data itself has emerged as an important issue. In this chapter, a novel approach for creating a multimodal biometric system has been suggested. The multimodal biometric system is implemented using different fusion schemes such as Average Fusion, Minimum Fusion, Maximum Fusion, Principal Component Analysis Fusion, Discrete Wavelet Transform Fusion, Stationary Wavelet Transform Fusion, Intensity Hue Saturation Fusion, Laplacian Gradient Fusion, Pyramid Gradient Fusion and Sparse Representation Fusion. At the modality extraction level, the information extracted from different modalities is stored in vectors on the basis of their modality. These are then blended to produce a joint template which is the basis for the watermarking system. The fused image is applied as input along with the cover image to the Genetic Algorithm based Bacterial Foraging Optimization Algorithm watermarking system. The standard images are used as cover images and performance was compared.
APA, Harvard, Vancouver, ISO, and other styles

Conference papers on the topic "Video based modality"

1

Zhang, Wendong, Pu Sun, Peng Lan, and Zhifang Liao. "MVC: Multi-stage video caption generation model based on multi-modality." In 2025 4th International Symposium on Computer Applications and Information Technology (ISCAIT). IEEE, 2025. https://doi.org/10.1109/iscait64916.2025.11010499.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Li, Renkai, Xin Yuan, Wei Liu, and Xin Xu. "Event-based Video Person Re-identification via Cross-Modality and Temporal Collaboration." In ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2025. https://doi.org/10.1109/icassp49660.2025.10889628.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Gutev, Alexander, and Carl James Debono. "A Fused Modality Human Action Recognition System Based on Motion Saliency in RGBD Videos." In 2025 3rd Cognitive Models and Artificial Intelligence Conference (AICCONF). IEEE, 2025. https://doi.org/10.1109/aicconf64766.2025.11064223.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Shen, Xiaobo, Qianxin Huang, Long Lan, and Yuhui Zheng. "Contrastive Transformer Cross-Modal Hashing for Video-Text Retrieval." In Thirty-Third International Joint Conference on Artificial Intelligence {IJCAI-24}. International Joint Conferences on Artificial Intelligence Organization, 2024. http://dx.doi.org/10.24963/ijcai.2024/136.

Full text
Abstract:
As video-based social networks continue to grow exponentially, there is a rising interest in video retrieval using natural language. Cross-modal hashing, which learns compact hash code for encoding multi-modal data, has proven to be widely effective in large-scale cross-modal retrieval, e.g., image-text retrieval, primarily due to its computation and storage efficiency. However, when applied to video-text retrieval, existing cross-modal hashing methods generally extract features at the frame- or word-level for videos and texts individually, thereby ignoring their long-term dependencies. To address this issue, we propose Contrastive Transformer Cross-Modal Hashing (CTCH), a novel approach designed for video-text retrieval task. CTCH employs bidirectional transformer encoder to encode video and text and leverages their long-term dependencies. CTCH further introduces supervised multi-modality contrastive loss that effectively exploits inter-modality and intra-modality similarities among videos and texts. The experimental results on three video benchmark datasets demonstrate that CTCH outperforms the state-of-the-arts in video-text retrieval tasks.
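To make the hashing idea in this abstract concrete, the following is a tiny PyTorch sketch of the retrieval step shared by most cross-modal hashing methods: binarize learned embeddings with sign() and rank database items by Hamming distance to the query code. CTCH's transformer encoders and contrastive training are not reproduced here; the embeddings and code length are placeholder assumptions.

```python
# Sketch of cross-modal hashing retrieval: sign() binarization + Hamming ranking.
# Embeddings are random placeholders standing in for learned encoder outputs.
import torch

def to_hash_codes(embeddings: torch.Tensor) -> torch.Tensor:
    # Map real-valued embeddings to {-1, +1} codes.
    return torch.sign(embeddings)

def hamming_rank(query_code: torch.Tensor, db_codes: torch.Tensor) -> torch.Tensor:
    # For {-1, +1} codes of length L, Hamming distance = (L - dot product) / 2.
    dists = (db_codes.shape[1] - db_codes @ query_code) / 2
    return torch.argsort(dists)  # database indices, nearest first

if __name__ == "__main__":
    torch.manual_seed(0)
    code_len = 64
    text_query = to_hash_codes(torch.randn(code_len))       # query text embedding
    video_db = to_hash_codes(torch.randn(1000, code_len))   # database video embeddings
    print(hamming_rank(text_query, video_db)[:5])           # top-5 retrieved videos
```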
APA, Harvard, Vancouver, ISO, and other styles
5

Lu, Guo, Tianxiong Zhong, Jing Geng, Qiang Hu, and Dong Xu. "Learning based Multi-modality Image and Video Compression." In 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2022. http://dx.doi.org/10.1109/cvpr52688.2022.00599.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

Xiao, Jinchuan, Yinhang Tang, Jianzhu Guo, et al. "3DMA: A Multi-modality 3D Mask Face Anti-spoofing Database." In 2019 16th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS). IEEE, 2019. http://dx.doi.org/10.1109/avss.2019.8909845.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Jianxun, Zhao, and Wu Bo. "Video Semantic Concept Detection Based on Multi-modality Fusion." In 2012 International Conference on Computer Science and Electronics Engineering (ICCSEE). IEEE, 2012. http://dx.doi.org/10.1109/iccsee.2012.83.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Bhatt, Parth Lalitkumar, Dhruva Shah, Christopher Silver, Wandong Zhang, and Thangarajah Akilan. "Dual-Modality Deep Feature-based Anomaly Detection for Video Surveillance." In 2023 IEEE Canadian Conference on Electrical and Computer Engineering (CCECE). IEEE, 2023. http://dx.doi.org/10.1109/ccece58730.2023.10288767.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Saha, Priyabrata, Burhan A. Mudassar, and Saibal Mukhopadhyay. "Adaptive Control of Camera Modality with Deep Neural Network-Based Feedback for Efficient Object Tracking." In 2018 15th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS). IEEE, 2018. http://dx.doi.org/10.1109/avss.2018.8639423.

Full text
APA, Harvard, Vancouver, ISO, and other styles
10

Weber, Raphaël, Vincent Barrielle, Catherine Soladié, and Renaud Séguier. "High-Level Geometry-based Features of Video Modality for Emotion Prediction." In MM '16: ACM Multimedia Conference. ACM, 2016. http://dx.doi.org/10.1145/2988257.2988262.

Full text
APA, Harvard, Vancouver, ISO, and other styles