
Journal articles on the topic 'Video text'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the top 50 journal articles for your research on the topic 'Video text.'

Next to every source in the list of references there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse journal articles from a wide variety of disciplines and organise your bibliography correctly.

1

Huang, Bin, Xin Wang, Hong Chen, Houlun Chen, Yaofei Wu, and Wenwu Zhu. "Identity-Text Video Corpus Grounding." Proceedings of the AAAI Conference on Artificial Intelligence 39, no. 4 (2025): 3608–16. https://doi.org/10.1609/aaai.v39i4.32375.

Abstract:
Video corpus grounding (VCG), which aims to retrieve relevant video moments from a video corpus, has attracted significant attention in the multimedia research community. However, the existing VCG setting primarily focuses on matching textual descriptions with videos and ignores the distinct visual identities in the videos, thus resulting in inaccurate understanding of video content and deteriorated retrieval performances. To address this limitation, we introduce a novel task, Identity-Text Video Corpus Grounding (ITVCG), which simultaneously utilize textual descriptions and visual identities
2

Avinash, N. Bhute, and Meshram B.B. "Text Based Approach For Indexing And Retrieval Of Image And Video: A Review." Advances in Vision Computing: An International Journal (AVC) 1, no. 1 (2014): 27–38. https://doi.org/10.5281/zenodo.3554868.

Abstract:
Text data present in multimedia contain useful information for automatic annotation and indexing. The extracted information is used for recognition of overlay or scene text from a given video or image, and the extracted text can be used for retrieving videos and images. In this paper, we first discuss the different techniques for text extraction from images and videos, and then review techniques for indexing and retrieval of images and videos using the extracted text.
3

Avinash, N. Bhute, and Meshram B.B. "Text Based Approach For Indexing And Retrieval Of Image And Video: A Review." Advances in Vision Computing: An International Journal (AVC) 1, no. 1 (2014): 27–38. https://doi.org/10.5281/zenodo.3357696.

Abstract:
Text data present in multimedia contain useful information for automatic annotation and indexing. The extracted information is used for recognition of overlay or scene text from a given video or image, and the extracted text can be used for retrieving videos and images. In this paper, we first discuss the different techniques for text extraction from images and videos, and then review techniques for indexing and retrieval of images and videos using the extracted text.
4

V, Divya, Prithica G, and Savija J. "Text Summarization for Education in Vernacular Languages." International Journal for Research in Applied Science and Engineering Technology 11, no. 7 (2023): 175–78. http://dx.doi.org/10.22214/ijraset.2023.54589.

Abstract:
This project proposes a video summarizing system based on natural language processing (NLP) and Machine Learning to summarize the YouTube video transcripts without losing the key elements. The quantity of videos available on web platforms is steadily expanding. The content is made available globally, primarily for educational purposes. Additionally, educational content is available on YouTube, Facebook, Google, and Instagram. A significant issue of extracting information from videos is that unlike an image, where data can be collected from a single frame, a viewer must watch the enti
5

Namrata, Dave, and S. Holia Mehfuza. "News Story Retrieval Based on Textual Query." International Journal of Engineering and Advanced Technology (IJEAT) 9, no. 3 (2021): 2918–22. https://doi.org/10.5281/zenodo.5589205.

Abstract:
This paper presents news video retrieval using text query for Gujarati language news videos. Due to the fact that Broadcasted Video in India is lacking in metadata information such as closed captioning, transcriptions etc., retrieval of videos based on text data is trivial task for most of the Indian language video. To retrieve specific story based on text query in regional language is the key idea behind our approach. Broadcast video is segmented to get shots representing small news stories. To represent each shot efficiently, key frame extraction using singular value decomposition and rank o
6

Doran, Michael, Adrian Barnett, Joan Leach, William Lott, Katie Page, and Will Grant. "Can video improve grant review quality and lead to more reliable ranking?" Research Ideas and Outcomes 3 (February 1, 2017): e11931. https://doi.org/10.3897/rio.3.e11931.

Abstract:
Multimedia video is rapidly becoming mainstream, and many studies indicate that it is a more effective communication medium than text. In this project we AIM to test if videos can be used, in place of text-based grant proposals, to improve communication and increase the reliability of grant ranking. We will test if video improves reviewer comprehension (AIM 1), if external reviewer grant scores are more consistent with video (AIM 2), and if mock Australian Research Council (ARC) panels award more consistent scores when grants are presented as videos (AIM 3). This will be the first study to eva
7

Jiang, Ai Wen, and Gao Rong Zeng. "Multi-information Integrated Method for Text Extraction from Videos." Advanced Materials Research 225-226 (April 2011): 827–30. http://dx.doi.org/10.4028/www.scientific.net/amr.225-226.827.

Abstract:
Video text provides important semantic information in video content analysis. However, video text with complex background has a poor recognition performance for OCR. Most of the previous approaches to extracting overlay text from videos are based on traditional binarization and give little attention on multi-information integration, especially fusing the background information. This paper presents an effective method to precisely extract characters from videos to enable it for OCR with a good recognition performance. The proposed method combines multi-information together including background
8

Ma, Fan, Xiaojie Jin, Heng Wang, Jingjia Huang, Linchao Zhu, and Yi Yang. "Stitching Segments and Sentences towards Generalization in Video-Text Pre-training." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 5 (2024): 4080–88. http://dx.doi.org/10.1609/aaai.v38i5.28202.

Abstract:
Video-language pre-training models have recently achieved remarkable results on various multi-modal downstream tasks. However, most of these models rely on contrastive learning or masking modeling to align global features across modalities, neglecting the local associations between video frames and text tokens. This limits the model’s ability to perform fine-grained matching and generalization, especially for tasks that selecting segments in long videos based on query texts. To address this issue, we propose a novel stitching and matching pre-text task for video-language pre-training that enco
9

Liu, Yang, Shudong Huang, Deng Xiong, and Jiancheng Lv. "Learning Dynamic Similarity by Bidirectional Hierarchical Sliding Semantic Probe for Efficient Text Video Retrieval." Proceedings of the AAAI Conference on Artificial Intelligence 39, no. 6 (2025): 5667–75. https://doi.org/10.1609/aaai.v39i6.32604.

Abstract:
Text-video retrieval is a foundation task in multi-modal research which aims to align texts and videos in the embedding space. The key challenge is to learn the similarity between videos and texts. A conventional approach involves directly aligning video-text pairs using cosine similarity. However, due to the disparity in the information conveyed by videos and texts, i.e., a single video can be described from multiple perspectives, the retrieval accuracy is suboptimal. An alternative approach employs cross-modal interaction to enable videos to dynamically acquire distinct features from various
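For readers new to the area, the "conventional approach" this abstract contrasts itself with can be illustrated by a minimal cosine-similarity retrieval sketch over precomputed embeddings. The embeddings, their dimensionality, and the encoders are hypothetical stand-ins, not the paper's method.

```python
import numpy as np

def cosine_retrieval(text_emb, video_embs):
    """Rank videos for one text query by cosine similarity.

    text_emb: (d,) query embedding; video_embs: (n, d) video embeddings,
    both assumed to come from pre-trained text/video encoders.
    """
    text_emb = text_emb / np.linalg.norm(text_emb)
    video_embs = video_embs / np.linalg.norm(video_embs, axis=1, keepdims=True)
    sims = video_embs @ text_emb           # (n,) cosine similarities
    return np.argsort(-sims), sims         # video indices, best match first

# Toy usage with random vectors standing in for real encoder outputs.
rng = np.random.default_rng(0)
ranking, scores = cosine_retrieval(rng.normal(size=512), rng.normal(size=(100, 512)))
print(ranking[:5], scores[ranking[:5]])
```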
10

Sun, Shangkun, Xiaoyu Liang, Songlin Fan, Wenxu Gao, and Wei Gao. "VE-Bench: Subjective-Aligned Benchmark Suite for Text-Driven Video Editing Quality Assessment." Proceedings of the AAAI Conference on Artificial Intelligence 39, no. 7 (2025): 7105–13. https://doi.org/10.1609/aaai.v39i7.32763.

Abstract:
Text-driven video editing has recently experienced rapid development. Despite this, evaluating edited videos remains a considerable challenge. Current metrics tend to fail to align with human perceptions, and effective quantitative metrics for video editing are still notably absent. To address this, we introduce VE-Bench, a benchmark suite tailored to the assessment of text-driven video editing. This suite includes VE-Bench DB, a video quality assessment (VQA) database for video editing. VE-Bench DB encompasses a diverse set of source videos featuring various motions and subjects, along with m
11

Yariv, Guy, Itai Gat, Sagie Benaim, Lior Wolf, Idan Schwartz, and Yossi Adi. "Diverse and Aligned Audio-to-Video Generation via Text-to-Video Model Adaptation." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 7 (2024): 6639–47. http://dx.doi.org/10.1609/aaai.v38i7.28486.

Abstract:
We consider the task of generating diverse and realistic videos guided by natural audio samples from a wide variety of semantic classes. For this task, the videos are required to be aligned both globally and temporally with the input audio: globally, the input audio is semantically associated with the entire output video, and temporally, each segment of the input audio is associated with a corresponding segment of that video. We utilize an existing text-conditioned video generation model and a pre-trained audio encoder model. The proposed method is based on a lightweight adaptor network, which
12

Rachidi, Youssef. "Text Detection in Video for Video Indexing." International Journal of Computer Trends and Technology 68, no. 4 (2020): 96–99. http://dx.doi.org/10.14445/22312803/ijctt-v68i4p117.

13

Cao, Shuqiang, Bairui Wang, Wei Zhang, and Lin Ma. "Visual Consensus Modeling for Video-Text Retrieval." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 1 (2022): 167–75. http://dx.doi.org/10.1609/aaai.v36i1.19891.

Abstract:
In this paper, we propose a novel method to mine the commonsense knowledge shared between the video and text modalities for video-text retrieval, namely visual consensus modeling. Different from the existing works, which learn the video and text representations and their complicated relationships solely based on the pairwise video-text data, we make the first attempt to model the visual consensus by mining the visual concepts from videos and exploiting their co-occurrence patterns within the video and text modalities with no reliance on any additional concept annotations. Specifically, we buil
14

Liu, Yi, Yue Zhang, Haidong Hu, Xiaodong Liu, Lun Zhang, and Ruijun Liu. "An Extended Text Combination Classification Model for Short Video Based on Albert." Journal of Sensors 2021 (October 16, 2021): 1–7. http://dx.doi.org/10.1155/2021/8013337.

Abstract:
With the rise and rapid development of short video sharing websites, the number of short videos on the Internet has been growing explosively. The organization and classification of short videos have become the basis for the effective use of short videos, which is also a problem faced by major short video platforms. Aiming at the characteristics of complex short video content categories and rich extended text information, this paper uses methods in the text classification field to solve the short video classification problem. Compared with the traditional way of classifying and understanding sh
15

Chiu, Chih-Yi, Po-Chih Lin, Sheng-Yang Li, Tsung-Han Tsai, and Yu-Lung Tsai. "Tagging Webcast Text in Baseball Videos by Video Segmentation and Text Alignment." IEEE Transactions on Circuits and Systems for Video Technology 22, no. 7 (2012): 999–1013. http://dx.doi.org/10.1109/tcsvt.2012.2189478.

16

Bodyanskaya, Alisa, and Kapitalina Sinegubova. "Music Video as a Poetic Interpretation." Virtual Communication and Social Networks 2023, no. 2 (2023): 47–55. http://dx.doi.org/10.21603/2782-4799-2023-2-2-47-55.

Abstract:
This article introduces the phenomenon of videopoetry as a hybrid product of mass media whose popularity is based on intermediality, i.e., the cumulative effect on different perception channels. Videopoetry is a productive form of verbal creativity in the contemporary media culture with its active reception of art. The research featured poems by W. B. Yeats, T. S. Eliot, and W. H. Auden presented as videos and the way they respond to someone else's poetic word. The authors analyzed 15 videos by comparing the original text and the video sequence in line with the method developed by N. V. Barkov
17

Letroiwen, Kornelin, Aunurrahman ., and Indri Astuti. "PENGEMBANGAN VIDEO ANIMASI UNTUK MENINGKATKAN KEMAMPUAN READING COMPREHENSION FACTUAL REPORT TEXT." Jurnal Teknologi Pendidikan (JTP) 16, no. 1 (2023): 16. http://dx.doi.org/10.24114/jtp.v16i1.44842.

Abstract:
This study aims to develop an animated-video design for learning English on factual report text material. The research method is Research and Development with the ASSURE development design model. A total of 42 grade XI students at SMKN 1 Ngabang were involved in this study. The data obtained were analysed qualitatively and quantitatively. The animated-video profile presents an animated video with 2D animated characters, consisting of a cover, developer profile, greeting, basic competencies, learning objectives, definitions, social function, text structure,
18

Ghorpade, Jayshree, Raviraj Palvankar, Ajinkya Patankar, and Snehal Rathi. "Extracting Text from Video." Signal & Image Processing : An International Journal 2, no. 2 (2011): 103–12. http://dx.doi.org/10.5121/sipij.2011.2209.

19

Wadaskar, Ghanshyam, Sanghdip Udrake, Vipin Bopanwar, Shravani Upganlawar, and Prof Minakshi Getkar. "Extract Text from Video." International Journal for Research in Applied Science and Engineering Technology 12, no. 5 (2024): 2881–83. http://dx.doi.org/10.22214/ijraset.2024.62287.

Abstract:
The code imports the YoutubeTranscriptionApi from the youtube_transcription_api library, and the YouTube video ID is defined. The transcription data for the given video ID is fetched using the get_transcription method. The transcription text is extracted from the data and stored in the transcription variable. The transcription is split into lines and then joined back into a single string. Finally, the processed transcript is written into a text file named “Love.text” with UTF-8 encoding. The commented-out code block is an alternative way to write the transcript into a text file using the open function
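The abstract above describes a small transcript-dumping script. A minimal reconstruction using the widely used youtube_transcript_api package (whose actual class and method names differ slightly from the spellings in the abstract) might look as follows; the video ID and output filename are placeholders, not values from the paper.

```python
from youtube_transcript_api import YouTubeTranscriptApi

VIDEO_ID = "dQw4w9WgXcQ"  # placeholder video ID

# Fetch the transcript as a list of {"text", "start", "duration"} segments.
segments = YouTubeTranscriptApi.get_transcript(VIDEO_ID)

# Join the per-segment lines back into a single string, as the abstract describes.
transcript = " ".join(segment["text"] for segment in segments)

with open("transcript.txt", "w", encoding="utf-8") as f:
    f.write(transcript)
```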
20

Vishwashanthi, M. "Text-To-Video Generator." International Scientific Journal of Engineering and Management 04, no. 05 (2025): 1–9. https://doi.org/10.55041/isjem03655.

Abstract:
The integration of artificial intelligence in multimedia content creation has paved the way for innovative applications like text-to-video generation. This research presents an advanced Text-to-Video Generator capable of converting textual inputs into coherent video narratives. The system is further enhanced with multilingual support for Indian languages and the inclusion of subtitles, broadening its accessibility and user engagement. By leveraging natural language processing and machine learning techniques, the application ensures accurate interpretation and representation of divers
21

Luo, Dezhao, Shaogang Gong, Jiabo Huang, Hailin Jin, and Yang Liu. "Generative Video Diffusion for Unseen Novel Semantic Video Moment Retrieval." Proceedings of the AAAI Conference on Artificial Intelligence 39, no. 6 (2025): 5847–55. https://doi.org/10.1609/aaai.v39i6.32624.

Abstract:
Video moment retrieval (VMR) aims to locate the most likely video moment(s) corresponding to a text query in untrimmed videos. Training of existing methods is limited by the lack of diverse and generalisable VMR datasets, hindering their ability to generalise moment-text associations to queries containing novel semantic concepts (unseen both visually and textually in a training source domain). For model generalisation to novel semantics, existing methods rely heavily on assuming to have access to both video and text sentence pairs from a target domain in addition to the source domain pair-wise
22

Godha, Ashima, and Puja Trivedi. "CNN Filter based Text Region Segmentation from Lecture Video and Extraction using NeuroOCR." SMART MOVES JOURNAL IJOSCIENCE 5, no. 7 (2019): 7. http://dx.doi.org/10.24113/ijoscience.v5i7.218.

Abstract:
Lecture videos are rich with textual information, and being able to understand this text is quite useful for larger video understanding/analysis applications. Though text recognition from images has been an active research area in computer vision, text in lecture videos has mostly been overlooked. This paper focuses on text extraction from different types of lecture videos, such as slide, whiteboard, and paper lecture videos: the text regions are segmented in video frames and extracted using recurrent neural netw
23

Tahwiana, Zein, Regina Regina, Eka Fajar Rahmani, Yohanes Gatot Sutapa Yuliana, and Wardah Wardah. "The ENHANCING NARRATIVE WRITING SKILLS THROUGH ANIMATION VIDEOS IN THE EFL CLASSROOM." Getsempena English Education Journal 12, no. 1 (2025): 1–13. https://doi.org/10.46244/geej.v12i1.2902.

Abstract:
This study examined the use of animation videos to teach narrative text writing to SMP Negeri 21 Pontianak eighth-grade students. The study used the 8B class of SMP Negeri 21 Pontianak as the research sample, consisting of 35 students taken from cluster random sampling from a population of 209 students. This pre-experimental study also used a group pre-test and post-test design, consisting of three procedures: pre-test, treatment, and post-test. This study was conducted in two treatments for 120 minutes per meeting by using animation videos to teach narrative text. Two methods were used in the
24

Nazmun, Nessa Moon, Salehin Imrus, Parvin Masuma, et al. "Natural language processing based advanced method of unnecessary video detection." International Journal of Electrical and Computer Engineering (IJECE) 11, no. 6 (2021): 5411–19. https://doi.org/10.11591/ijece.v11i6.pp5411-5419.

Abstract:
In this study we have described the process of identifying unnecessary video using an advanced combined method of natural language processing and machine learning. The system also includes a framework that contains analytics databases and which helps to find statistical accuracy and can detect, accept or reject unnecessary and unethical video content. In our video detection system, we extract text data from video content in two steps, first from video to MPEG-1 audio layer 3 (MP3) and then from MP3 to WAV format. We have used the text part of natural language processing to analyze and prepare
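The two-step audio extraction described above (video to MP3, then MP3 to WAV) followed by text extraction can be sketched as below. The file names, the use of the ffmpeg command-line tool, and the SpeechRecognition library with Google's free recognizer are illustrative assumptions, not the authors' exact pipeline.

```python
import subprocess
import speech_recognition as sr

# Step 1: video -> MP3, Step 2: MP3 -> WAV (requires ffmpeg on the PATH).
subprocess.run(["ffmpeg", "-y", "-i", "input.mp4", "-vn", "audio.mp3"], check=True)
subprocess.run(["ffmpeg", "-y", "-i", "audio.mp3", "audio.wav"], check=True)

# Step 3: WAV -> text with an off-the-shelf recognizer.
recognizer = sr.Recognizer()
with sr.AudioFile("audio.wav") as source:
    audio = recognizer.record(source)
text = recognizer.recognize_google(audio)  # any ASR backend could be swapped in
print(text)
```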
25

Alabsi, Thuraya. "Effects of Adding Subtitles to Video via Apps on Developing EFL Students’ Listening Comprehension." Theory and Practice in Language Studies 10, no. 10 (2020): 1191. http://dx.doi.org/10.17507/tpls.1010.02.

Abstract:
It is unclear if using videos and education apps in learning adds additional value to students’ listening comprehension. This study assesses the impact of adding text to videos on English as a Foreign Language (EFL) learners’ listening comprehension. The participants were 76 prep college EFL students from Taibah University, divided into two groups. The semi-experimental measure was employed to compare the experimental group and the control group. The experimental group watched an English learning video and then wrote text subtitles relating to the video using apps, and later took a listening t
26

Wu, Peng, Wanshun Su, Xiangteng He, Peng Wang, and Yanning Zhang. "VarCMP: Adapting Cross-Modal Pre-Training Models for Video Anomaly Retrieval." Proceedings of the AAAI Conference on Artificial Intelligence 39, no. 8 (2025): 8423–31. https://doi.org/10.1609/aaai.v39i8.32909.

Abstract:
Video anomaly retrieval (VAR) aims to retrieve pertinent abnormal or normal videos from collections of untrimmed and long videos through cross-modal queries such as textual descriptions and synchronized audios. Cross-modal pre-training (CMP) models, by pre-training on large-scale cross-modal pairs, e.g., image and text, can learn the rich associations between different modalities, and this cross-modal association capability gives CMP an advantage in conventional retrieval tasks. Inspired by this, how to utilize the robust cross-modal association capabilities of CMP in VAR to search crucial vi
27

Bi, Xiuli, Jian Lu, Bo Liu, et al. "CustomTTT: Motion and Appearance Customized Video Generation via Test-Time Training." Proceedings of the AAAI Conference on Artificial Intelligence 39, no. 2 (2025): 1871–79. https://doi.org/10.1609/aaai.v39i2.32182.

Abstract:
Benefiting from large-scale pre-training of text-video pairs, current text-to-video (T2V) diffusion models can generate high-quality videos from the text description. Besides, given some reference images or videos, the parameter-efficient fine-tuning method, i.e. LoRA, can generate high-quality customized concepts, e.g., the specific subject or the motions from a reference video. However, combining the trained multiple concepts from different references into a single network shows obvious artifacts. To this end, we propose CustomTTT, where we can joint custom the appearance and the motion of t
28

Gawade, Shruti. "A Deep Learning Approach to Text-to-Video Generation." International Journal for Research in Applied Science and Engineering Technology 12, no. 6 (2024): 2489–93. http://dx.doi.org/10.22214/ijraset.2024.63513.

Abstract:
In the ever-evolving landscape of multimedia content creation, there is a growing demand for automated tools that can seamlessly transform textual descriptions into engaging and realistic videos. This research paper introduces a state-of-the-art Text to Video Generation Model, a groundbreaking approach designed to bridge the gap between textual input and visually compelling video output. Leveraging advanced deep learning techniques, the proposed model not only captures the semantic nuances of the input text but also generates dynamic and contextually relevant video sequences. The mod
29

张, 宇. "Video Retrieval Model Based on Video Text Alignment." Journal of Image and Signal Processing 14, no. 03 (2025): 349–61. https://doi.org/10.12677/jisp.2025.143032.

30

P, Ilampiray, Naveen Raju D, Thilagavathy A, et al. "Video Transcript Summarizer." E3S Web of Conferences 399 (2023): 04015. http://dx.doi.org/10.1051/e3sconf/202339904015.

Abstract:
In today’s world, a large number of videos are uploaded every day, each containing information about something. The major challenge is to find the right video and understand the correct content, because among the many videos available some contain useless content, and even when the perfect content is available it still has to be found. If we do not find the right one, it wastes our effort and time to extract the correct, useful information. We propose an innovative idea which uses NLP processing for text extraction and BERT Summarization for Text Summarization. Th
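As a rough illustration of the transcript-plus-summarization pipeline the abstract outlines, the sketch below feeds a fetched YouTube transcript to a Hugging Face summarization pipeline. The model checkpoint (a BART model, used here only because it is readily available), the chunk size, and the video ID are assumptions; the paper's own BERT-based summarizer may differ.

```python
from youtube_transcript_api import YouTubeTranscriptApi
from transformers import pipeline

def summarize_video(video_id: str) -> str:
    # Pull the transcript and flatten it into one string.
    segments = YouTubeTranscriptApi.get_transcript(video_id)
    transcript = " ".join(s["text"] for s in segments)

    # Summarize in chunks so each request stays under the model's input limit.
    summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
    chunks = [transcript[i:i + 3000] for i in range(0, len(transcript), 3000)]
    parts = [summarizer(c, max_length=120, min_length=30)[0]["summary_text"]
             for c in chunks]
    return " ".join(parts)

print(summarize_video("dQw4w9WgXcQ"))  # placeholder video ID
```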
31

Choudhary, Waffa. "Text Extraction from Videos Using the Combination of Edge-Based and Stroke Filter Techniques." Advanced Materials Research 403-408 (November 2011): 1068–74. http://dx.doi.org/10.4028/www.scientific.net/amr.403-408.1068.

Abstract:
A novel method by combining the edge-based and stroke filter based text extraction in the videos is presented. Several researchers have used edge-based and filter based text extraction in the video frames. However, these individual techniques are having their own advantages and disadvantages to extract text in the video frames. Combination of these two techniques fetches good result as compared to individual techniques. In this paper, the canny edge-based and stroke filter for text extraction in the video frames are amalgamated. The effectiveness of the proposed method is evaluated over the in
32

Ilaslan, Muhammet Furkan, Ali Köksal, Kevin Qinghong Lin, Burak Satar, Mike Zheng Shou, and Qianli Xu. "VG-TVP: Multimodal Procedural Planning via Visually Grounded Text-Video Prompting." Proceedings of the AAAI Conference on Artificial Intelligence 39, no. 4 (2025): 3886–94. https://doi.org/10.1609/aaai.v39i4.32406.

Abstract:
Large Language Model (LLM)-based agents have shown promise in procedural tasks, but the potential of multimodal instructions augmented by texts and videos to assist users remains under-explored. To address this gap, we propose the Visually Grounded Text-Video Prompting (VG-TVP) method which is a novel LLM-empowered Multimodal Procedural Planning (MPP) framework. It generates cohesive text and video procedural plans given a specified high-level objective. The main challenges are achieving textual and visual informativeness, temporal coherence, and accuracy in procedural plans. VG-TVP leverages
33

Wu, Yihong, Mingli Lin, and Wenlong Yao. "The Influence of Titles on YouTube Trending Videos." Communications in Humanities Research 29, no. 1 (2024): 285–94. http://dx.doi.org/10.54254/2753-7064/29/20230835.

Abstract:
The global video platform market has been growing in a remarkable way in recent years. As a part of a video, title can compel people to view. However, few scholars have studied the relationship between video trendiness and title at present. This work studies the influence of sentiment polarity of videos using Valence Aware Dictionary Sentiment Reasoner (VADER) and investigated the feasibility of the application of video titles text on YouTube trending videos research using Doc2Vec. It is found that the text in YouTube trend video titles possesses predictive value for video trendiness, but it r
34

Rushikesh, Chandrakant Konapure, and L.M.R.J. Lobo Dr. "Text Data Analysis for Advertisement Recommendation System Using Multi-label Classification of Machine Learning." Journal of Data Mining and Management 5, no. 1 (2020): 1–6. https://doi.org/10.5281/zenodo.3600112.

Abstract:
Everyone today can access the streaming content on their mobile phones, laptops very easily and video has been a very important and popular content on the internet. Nowadays, people are making their content and uploading it on the streaming platforms so the size of the video dataset became massive compared to text, audio and image datasets. So, providing advertisements on the video related to the topic of video will help to boost business. In this proposed system the title and description of video will be taken as input to classify the video using a natural language processing text classif
35

Frobenius, Maximiliane. "Pointing gestures in video blogs." Text & Talk 33, no. 1 (2013): 1–23. http://dx.doi.org/10.1515/text-2013-0001.

Abstract:
AbstractVideo blogs are a form of CMC (computer-mediated communication) that feature speakers who talk into a camera, and thereby produce a viewer-directed performance. Pointing gestures are part of the resources that the medium affords to design vlogs for the absent recipients. Based on a corpus of 40 vlogs, this research categorizes different kinds of common pointing actions in vlogs. Close analysis reveals the role multimodal factors such as gaze and body posture play along with deictic gestures and verbal reference in the production of a viewer-directed monologue. Those instances where vlo
36

Puspita, Widya, Teti Sobari, and Wikanengsih Wikanengsih. "Improving Students Writing Skills Explanation Text using Animated Video." JLER (Journal of Language Education Research) 6, no. 1 (2023): 35–60. http://dx.doi.org/10.22460/jler.v6i1.10198.

Abstract:
This study focuses on the influence of an animated video on students' ability to write explanation text. This research uses descriptive qualitative research method. The purpose of this study is to find out whether the animated video used can help students improve their explanation text writing skills and see the differences in students' abilities before and after using animated videos in Indonesian language learning. The subjects in this study came from 20 students of class VII A at MTs Pasundan Cimahi, and the objects in this study were obtained from the results of the pre-test and post-test
37

S, Ramacharan, Akshara Reddy P., Rukmini Reddy R, and Ch.Chathurya. "Script Abstract from Video Clip." Journal of Advancement in Software Engineering and Testing 5, no. 3 (2022): 1–4. https://doi.org/10.5281/zenodo.7321898.

Abstract:
In a world where technology is developing at a tremendously fast pace, the educational field has witnessed various new technologies that help in better learning, teaching and understanding. Video tutorials are playing a major role in helping students and learners understand new concepts at a much faster rate and at their own comfort level, but watching long tutorial or lecture videos can be time-consuming and tiring; the solution for this can be found in a video-to-text summarization application. With the help of advanced NLP and machine learning we can summarize a video tutorial; this summ
38

Sanjeeva, Polepaka, Vanipenta Balasri Nitin Reddy, Jagirdar Indraj Goud, Aavula Guru Prasad, and Ashish Pathani. "TEXT2AV – Automated Text to Audio and Video Conversion." E3S Web of Conferences 430 (2023): 01027. http://dx.doi.org/10.1051/e3sconf/202343001027.

Abstract:
The paper aims to develop a machine learning-based system that can automatically convert text to audio and text to video as per the user’s request. Reading a large text is difficult for anyone, but this TTS model makes it easy by converting text into audio, producing the audio output through an avatar with lip sync to make the interaction look more attractive and human-like in many languages. The TTS model is built on Waveform Recurrent Neural Networks (WaveRNN), a type of auto-regressive model that predicts future data based on the present. The system identifies the keywords i
39

Creamer, MeLisa, Heather R. Bowles, Belinda von Hofe, Kelley Pettee Gabriel, Harold W. Kohl, and Adrian Bauman. "Utility of Computer-Assisted Approaches for Population Surveillance of Physical Activity." Journal of Physical Activity and Health 11, no. 6 (2014): 1111–19. http://dx.doi.org/10.1123/jpah.2012-0266.

Abstract:
Background:Computer-assisted techniques may be a useful way to enhance physical activity surveillance and increase accuracy of reported behaviors.Purpose:Evaluate the reliability and validity of a physical activity (PA) self-report instrument administered by telephone and internet.Methods:The telephone-administered Active Australia Survey was adapted into 2 forms for internet self-administration: survey questions only (internet-text) and with videos demonstrating intensity (internet-video). Data were collected from 158 adults (20–69 years, 61% female) assigned to telephone (telephone-interview
40

Du, Wanru, Xiaochuan Jing, Quan Zhu, Xiaoyin Wang, and Xuan Liu. "A cross-modal conditional mechanism based on attention for text-video retrieval." Mathematical Biosciences and Engineering 20, no. 11 (2023): 20073–92. http://dx.doi.org/10.3934/mbe.2023889.

Abstract:
Current research in cross-modal retrieval has primarily focused on aligning the global features of videos and sentences. However, video conveys a much more comprehensive range of information than text. Thus, text-video matching should focus on the similarities between frames containing critical information and text semantics. This paper proposes a cross-modal conditional feature aggregation model based on the attention mechanism. It includes two innovative modules: (1) A cross-modal attentional feature aggregation module, which uses the semantic text features as condit
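To make the idea of text-conditioned aggregation of frame features concrete, here is a minimal PyTorch sketch of attention-weighted pooling of frame embeddings with a text embedding as the query. The dimensions and linear projections are invented for illustration and do not reproduce the paper's model.

```python
import torch
import torch.nn as nn

class TextConditionedPooling(nn.Module):
    """Aggregate frame features into one video vector, weighted by a text query."""

    def __init__(self, dim: int = 512):
        super().__init__()
        self.q = nn.Linear(dim, dim)  # projects the text embedding to a query
        self.k = nn.Linear(dim, dim)  # projects frame embeddings to keys

    def forward(self, text_emb, frame_embs):
        # text_emb: (batch, dim); frame_embs: (batch, n_frames, dim)
        q = self.q(text_emb).unsqueeze(1)                     # (batch, 1, dim)
        k = self.k(frame_embs)                                # (batch, n, dim)
        attn = torch.softmax((q * k).sum(-1) / k.shape[-1] ** 0.5, dim=-1)
        return (attn.unsqueeze(-1) * frame_embs).sum(dim=1)   # (batch, dim)

pool = TextConditionedPooling()
video_vec = pool(torch.randn(2, 512), torch.randn(2, 12, 512))
print(video_vec.shape)  # torch.Size([2, 512])
```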
41

Hua, Hang, Yunlong Tang, Chenliang Xu, and Jiebo Luo. "V2Xum-LLM: Cross-Modal Video Summarization with Temporal Prompt Instruction Tuning." Proceedings of the AAAI Conference on Artificial Intelligence 39, no. 4 (2025): 3599–607. https://doi.org/10.1609/aaai.v39i4.32374.

Abstract:
Video summarization aims to create short, accurate, and cohesive summaries of longer videos. Despite the existence of various video summarization datasets, a notable limitation is their limited amount of source videos, which hampers the effective training of advanced large vision-language models (VLMs). Additionally, most existing datasets are created for video-to-video summarization, overlooking the contemporary need for multimodal video content summarization. Recent efforts have been made to expand from unimodal to multimodal video summarization, categorizing the task into three sub-tasks ba
42

Adams, Aubrie, and Weimin Toh. "Student Emotion in Mediated Learning: Comparing a Text, Video, and Video Game." Electronic Journal of e-Learning 19, no. 6 (2021): 575–87. http://dx.doi.org/10.34190/ejel.19.6.2546.

Abstract:
Although serious games are generally praised by scholars for their potential to enhance teaching and e-learning practices, more empirical evidence is needed to support these accolades. Existing research in this area tends to show that gamified teaching experiences do contribute to significant effects to improve student cognitive, motivational, and behavioural learning outcomes, but these effects are usually small. In addition, less research examines how different types of mediated learning tools compare to one another in influencing student outcomes associated with learning and motivation. As
43

Chen, Yizhen, Jie Wang, Lijian Lin, Zhongang Qi, Jin Ma, and Ying Shan. "Tagging before Alignment: Integrating Multi-Modal Tags for Video-Text Retrieval." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 1 (2023): 396–404. http://dx.doi.org/10.1609/aaai.v37i1.25113.

Abstract:
Vision-language alignment learning for video-text retrieval arouses a lot of attention in recent years. Most of the existing methods either transfer the knowledge of image-text pretraining model to video-text retrieval task without fully exploring the multi-modal information of videos, or simply fuse multi-modal features in a brute force manner without explicit guidance. In this paper, we integrate multi-modal information in an explicit manner by tagging, and use the tags as the anchors for better video-text alignment. Various pretrained experts are utilized for extracting the information of m
44

Huang, Hong-Bo, Yao-Lin Zheng, and Zhi-Ying Hu. "Video Abnormal Action Recognition Based on Multimodal Heterogeneous Transfer Learning." Advances in Multimedia 2024 (January 19, 2024): 1–12. http://dx.doi.org/10.1155/2024/4187991.

Abstract:
Human abnormal action recognition is crucial for video understanding and intelligent surveillance. However, the scarcity of labeled data for abnormal human actions often hinders the development of high-performance models. Inspired by the multimodal approach, this paper proposes a novel approach that leverages text descriptions associated with abnormal human action videos. Our method exploits the correlation between the text domain and the video domain in the semantic feature space and introduces a multimodal heterogeneous transfer learning framework from the text domain to the video domain. Th
45

Mochurad, Lesia. "A NEW APPROACH FOR TEXT RECOGNITION ON A VIDEO CARD." Computer systems and information technologies, no. 3 (September 28, 2022): 22–30. http://dx.doi.org/10.31891/csit-2022-3-3.

Abstract:
An important task is to develop a computer system that can automatically read text content from images or videos with a complex background. Due to a large number of calculations, it is quite difficult to apply them in real-time. Therefore, the use of parallel and distributed computing in the development of real-time or near real-time systems is relevant. The latter is especially relevant in such areas as automation of video recording of traffic violations, text recognition, machine vision, fingerprint recognition, speech, and more. The paper proposes a new approach to text recognition on a vid
46

Lokkondra, Chaitra Yuvaraj, Dinesh Ramegowda, Gopalakrishna Madigondanahalli Thimmaiah, Ajay Prakash Bassappa Vijaya, and Manjula Hebbaka Shivananjappa. "ETDR: An Exploratory View of Text Detection and Recognition in Images and Videos." Revue d'Intelligence Artificielle 35, no. 5 (2021): 383–93. http://dx.doi.org/10.18280/ria.350504.

Abstract:
Images and videos with text content are a direct source of information. Today, there is a high need for image and video data that can be intelligently analyzed. A growing number of researchers are focusing on text identification, making it a hot issue in machine vision research. Since this opens the way, several real-time-based applications such as text detection, localization, and tracking have become more prevalent in text analysis systems. To find out more about how text information may be extracted, have a look at our survey. This study presents a trustworthy dataset for text identificatio
47

Chen, Yupeng, Penglin Chen, Xiaoyu Zhang, Yixian Huang, and Qian Xie. "EditBoard: Towards a Comprehensive Evaluation Benchmark for Text-Based Video Editing Models." Proceedings of the AAAI Conference on Artificial Intelligence 39, no. 15 (2025): 15975–83. https://doi.org/10.1609/aaai.v39i15.33754.

Abstract:
The rapid development of diffusion models has significantly advanced AI-generated content (AIGC), particularly in Text-to-Image (T2I) and Text-to-Video (T2V) generation. Text-based video editing, leveraging these generative capabilities, has emerged as a promising field, enabling precise modifications to videos based on text prompts. Despite the proliferation of innovative video editing models, there is a conspicuous lack of comprehensive evaluation benchmarks that holistically assess these models’ performance across various dimensions. Existing evaluations are limited and inconsistent, typica
48

Aljorani, Reem, and Boshra Zopon. "Encapsulation Video Classification and Retrieval Based on Arabic Text." Diyala Journal For Pure Science 17, no. 4 (2021): 20–36. http://dx.doi.org/10.24237/djps.17.04.558b.

Abstract:
Since Arabic video classification is not a popular field and there is not a lot of research in this area, especially in the educational field, a system was proposed to solve this problem and to make educational Arabic videos more available to students. A survey was carried out to study several papers in order to design and implement a system that classifies videos in the Arabic language by extracting their audio features using Azure cognitive services, which produce text transcripts. Several preprocessing operations are then applied to process the text transcript. A stochastic gra
49

Krishnamoorthy, Niveda, Girish Malkarnenkar, Raymond Mooney, Kate Saenko, and Sergio Guadarrama. "Generating Natural-Language Video Descriptions Using Text-Mined Knowledge." Proceedings of the AAAI Conference on Artificial Intelligence 27, no. 1 (2013): 541–47. http://dx.doi.org/10.1609/aaai.v27i1.8679.

Abstract:
We present a holistic data-driven technique that generates natural-language descriptions for videos. We combine the output of state-of-the-art object and activity detectors with "real-world' knowledge to select the most probable subject-verb-object triplet for describing a video. We show that this knowledge, automatically mined from web-scale text corpora, enhances the triplet selection algorithm by providing it contextual information and leads to a four-fold increase in activity identification. Unlike previous methods, our approach can annotate arbitrary videos without requiring the expensive
50

CHEN, DATONG, JEAN-MARC ODOBEZ, and JEAN-PHILIPPE THIRAN. "MONTE CARLO VIDEO TEXT SEGMENTATION." International Journal of Pattern Recognition and Artificial Intelligence 19, no. 05 (2005): 647–61. http://dx.doi.org/10.1142/s0218001405004216.

Abstract:
This paper presents a probabilistic algorithm for segmenting and recognizing text embedded in video sequences based on adaptive thresholding using a Bayes filtering method. The algorithm approximates the posterior distribution of segmentation thresholds of video text by a set of weighted samples. The set of samples is initialized by applying a classical segmentation algorithm on the first video frame and further refined by random sampling under a temporal Bayesian framework. This framework allows us to evaluate a text image segmentor on the basis of recognition result instead of visual segment
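The sampling-based thresholding described above can be illustrated with a toy particle-style loop over candidate binarization thresholds. The recognition-based scoring function below is a self-contained stand-in (it merely rewards thresholds near the frame mean), not the OCR-driven evaluation used in the paper.

```python
import numpy as np

def score_threshold(frame, t):
    """Stand-in for a recognition-based score of a binarization threshold.

    A real system would binarize the text region at threshold t, run OCR,
    and return the recognizer's confidence.
    """
    return float(np.exp(-((t - frame.mean()) ** 2) / (2 * 25.0 ** 2)))

def track_thresholds(frames, n_samples=50, jitter=5.0, seed=0):
    rng = np.random.default_rng(seed)
    samples = rng.uniform(0, 255, n_samples)      # initial threshold particles
    for frame in frames:
        samples = samples + rng.normal(0, jitter, n_samples)       # propagate
        weights = np.array([score_threshold(frame, t) for t in samples])
        weights /= weights.sum()
        samples = rng.choice(samples, size=n_samples, p=weights)   # resample
        yield float(samples.mean())               # current threshold estimate

frames = [np.full((32, 32), v, dtype=float) for v in (90.0, 100.0, 110.0)]
print(list(track_thresholds(frames)))
```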