Journal articles on the topic 'Keyframes'

Consult the top 50 journal articles for your research on the topic 'Keyframes'.

1

Younessian, Ehsan, and Deepu Rajan. "Content-Based Keyframe Clustering Using Near Duplicate Keyframe Identification." International Journal of Multimedia Data Engineering and Management 2, no. 1 (2011): 1–21. http://dx.doi.org/10.4018/jmdem.2011010101.

Abstract:
In this paper, the authors propose an effective content-based clustering method for keyframes of news video stories using Near-Duplicate Keyframe (NDK) identification. The authors first investigate the near-duplicate relationship, a content-based visual similarity across keyframes, through the presented NDK identification algorithm, and assign a near-duplicate score to each pair of keyframes within a story. Using an efficient keypoint matching technique followed by matching-pattern analysis, the NDK identification algorithm can handle extreme zooming and significant object motion. In the second step, a weighted adjacency matrix is determined for each story based on the assigned near-duplicate scores. The authors then use a spectral clustering scheme to remove outlier keyframes and partition the remainder. Two sets of experiments are carried out to evaluate the NDK identification method and to assess the performance of the proposed keyframe clustering method.
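
For orientation, the clustering step this abstract describes can be sketched as below, assuming the near-duplicate scores have already been assembled into a symmetric matrix; the NDK scoring itself (keypoint matching and pattern analysis) is the paper's contribution and is not reproduced here, and the outlier rule and threshold are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import SpectralClustering

def cluster_story_keyframes(ndk_scores: np.ndarray, n_clusters: int = 3,
                            support_thresh: float = 0.1) -> np.ndarray:
    """Cluster keyframes from a weighted adjacency matrix of NDK scores.

    ndk_scores: symmetric (n, n) matrix; entry (i, j) is the near-duplicate
    score of keyframe pair (i, j). Returns labels, with -1 for outliers.
    """
    labels = np.full(len(ndk_scores), -1)
    # Keyframes with almost no near-duplicate support are treated as outliers.
    keep = ndk_scores.sum(axis=1) > support_thresh
    affinity = ndk_scores[np.ix_(keep, keep)]
    sc = SpectralClustering(n_clusters=n_clusters, affinity="precomputed")
    labels[keep] = sc.fit_predict(affinity)
    return labels
```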
2

Jiang, Hongda, Marc Christie, Xi Wang, Libin Liu, Bin Wang, and Baoquan Chen. "Camera keyframing with style and control." ACM Transactions on Graphics 40, no. 6 (2021): 1–13. http://dx.doi.org/10.1145/3478513.3480533.

Abstract:
We present a novel technique that enables 3D artists to synthesize camera motions in virtual environments following a camera style, while enforcing user-designed camera keyframes as constraints along the sequence. To solve this constrained motion in-betweening problem, we design and train a camera motion generator on a collection of temporal cinematic features (camera and actor motions), conditioned on target keyframes. We further condition the generator on a style code to control how the interpolation between keyframes is performed. Style codes are generated by training a second network that encodes different camera behaviors in a compact latent space, the camera style space. Camera behaviors are defined as temporal correlations between actor features and camera motions and can be extracted from real or synthetic film clips. We further extend the system by incorporating fine control of camera speed and direction via a hidden state mapping technique. We evaluate our method on two aspects: (i) the capacity to synthesize style-aware camera trajectories with user-defined keyframes; and (ii) the capacity to ensure that in-between motions still comply with the reference camera style while satisfying the keyframe constraints. As a result, our system is the first style-aware keyframe in-betweening technique for camera control that balances style-driven automation with precise and interactive control of keyframes.
3

Sharma, R. Rajesh. "Two-Stage Frame Extraction in Video Analysis for Accurate Prediction of Object Tracking by Improved Deep Learning." Journal of Innovative Image Processing 3, no. 4 (2021): 322–35. http://dx.doi.org/10.36548/jiip.2021.4.004.

Abstract:
Information extraction from graphics and video summarization using keyframes have recently benefited from visual content-based methods. Keyframes in a video can be analyzed by extracting visual elements from the video clips, and these visible components are utilized to accurately anticipate the path of an object in real time. The fast and reliable approach is based on frame variations in low-level properties such as color and structure. This research work contains three phases: preprocessing, two-stage extraction, and a video prediction module. The framework uses a probabilistic deterministic process to arrive at an estimate of the tracked object. Keyframes for the whole video sequence are extracted using a proposed two-stage feature extraction approach with CNN feature extraction. An alternate sequence is first constructed by comparing the color characteristics of neighboring frames in the original series to those of the generated one. When the alternate arrangement is compared to the final keyframe sequence, substantial structural changes are found between consecutive frames. Three keyframe extraction techniques based on temporal behavior are employed in this study, followed by a keyframe extraction optimization phase using the "Adam" optimizer, dependent on the number of final keyframes. The proposed technique outperforms prior methods in computational cost and resilience across a wide range of video formats, resolutions, and other parameters. Finally, this research compares the SSIM, MAE, and RMSE performance metrics with the traditional approach.
4

Wei, Dong, Xiaoning Sun, Huaijiang Sun, et al. "Enhanced Fine-Grained Motion Diffusion for Text-Driven Human Motion Synthesis." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 6 (2024): 5876–84. http://dx.doi.org/10.1609/aaai.v38i6.28401.

Abstract:
The emergence of text-driven motion synthesis techniques provides animators with great potential to create efficiently. However, in most cases textual expressions only contain general and qualitative motion descriptions, lacking fine depiction and sufficient intensity; the synthesized motions are then either (a) semantically compliant but uncontrollable over specific pose details, or (b) even deviating from the provided descriptions, bringing animators undesired results. In this paper, we propose DiffKFC, a conditional diffusion model for text-driven motion synthesis with KeyFrames Collaborated, enabling realistic generation with collaborative and efficient dual-level control: coarse guidance at the semantic level, with only a few keyframes for direct and fine-grained depiction down to the body posture level. Unlike existing inference-editing diffusion models that incorporate conditions without training, our conditional diffusion model is explicitly trained and can fully exploit correlations among texts, keyframes and the diffused target frames. To preserve the control capability of discrete and sparse keyframes, we customize dilated mask attention modules in which only partially valid tokens participate in local-to-global attention, as indicated by the dilated keyframe mask. Additionally, we develop a simple yet effective smoothness prior, which steers the generated frames towards seamless keyframe transitions at inference. Extensive experiments show that our model not only achieves state-of-the-art performance in terms of semantic fidelity but, more importantly, is able to satisfy animator requirements through fine-grained guidance without tedious labor.
5

Duan, Ran, Yurong Feng, and Chih-Yung Wen. "Deep Pose Graph-Matching-Based Loop Closure Detection for Semantic Visual SLAM." Sustainability 14, no. 19 (2022): 11864. http://dx.doi.org/10.3390/su141911864.

Abstract:
This work addresses the loop closure detection issue by matching the local pose graphs for semantic visual SLAM. We propose a deep feature matching-based keyframe retrieval approach. The proposed method treats the local navigational maps as images. Thus, the keyframes may be considered keypoints of the map image. The descriptors of the keyframes are extracted using a convolutional neural network. As a result, we convert the loop closure detection problem to a feature matching problem so that we can solve the keyframe retrieval and pose graph matching concurrently. This process in our work is carried out by modified deep feature matching (DFM). The experimental results on the KITTI and Oxford RobotCar benchmarks show the feasibility and capabilities of accurate loop closure detection and the potential to extend to multiagent applications.
6

Kim, Nam Hee, Hung Yu Ling, Zhaoming Xie, and Michiel van de Panne. "Flexible Motion Optimization with Modulated Assistive Forces." Proceedings of the ACM on Computer Graphics and Interactive Techniques 4, no. 3 (2021): 1–25. http://dx.doi.org/10.1145/3480144.

Abstract:
Animated motions should be simple to direct while also being plausible. We present a flexible keyframe-based character animation system that generates plausible simulated motions for both physically-feasible and physically-infeasible motion specifications. We introduce a novel control parameterization, optimizing over internal actions, external assistive-force modulation, and keyframe timing. Our method allows for emergent behaviors between keyframes, does not require advance knowledge of contacts or exact motion timing, supports the creation of physically impossible motions, and allows for near-interactive motion creation. The use of a shooting method allows for the use of any black-box simulator. We present results for a variety of 2D and 3D characters and motions, using sparse and dense keyframes. We compare our control parameterization scheme against other possible approaches for incorporating external assistive forces.
7

Fang, Q. S., Z. Peng, and P. Yan. "Fire Detection and Localization Method Based on Deep Learning in Video Surveillance." Journal of Physics: Conference Series 2278, no. 1 (2022): 012024. http://dx.doi.org/10.1088/1742-6596/2278/1/012024.

Abstract:
Fire detection and localization in video surveillance has become a particularly important part of disaster rescue. To address the slow detection speed, low detection accuracy, and low localization precision of fire detection and localization in video surveillance, we propose a fire detection and localization method based on deep learning. First, we improved the SuperPoint method to extract video keyframes from surveillance footage. Next, we employed a Convolutional Neural Network (CNN) model to detect fire on the extracted keyframes. Finally, we located the fire via superpixels and a CNN on the keyframes in which a fire broke out. Experimental results on an open fire dataset reveal that the recall of keyframe extraction reached 0.83, the precision of fire detection reached 0.96, and the F1-score of fire localization reached 0.90. Our method realizes rapid, accurate detection and precise localization of fire in video surveillance.
8

Rajeshwari, D., and C. Victoria Priscilla. "An Enhanced Spatio-Temporal Human Detected Keyframe Extraction." International Journal of Electrical and Computer Engineering Systems 14, no. 9 (2023): 985–92. http://dx.doi.org/10.32985/ijeces.14.9.3.

Abstract:
Due to the immense availability of closed-circuit television surveillance, crime investigation is quite difficult because of the huge storage involved and complex backgrounds. Content-based video retrieval is an excellent method to identify the best keyframes from these surveillance videos. As crime surveillance reports numerous action scenes, existing keyframe extraction is not exemplary. Here, a Spatio-temporal Histogram of Oriented Gradients - Support Vector Machine (HOG-SVM) feature method combined with background subtraction is applied to the recovered crime video to highlight human presence in surveillance frames. Additionally, the Visual Geometry Group (VGG) network is trained on these frames to classify human-detected frames. The detected frames are then processed to extract keyframes by computing an inter-frame difference against a threshold value, favoring the requisite human-detected keyframes. The experimental results show a compression ratio of 98.54% for HOG-SVM alone, against the proposed work's compression ratio of 98.71%, which supports criminal investigation.
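
The inter-frame-difference step mentioned above is a standard operation; a minimal sketch follows, assuming the human-detected frames are already available as images. The mean-plus-std threshold rule is an assumption, since the abstract does not state how the threshold is chosen.

```python
import cv2
import numpy as np

def keyframes_by_frame_difference(frames, thresh=None):
    """Keep frame i+1 as a keyframe when its inter-frame difference from
    frame i exceeds a threshold (here: mean + std of all differences)."""
    grays = [cv2.cvtColor(f, cv2.COLOR_BGR2GRAY).astype(np.float32)
             for f in frames]
    diffs = [float(np.mean(np.abs(b - a))) for a, b in zip(grays, grays[1:])]
    if thresh is None:
        thresh = np.mean(diffs) + np.std(diffs)
    return [0] + [i + 1 for i, d in enumerate(diffs) if d > thresh]
```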
9

Man, Guangyi, and Xiaoyan Sun. "Interested Keyframe Extraction of Commodity Video Based on Adaptive Clustering Annotation." Applied Sciences 12, no. 3 (2022): 1502. http://dx.doi.org/10.3390/app12031502.

Abstract:
Keyframe recognition is very important for extracting pivotal information from videos. Numerous studies have successfully identified frames containing moving objects as keyframes. However, the definition of "keyframe" can be quite different for different requirements. In the field of E-commerce, the keyframes of product videos should be those that interest a customer and help the customer make correct and quick decisions, which is greatly different from the existing studies. Accordingly, we first define the key interested frame of commodity video from the viewpoint of user demand. As there are no annotations on the interested frames, we develop a fast and adaptive clustering strategy to cluster the preprocessed videos into several clusters according to the definition and annotate them. These annotated samples are utilized to train a deep neural network to learn the features of key interested frames and achieve recognition. The performance of the proposed algorithm in effectively recognizing key interested frames is demonstrated by applying it to commodity videos fetched from an E-commerce platform.
10

Saqib, Shazia, and Syed Kazmi. "Video Summarization for Sign Languages Using the Median of Entropy of Mean Frames Method." Entropy 20, no. 10 (2018): 748. http://dx.doi.org/10.3390/e20100748.

Abstract:
Multimedia information requires large repositories of audio-video data. Retrieval and delivery of video content is a very time-consuming process and a great challenge for researchers. Video summarization is an efficient approach for faster browsing of large video collections and for more efficient content indexing and access; compressing the data by extracting keyframes addresses these challenges. A keyframe is a frame representative of the salient features of the video, and the output frames must represent the original video in temporal order. The proposed research presents a keyframe extraction method using the mean of k consecutive frames of video data: a sliding window of size k/2 is employed to select the frame that matches the median entropy value of the window. This is called the Median of Entropy of Mean Frames (MME) method, a mean-based keyframe selection using the median entropy of the sliding window. The method was tested on more than 500 videos of sign language gestures and showed satisfactory results.
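
A minimal sketch of the MME idea as described (mean frames over runs of k, then a median-entropy pick inside each window of k/2) is given below; the exact windowing and tie-breaking of the paper may differ, so treat this as an assumption-laden illustration.

```python
import numpy as np

def shannon_entropy(img, bins=256):
    """Shannon entropy of a grayscale image."""
    hist, _ = np.histogram(img, bins=bins, range=(0, 256), density=True)
    p = hist[hist > 0]
    return float(-(p * np.log2(p)).sum())

def mme_keyframes(frames, k=10):
    """Average each run of k frames, then slide a window of k//2 mean
    frames and keep the one whose entropy is closest to the window median."""
    means = [np.mean(frames[i:i + k], axis=0)
             for i in range(0, len(frames), k)]
    ents = np.array([shannon_entropy(m) for m in means])
    w = max(k // 2, 1)
    keys = []
    for i in range(0, len(ents) - w + 1, w):
        window = ents[i:i + w]
        j = i + int(np.argmin(np.abs(window - np.median(window))))
        keys.append(means[j])
    return keys
```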
11

Taher, Hazeem B., and Amal H. Awadh. "Video Summarization for Surveillance System Using Key-Frame Extraction Based on Cluster." Journal of Education for Pure Science - University of Thi-Qar 11, no. 1 (2021): 54–65. http://dx.doi.org/10.32792/jeps.v11i1.91.

Abstract:
The amount of data has grown in recent years due to the use of a vast number of videos, which require time to access, in addition to the difficulty of browsing and retrieving their content. To address this issue, videos are summarized so that their content can be accessed and browsed more easily. The primary objective of video summarization is to provide a concise description of the video by removing redundancy and extracting keyframes. This paper describes four methods for summarizing video based on keyframe extraction: the first two methods rely on a threshold value, while the other two rely on clustering to extract the keyframes.
12

Hu, Bo, and Jingwen Luo. "A Robust Semi-Direct 3D SLAM for Mobile Robot Based on Dense Optical Flow in Dynamic Scenes." Biomimetics 8, no. 4 (2023): 371. http://dx.doi.org/10.3390/biomimetics8040371.

Abstract:
Dynamic objects bring about a large number of error accumulations in pose estimation of mobile robots in dynamic scenes, and result in the failure to build a map that is consistent with the surrounding environment. Along these lines, this paper presents a robust semi-direct 3D simultaneous localization and mapping (SLAM) algorithm for mobile robots based on dense optical flow. First, a preliminary estimation of the robot’s pose is conducted using the sparse direct method and the homography matrix is utilized to compensate for the current frame image to reduce the image deformation caused by rotation during the robot’s motion. Then, by calculating the dense optical flow field of two adjacent frames and segmenting the dynamic region in the scene based on the dynamic threshold, the local map points projected within the dynamic regions are eliminated. On this basis, the robot’s pose is optimized by minimizing the reprojection error. Moreover, a high-performance keyframe selection strategy is developed, and keyframes are inserted when the robot’s pose is successfully tracked. Meanwhile, feature points are extracted and matched to the keyframes for subsequent optimization and mapping. Considering that the direct method is subject to tracking failure in practical application scenarios, the feature points and map points of keyframes are employed in robot relocation. Finally, all keyframes and map points are used as optimization variables for global bundle adjustment (BA) optimization, so as to construct a globally consistent 3D dense octree map. A series of simulations and experiments demonstrate the superior performance of the proposed algorithm.
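
The dynamic-region step described here (dense optical flow between adjacent frames, thresholded on motion magnitude) can be sketched as below; Farneback flow and the fixed threshold are stand-ins, as the paper's exact flow computation and dynamic threshold are not reproduced.

```python
import cv2
import numpy as np

def dynamic_region_mask(prev_gray, cur_gray, thresh=2.0):
    """Mask of likely-dynamic pixels from the dense optical flow field of
    two adjacent frames; map points projecting into it would be dropped."""
    flow = cv2.calcOpticalFlowFarneback(prev_gray, cur_gray, None,
                                        pyr_scale=0.5, levels=3, winsize=15,
                                        iterations=3, poly_n=5,
                                        poly_sigma=1.2, flags=0)
    magnitude = np.linalg.norm(flow, axis=2)
    return magnitude > thresh
```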
13

Manasa, Smt B. "Hybrid CNN-Transformer Architecture for Robust Deepfake Detection: A Keyframe-Based Evaluation." International Journal of Scientific Research in Engineering and Management 9, no. 5 (2025): 1–9. https://doi.org/10.55041/ijsrem46782.

Abstract:
The proliferation of Deepfake content presents a significant threat to digital integrity and media authenticity. To address this challenge, we present a comprehensive evaluation of four deep learning architectures applied to Deepfake detection using keyframes: Convolutional Neural Networks (CNN), Transformer-based models, CNN integrated with Long Short-Term Memory (CNN+LSTM), and a novel hybrid CNN-Transformer model. Keyframes were extracted from the FaceForensics++ dataset, preserving high-resolution information crucial for robust detection. Each model was trained and tested under identical conditions to ensure fair comparison. The hybrid architecture, combining the local feature extraction capabilities of CNNs with the global contextual modelling power of Transformers, achieved the highest performance across all metrics, including accuracy, precision, recall, F1-score, and AUC. Our findings highlight the superiority of multi-perspective feature learning and reinforce the importance of keyframe utilization in compressed video-based Deepfake detection. This work provides a solid benchmark and foundation for future research on real-time and cross-dataset Deepfake detection frameworks. Key Words: Deepfake Detection, Convolutional Neural Networks (CNN), Transformer Networks, CNN+LSTM, CNN-Transformer Hybrid, FaceForensics++ (FF++), Keyframe Extraction, Deep Learning.
14

Lin, Yuan, Haiqing Dong, Wentao Ye, Xue Dong, and Shuogui Xu. "InfoLa-SLAM: Efficient Lidar-Based Lightweight Simultaneous Localization and Mapping with Information-Based Keyframe Selection and Landmarks Assisted Relocalization." Remote Sensing 15, no. 18 (2023): 4627. http://dx.doi.org/10.3390/rs15184627.

Abstract:
This work reports an information-based landmarks assisted simultaneous localization and mapping (InfoLa-SLAM) in large-scale scenes using single-line lidar. The solution employed two novel designs. The first design was a keyframe selection method based on Fisher information, which reduced the computational cost of the nonlinear optimization for the back-end of SLAM by selecting a relatively small number of keyframes while ensuring the accuracy of mapping. The Fisher information was acquired from the point cloud registration between the current frame and the previous keyframe. The second design was an efficient global descriptor for place recognition, which was achieved by designing a unique graphical feature ID to effectively match the local map with the global one. The results showed that compared with traditional keyframe selection strategies (e.g., based on time, angle, or distance), the proposed method allowed for a 35.16% reduction in the number of keyframes in a warehouse with an area of about 10,000 m². The relocalization module demonstrated a high probability (96%) of correction even under high levels of measurement noise (0.05 m), while the time consumption for relocalization was below 28 ms. The proposed InfoLa-SLAM was also compared with Cartographer under the same dataset. The results showed that InfoLa-SLAM achieved very similar mapping accuracy to Cartographer but excelled in lightweight performance, achieving a 9.11% reduction in the CPU load and a significant 56.67% decrease in the memory consumption.
15

Qu, Zhong, and Teng Fei Gao. "An Improved Algorithm of Keyframe Extraction for Video Summarization." Advanced Materials Research 225-226 (April 2011): 807–11. http://dx.doi.org/10.4028/www.scientific.net/amr.225-226.807.

Abstract:
Video segmentation and keyframe extraction are the basis of content-based video retrieval (CBVR), in which keyframe selection plays the central role. In this paper, we propose an improved approach to keyframe extraction for video summarization. In our approach, videos are first segmented into shots according to video content by our improved histogram-based method, which uses histogram intersection with non-uniform partitioning and weighting. Then, within each shot, keyframes are determined by calculating the image entropy of every frame in the HSV color space, as a reflection of the quantity of image information. Our simulation results in Section 4 show that the keyframes extracted with our method are compact and faithful to the original video.
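
As a rough sketch of the two stages described (histogram-intersection shot cuts, then a maximum-entropy keyframe per shot in HSV space): uniform bins stand in for the paper's non-uniform partitioning and weighting, and the cut threshold is an assumed constant.

```python
import cv2
import numpy as np

def hsv_hist(frame, bins=(18, 8)):
    """Normalized 2D hue/saturation histogram in HSV space."""
    h = cv2.calcHist([cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)], [0, 1],
                     None, list(bins), [0, 180, 0, 256]).ravel()
    return h / h.sum()

def entropy_of(hist):
    p = hist[hist > 0]
    return float(-(p * np.log2(p)).sum())

def shots_and_keyframes(frames, cut_thresh=0.7):
    """Cut a shot where histogram intersection drops below cut_thresh,
    then keep the maximum-entropy frame of each shot."""
    hists = [hsv_hist(f) for f in frames]
    cuts = [0] + [i + 1 for i in range(len(hists) - 1)
                  if np.minimum(hists[i], hists[i + 1]).sum() < cut_thresh]
    cuts.append(len(frames))
    return [max(range(a, b), key=lambda i: entropy_of(hists[i]))
            for a, b in zip(cuts, cuts[1:])]
```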
16

Li, Hengzi, and Xingli Huang. "Intelligent Dance Motion Evaluation: An Evaluation Method Based on Keyframe Acquisition According to Musical Beat Features." Sensors 24, no. 19 (2024): 6278. http://dx.doi.org/10.3390/s24196278.

Abstract:
Motion perception is crucial in competitive activities like dance, basketball, and diving. However, evaluations in these sports rely heavily on professionals, posing two main challenges: subjective assessments are uncertain and can be influenced by experience, making it hard to guarantee timeliness and accuracy; and multi-expert voting increases labor costs. While video analysis methods have alleviated some of this pressure, challenges remain in extracting key points/frames from videos and in constructing a suitable, quantifiable evaluation method that aligns with the static–dynamic nature of movements. Therefore, this study proposes an intelligent evaluation method aimed at enhancing the accuracy and processing speed of complex video analysis tasks. First, a keyframe extraction method based on musical beat detection is constructed; coupled with prior knowledge, beat detection is optimized through a perceptually weighted window to accurately extract keyframes that are highly correlated with changes in the dance movements. Second, OpenPose is employed to detect human joint points in the keyframes, quantifying human movements into a series of numerically expressed nodes and their relationships (i.e., pose descriptions). Combined with the positions of keyframes in the time sequence, these form a standard pose description sequence, serving as the foundational data for subsequent quantitative evaluation. Lastly, an Action Sequence Evaluation method (ASCS) is established based on all action features within a single action frame to precisely assess the overall performance of individual actions. Furthermore, drawing inspiration from the ROUGE-L evaluation method in natural language processing, a Similarity Measure Approach based on Contextual Relationships (SMACR) is constructed, focusing on evaluating the coherence of actions. By integrating ASCS and SMACR, dancers are evaluated comprehensively from both the static and dynamic dimensions. During method validation, the research team selected 12 representative samples from the popular dance game Just Dance, classifying them according to the complexity of the dance moves and physical exertion levels. The experimental results demonstrate the outstanding performance of the automated evaluation method: it not only achieves precise assessment of dance movements at the individual keyframe level but also significantly enhances the evaluation of action coherence and completeness through SMACR. Across all 12 test samples, the method selects 2 to 5 keyframes per second from the videos, reducing the computational load to 4.1–10.3% of that of traditional full-frame matching methods, while the overall evaluation accuracy decreases by only 3%, demonstrating the method's combination of efficiency and precision. Through precise musical beat alignment, efficient keyframe extraction, and intelligent dance motion analysis, this study addresses the subjectivity and inefficiency of traditional manual evaluations, enhancing the scientific rigor and accuracy of assessments. It provides robust tool support for fields such as dance education and competition judging, with broad application prospects.
17

Jiang, Zhengkai, Peng Gao, Chaoxu Guo, Qian Zhang, Shiming Xiang, and Chunhong Pan. "Video Object Detection with Locally-Weighted Deformable Neighbors." Proceedings of the AAAI Conference on Artificial Intelligence 33 (July 17, 2019): 8529–36. http://dx.doi.org/10.1609/aaai.v33i01.33018529.

Abstract:
Deep convolutional neural networks have achieved great success on various image recognition tasks. However, it is nontrivial to transfer existing networks to video, since most of them are developed for static images. Frame-by-frame processing is suboptimal because temporal information that is vital for video understanding is totally abandoned; it is also slow and inefficient, which hinders practical usage. In this paper, we propose LWDN (Locally-Weighted Deformable Neighbors) for video object detection without utilizing time-consuming optical flow extraction networks. LWDN can latently align high-level features between keyframes and between keyframes and non-keyframes. Inspired by (Zhu et al. 2017a) and (Hetang et al. 2017), who propose to aggregate features between keyframes, we adopt a brain-inspired memory mechanism to propagate and update the memory feature from keyframe to keyframe. We call this process Memory-Guided Propagation. With such a memory mechanism, the discriminative ability of features in both keyframes and non-keyframes is enhanced, which helps to improve detection accuracy. Extensive experiments on the VID dataset demonstrate that our method achieves superior performance in the speed-accuracy trade-off, i.e., 76.3% on the challenging VID dataset while maintaining 20 fps on a Titan X GPU.
18

Zhang, Y., C. Lan, Q. Shi, Z. Cui, and W. Sun. "Video Image Target Recognition and Geolocation Method for UAV Based on Landmarks." ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences XLII-2/W16 (September 17, 2019): 285–91. http://dx.doi.org/10.5194/isprs-archives-xlii-2-w16-285-2019.

Abstract:
Relying on landmarks for robust geolocation of drones and targets is one of the most important approaches in GPS-denied environments, and small drones have no direct orientation capability without a high-precision IMU. This paper presents an automated real-time matching and geolocation algorithm between video keyframes and a landmark database, based on the integration of visual SLAM and the YOLOv3 deep learning network. The algorithm extracts the landmarks from the drone video keyframe images to improve target geolocation accuracy, and designs different processing schemes for keyframes containing rich and sparse landmarks. For feature extraction and matching, we improved the ORB feature extraction strategy and obtained more uniformly distributed feature points than the original ORB extraction. In three groups of top-down drone video experiments, cases at 100 m, 200 m, and 300 m were carried out to verify the robustness of the algorithm, with comparison against GPS surveying data. The results show that the features of keyframe landmarks in top-down video images within 300 m are stable for matching the landmark database, and the geolocation accuracy is controlled within 0.8 m.
19

Poomhiran, L., P. Meesad, and S. Nuanmeesri. "Improving the Recognition Performance of Lip Reading Using the Concatenated Three Sequence Keyframe Image Technique." Engineering, Technology & Applied Science Research 11, no. 2 (2021): 6986–92. http://dx.doi.org/10.48084/etasr.4102.

Abstract:
This paper proposes a lip reading method based on convolutional neural networks applied to a Concatenated Three Sequence Keyframe Image (C3-SKI), consisting of (a) the Start-Lip Image (SLI), (b) the Middle-Lip Image (MLI), and (c) the End-Lip Image (ELI), which marks the end of the pronunciation of a syllable. The lip-area image was reduced to 32×32 pixels per frame, and the three keyframes were concatenated to represent one syllable as a 96×32-pixel image for visual speech recognition. The three concatenated keyframes representing a syllable are selected based on the relative maxima and minima of the open lip's width and height. Evaluation of the model's effectiveness showed accuracy, validation accuracy, loss, and validation loss values of 95.06%, 86.03%, 4.61%, and 9.04%, respectively, on the THDigits dataset. The C3-SKI technique was also applied to the AVDigits dataset, showing 85.62% accuracy. In conclusion, the C3-SKI technique can be applied to lip reading recognition.
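
The C3-SKI construction itself is simple to illustrate; a sketch under the stated dimensions (three 32×32 lip crops concatenated into one 96×32 input) follows. The keyframe selection from lip width/height extrema is not shown.

```python
import cv2
import numpy as np

def c3_ski(start_lip, middle_lip, end_lip):
    """Concatenate the SLI, MLI, and ELI lip crops (resized to 32x32)
    side by side into one 96x32-pixel image for the CNN."""
    crops = [cv2.resize(img, (32, 32))
             for img in (start_lip, middle_lip, end_lip)]
    return np.concatenate(crops, axis=1)  # height 32, width 96
```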
20

Chahine, Georges, and Cédric Pradalier. "Semantic-aware Spatiotemporal Alignment of Natural Outdoor Surveys." Field Robotics 2, no. 1 (2022): 1819–48. http://dx.doi.org/10.55417/fr.2022056.

Abstract:
This article presents a keyframe-based, innovative map registration scheme for applications that benefit from recurring data acquisition, such as long-term natural environment monitoring. The proposed method consists of a multistage pipeline, in which semantic knowledge of the scene is acquired using a pretrained neural network. The semantic knowledge is subsequently employed to constrain the Iterative Closest Point algorithm (ICP). In this article, semantic-aware ICP is used to build keyframes as well as to align them both spatially and temporally, with neighboring keyframes and those captured around the same area but at a different point in time, respectively. Hierarchical clustering of ICP-generated transformations is then used to both eliminate outliers and find alignment consensus, followed by an optimization scheme based on a factor graph that includes loop closure. To evaluate the proposed framework, data were captured using a portable robotic sensor suite consisting of three cameras, a three-dimensional lidar, and an inertial navigation system. The data were acquired monthly over 12 months by revisiting the same trajectory between August 2020 and July 2021.
21

Gong, Han, Lei Gong, Tianbing Ma, Zhicheng Sun, and Liang Li. "AHY-SLAM: Toward Faster and More Accurate Visual SLAM in Dynamic Scenes Using Homogenized Feature Extraction and Object Detection Method." Sensors 23, no. 9 (2023): 4241. http://dx.doi.org/10.3390/s23094241.

Abstract:
At present, SLAM is widely used in all kinds of dynamic scenes, where it is difficult for traditional visual SLAM to distinguish dynamic targets. In the matching process, dynamic points are incorrectly added to the camera pose calculation, resulting in low precision and poor robustness of the pose estimation. This paper proposes a new dynamic-scene visual SLAM algorithm based on adaptive-threshold homogenized feature extraction and YOLOv5 object detection, named AHY-SLAM. This new method adds three modules to ORB-SLAM2: a keyframe selection module, a threshold calculation module, and an object detection module. In AHY-SLAM, the optical flow method is used to screen keyframes from each input frame, an adaptive threshold is used to extract feature points for keyframes, and dynamic points are eliminated with YOLOv5. Compared with ORB-SLAM2, AHY-SLAM significantly improves pose estimation accuracy over multiple dynamic scene sequences of the TUM open dataset, with absolute pose estimation accuracy increased by up to 97%. Compared with other dynamic-scene SLAM algorithms, the speed of AHY-SLAM is also significantly improved while maintaining acceptable accuracy.
22

Yoon, Ui Nyoung, Myung Duk Hong, and Geun-Sik Jo. "Unsupervised Video Summarization Based on Deep Reinforcement Learning with Interpolation." Sensors 23, no. 7 (2023): 3384. http://dx.doi.org/10.3390/s23073384.

Abstract:
Individuals spend considerable time searching for videos on online video-sharing platforms, and video summarization helps search through many videos efficiently and quickly. In this paper, we propose an unsupervised video summarization method based on deep reinforcement learning with an interpolation step. To train the video summarization network efficiently, we use graph-level features and design a reinforcement learning-based video summarization framework with a temporal consistency reward function and other reward functions; the temporal consistency reward helps select keyframes uniformly. We present a lightweight video summarization network combining transformer and CNN modules to capture global and local contexts and efficiently predict a short sequence of keyframe-level importance scores for the video. The output importance scores are interpolated to fit the video length. Using the predicted importance scores, we calculate the reward from the reward functions, which helps select interesting keyframes efficiently and uniformly. We evaluated the proposed method on two datasets, SumMe and TVSum. The experimental results illustrate that the proposed method achieves state-of-the-art performance compared with the latest unsupervised video summarization methods, which we demonstrate and analyze experimentally.
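
The interpolation step (stretching the network's short importance-score sequence to the full frame count) is straightforward; a linear-interpolation sketch, one plausible choice among several, is shown below.

```python
import numpy as np

def upsample_importance(scores, n_frames):
    """Linearly interpolate keyframe-level importance scores to video length."""
    src = np.linspace(0.0, 1.0, num=len(scores))
    dst = np.linspace(0.0, 1.0, num=n_frames)
    return np.interp(dst, src, scores)
```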
23

Ma, Lei, Weiyu Wang, Yaozong Zhang, Yu Shi, Zhenghua Huang, and Hanyu Hong. "Multi-features combinatorial optimization for keyframe extraction." Electronic Research Archive 31, no. 10 (2023): 5976–95. http://dx.doi.org/10.3934/era.2023304.

Abstract:
Recent advancements in network and multimedia technologies have facilitated the distribution and sharing of digital videos over the Internet. These long videos have very complex contents, and it is very challenging to cover the video contents with as few frames as possible without missing too much information. There are at least two ways to describe such complex video contents with minimal frames: keyframes extracted from the video, or a video summary. The former stresses covering the whole video contents as much as possible; the latter emphasizes covering the video contents of interest. As a consequence, keyframes are widely used in many areas such as video segmentation and object tracking. In this paper, we propose a keyframe extraction method based on multiple features via a novel combinatorial optimization algorithm. Keyframe extraction is modeled as a combinatorial optimization problem, and the resulting maximization problem is solved by a fast dynamic programming algorithm based on a forward non-overlapping transfer matrix, running in polynomial time, together with a 0-1 integer linear programming algorithm based on an overlapping matrix. To evaluate our approach quantitatively, a long video dataset named 'Animal world' is self-constructed, and segmentation evaluation criteria are introduced. Good results are achieved on the 'Animal world' dataset and on the publicly available Keyframe-Sydney (KFSYD) dataset [1].
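
The dynamic-programming formulation is only outlined in the abstract; as a much-simplified sketch, the non-overlap constraint can be read as a minimum gap between selected frames, with a single per-frame score standing in for the paper's multiple features. Everything below is an illustrative assumption, not the authors' algorithm.

```python
import numpy as np

def dp_select_keyframes(score, k, min_gap=1):
    """Pick k frame indices maximizing total score, with selected frames
    at least min_gap apart (forward, non-overlapping DP).
    Assumes len(score) >= (k - 1) * min_gap + 1 so k picks fit."""
    n = len(score)
    f = np.full((k + 1, n), -np.inf)  # f[j, i]: best total with j picks ending at i
    f[1] = score
    prev = np.full((k + 1, n), -1, dtype=int)
    for j in range(2, k + 1):
        best, arg = -np.inf, -1
        for i in range(n):
            p = i - min_gap           # newest predecessor now allowed
            if p >= 0 and f[j - 1, p] > best:
                best, arg = f[j - 1, p], p
            if arg >= 0:
                f[j, i] = best + score[i]
                prev[j, i] = arg
    i = int(np.argmax(f[k]))
    picks = [i]
    for j in range(k, 1, -1):
        i = prev[j, i]
        picks.append(i)
    return picks[::-1]
```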
24

Gu, Lingchen, Ju Liu, and Aixi Qu. "Performance Evaluation and Scheme Selection of Shot Boundary Detection and Keyframe Extraction in Content-Based Video Retrieval." International Journal of Digital Crime and Forensics 9, no. 4 (2017): 15–29. http://dx.doi.org/10.4018/ijdcf.2017100102.

Abstract:
The advancement of multimedia technology has produced a large number of videos, so it is important to know how to retrieve information from video, especially for crime prevention and forensics. For convenient retrieval of video data, content-based video retrieval (CBVR) has received considerable attention. Aiming at improving retrieval performance, we focus on two key technologies: shot boundary detection and keyframe extraction. After comparison with pixel analysis and the chi-square histogram, a histogram-based method is chosen in this paper; we combine it with an adaptive threshold method and compute the histogram in the HSV color space. For keyframe extraction, four methods are analyzed and four evaluation criteria, both objective and subjective, are summarized, leading to the conclusion that different types of keyframe extraction methods suit different types of videos. Retrieval can then be based on keyframes, simplifying the video investigation process and helping criminal investigation personnel improve their work efficiency.
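
One common form of the adaptive threshold mentioned above sets the cut threshold from the statistics of all adjacent-frame histogram differences; the mean-plus-a·std rule below is an assumption, as the paper's exact rule is not given here.

```python
import numpy as np

def adaptive_shot_boundaries(hist_diffs, a=1.0):
    """Declare a boundary wherever the histogram difference between
    adjacent frames exceeds mean + a * std of all differences."""
    d = np.asarray(hist_diffs, dtype=float)
    return np.flatnonzero(d > d.mean() + a * d.std()) + 1  # index of new shot's first frame
```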
25

Motayyeb, S., F. Samadzadegan, F. Dadrass Javan, and H. R. Hosseinpour. "Effect of Keyframes Extraction from Thermal Infrared Video Stream to Generate Dense Point Cloud of the Building's Facade." ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences X-4/W1-2022 (January 14, 2023): 551–61. http://dx.doi.org/10.5194/isprs-annals-x-4-w1-2022-551-2023.

Abstract:
Keyframe extraction is required and effective for the 3D reconstruction of objects from a thermal video sequence: it increases geometric accuracy, reduces the volume of aerial triangulation calculations, and supports generation of the dense point cloud. The primary goal of this paper is to assess the effect of keyframe extraction from a thermal infrared video sequence on the geometric accuracy of the generated dense point cloud. The keyframe extraction method presented here consists of three basic steps: (A) identifying and removing blurred frames from the sequence of recorded frames; (B) applying a standard baseline condition between sequential frames to establish sufficient overlap and prevent degeneracy; and (C) evaluating degeneracy conditions and extracting keyframes using the Geometric Robust Information Criteria (GRIC). The evaluation criteria for keyframe extraction in generating the thermal infrared dense point cloud are the increase in density of the generated 3D point cloud and the reduction in reprojection error. Based on the results presented in this paper, using keyframes increases the density of the thermal infrared dense point cloud by about 0.03% to 0.10% of points per square meter and reduces the reprojection error by about 0.005% of pixels (2 times).
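
Step (A) leaves the blur detector unspecified; a common stand-in is the variance of the Laplacian, sketched below with an assumed threshold.

```python
import cv2

def is_blurred(frame_gray, thresh=100.0):
    """Low Laplacian variance indicates few sharp edges, i.e. likely blur."""
    return cv2.Laplacian(frame_gray, cv2.CV_64F).var() < thresh
```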
26

Ji, Hyesung, Danial Hooshyar, Kuekyeng Kim, and Heuiseok Lim. "A semantic-based video scene segmentation using a deep neural network." Journal of Information Science 45, no. 6 (2018): 833–44. http://dx.doi.org/10.1177/0165551518819964.

Abstract:
Video scene segmentation is an important research topic in the field of computer vision because it enables efficient storage, indexing, and retrieval of videos. Such scene segmentation cannot be achieved by only calculating the similarity of low-level features present in the video; high-level features must also be considered to achieve better performance. Even though much research has been conducted on video scene segmentation, most studies have failed to segment a video into scenes semantically. Thus, in this study, we propose a Deep-learning Semantic-based Scene-segmentation model (called DeepSSS) that uses image captioning to segment a video into scenes semantically. First, DeepSSS performs shot boundary detection by comparing color histograms and then employs maximum-entropy-based keyframe extraction. Second, for semantic analysis, image captioning based on deep learning generates a semantic text description of the keyframes. Finally, by comparing and analyzing the generated texts, it assembles the keyframes into scenes grouped under a semantic narrative. DeepSSS thus considers both low- and high-level features of videos to achieve a more meaningful scene segmentation. By applying DeepSSS to datasets from MS COCO for caption generation and evaluating its semantic scene-segmentation results on datasets from TRECVid 2016, we demonstrate quantitatively that DeepSSS outperforms existing scene-segmentation methods that use shot boundary detection and keyframes. Moreover, experiments comparing scenes segmented by humans with scenes segmented by DeepSSS verified that DeepSSS's segmentation resembles that of humans. This result was enabled by semantic analysis, which is impossible using only low-level video features.
27

Bi, Shusheng, Dongsheng Yang, and Yueri Cai. "Automatic Calibration of Odometry and Robot Extrinsic Parameters Using Multi-Composite-Targets for a Differential-Drive Robot with a Camera." Sensors 18, no. 9 (2018): 3097. http://dx.doi.org/10.3390/s18093097.

Abstract:
This paper simultaneously and automatically calibrates odometry parameters and the relative pose between a monocular camera and a robot. Most camera pose estimation methods use natural features or artificial landmark tools; however, natural features suffer from mismatches and scale ambiguity, and large-scale precision landmark tools are challenging to make. To solve these problems, we propose an automatic process that combines multiple composite targets, selects keyframes, and estimates keyframe poses. Each composite target consists of an ArUco marker and a checkerboard pattern. First, an analytical method obtains initial values of all calibration parameters; prior knowledge of the calibration parameters is not required. Then, two optimization steps refine the calibration parameters, introducing planar motion constraints on the camera. The proposed solution is automatic: manual selection of keyframes and initial values, and confining the robot to a specific trajectory, are not required. The accuracy and stability of the proposed method under different target placements and robot paths are tested experimentally. Positive effects on calibration accuracy and stability are obtained when (1) composite targets are adopted; (2) two optimization steps are used; (3) planar motion constraints are introduced; and (4) the number of targets is increased.
28

Cao, Tianyang, Haoyuan Cai, Dongming Fang, Hui Huang, and Chang Liu. "Keyframes Global Map Establishing Method for Robot Localization through Content-Based Image Matching." Journal of Robotics 2017 (2017): 1–16. http://dx.doi.org/10.1155/2017/1646095.

Abstract:
Self-localization and mapping are important for indoor mobile robots. We report a robust algorithm for map building and subsequent localization especially suited for indoor floor-cleaning robots. Common methods such as SLAM can easily be kidnapped by collisions or disturbed by similar objects, so a keyframes global map establishing method is needed for robot localization in multiple rooms and corridors. Content-based image matching is the core of this method. It is designed for this situation by establishing keyframes containing both floor and distorted wall images. Image distortion caused by the robot's view angle and movement is analyzed and deduced, and an image matching solution is presented, consisting of extraction of overlap regions between keyframes and overlap-region rebuilding through subblock matching. To improve accuracy, ceiling-point detection and mismatched-subblock checking are incorporated. This matching method can process environment video effectively. In experiments, less than 5% of frames are extracted as keyframes to build the global map; these have large spatial separation and overlap each other. Through this method, the robot can localize itself by matching its real-time vision frames with our keyframe map. Even with many similar objects or backgrounds in the environment, or when the robot is kidnapped, localization is achieved with position RMSE < 0.5 m.
29

Jimson, L., and J. P. Ananth. "Local Optimal-Oriented Pattern and Exponential Weighed-Jaya Optimization-Based Deep Convolutional Networks for Video Summarization." International Journal of Swarm Intelligence Research 13, no. 3 (2022): 1–21. http://dx.doi.org/10.4018/ijsir.304403.

Abstract:
Video summarization generates a short summary video that provides users a useful visual and synthetic abstract of the video content. Various methods have been developed for video summarization, yet an effective method is still required due to drawbacks in cost and time. The goal of this research is an effective video summarization methodology that develops a short summary from the entire video stream. First, the input cricket video, consisting of a number of frames, is given to the keyframe generation phase, which is based on the Discrete Cosine Transform (DCT) and Euclidean distance, to obtain the keyframes. Then, residual keyframe generation is carried out based on a Deep Convolutional Neural Network (DCNN), which is trained optimally using the proposed Exponential Weighted Moving Average-Jaya (EWMA-Jaya) optimization.
30

Kaavya, S., and G. G. Lakshmi Priya. "Static Shot based Keyframe Extraction for Multimedia Event Detection." International Journal of Computer Vision and Image Processing 6, no. 1 (2016): 28–40. http://dx.doi.org/10.4018/ijcvip.2016010103.

Abstract:
Nowadays, processing multimedia information incurs high computational cost due to its large size, especially for video. In order to reduce the size of the video and to save users from spending their attention on the whole video, video summarization is adopted; it can be performed using keyframe extraction from the video. For this task, a new simple keyframe extraction method is proposed using a divide-and-conquer strategy in which a Scale Invariant Feature Transform (SIFT)-based feature representation vector is extracted and the whole video is categorized into static and dynamic shots. Each dynamic shot is further processed until it becomes static. A representative frame is extracted from every static shot, and redundant keyframes are removed using a keyframe similarity matching measure. Experimental evaluation is carried out and the proposed work is compared with related existing work. The authors' method outperforms existing methods in terms of Precision (P), Recall (R), and F-score (F). A fidelity measure is also computed for the proposed work, giving better results.
31

Muhammad, Bilyamin, Mariam Abdulazeez Ahmed, Ibrahim Haruna, and Usman Ismail Abdullahi. "Keyframe Extraction for Low-Motion Video Summarization Using K-Means Clustering." ELEKTRIKA- Journal of Electrical Engineering 21, no. 2 (2022): 1–6. http://dx.doi.org/10.11113/elektrika.v21n2.332.

Abstract:
The rate of increase in multimedia data has created the need for improved bandwidth utilization and storage capacity. Low-motion videos come with a large number of feature-related frames due to their static backgrounds, and these redundant frames cause difficulty in video streaming, retrieval, and transmission. To improve the user experience, video summarization technologies have been proposed to select representative frames from a full-length video and remove duplicated ones. Although these improved the keyframe extraction process, a large number of redundant frames were still observed to be extracted as keyframes. Therefore, this study presents an improved keyframe extraction scheme for low-motion video summarization. The proposed scheme utilizes k-means clustering to group the feature-related frames of a given video into a number of clusters, and a representative frame from each cluster is extracted as a keyframe. The results obtained show that the proposed scheme outperforms the existing scheme in terms of compression ratio, precision, and recall by 26.62%, 13.78%, and 6.63%, respectively.
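
The clustering scheme described reduces to a standard pattern: cluster per-frame feature vectors with k-means and keep the frame nearest each centroid. A sketch, with the feature representation left open, follows.

```python
import numpy as np
from sklearn.cluster import KMeans

def kmeans_keyframes(features, n_clusters):
    """Return, per cluster, the index of the frame closest to the centroid.

    features: (n_frames, d) array of per-frame feature vectors.
    """
    km = KMeans(n_clusters=n_clusters, n_init=10).fit(features)
    keys = []
    for c in range(n_clusters):
        members = np.flatnonzero(km.labels_ == c)
        dists = np.linalg.norm(features[members] - km.cluster_centers_[c],
                               axis=1)
        keys.append(int(members[np.argmin(dists)]))
    return sorted(keys)
```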
32

Priscilla, C. Victoria, and D. Rajeshwari. "Performance Analysis of Spatio-temporal Human Detected Keyframe Extraction." Remittances Review 7, no. 1 (2022): 159–70. http://dx.doi.org/10.47059/rr.v7i1.2404.

Abstract:
Closed-circuit television (CCTV) surveillance for detecting humans involves extensive research analysis, especially for crime scene detection, due to various restraints such as crowded annotation, night footage, and rainy (noisy) clips. Recognizing a particular person across all frames of a crime scene is a challenging task. For this purpose, the Content-Based Video Retrieval (CBVR) method refines the collection of these video frames into keyframes to reduce the burden of huge storage. Here, a Spatio-Temporal classifier method, with the added advantage of frame differencing and edge detection, reports the human-detected keyframes without terminating background regions, in order to negotiate the crime scene more efficiently. The main objective of this paper is to analyze the obtained human-detected keyframes, drawing a distinction between the Spatio-Temporal HOG-SVM and HAAR-like classifiers to determine the optimum. Finally, on the resulting keyframes mutated with the Canny edge detection method, HOG-SVM achieves the greater accuracy level of 98.21% compared to the HAAR-like classifier.
33

Smirnov, Anton О. "Dynamic map management for Gaussian Splatting SLAM." Control Systems and Computers, no. 2 (306) (July 2024): 3–9. http://dx.doi.org/10.15407/csc.2024.02.003.

Abstract:
Map representation and management are at the core of Simultaneous Localization and Mapping (SLAM) systems. The ability to efficiently construct new KeyFrames (KF), remove redundant ones, and build covisibility graphs has a direct impact on the performance and accuracy of SLAM. In this work we outline an algorithm for maintaining and managing a dynamic map for a SLAM algorithm based on Gaussian Splatting as the environment representation. Gaussian Splatting allows high-fidelity photorealistic environment reconstruction using differentiable rasterization and can run in real time, making it a strong candidate for map representation in SLAM. Its end-to-end nature and gradient-based optimization significantly simplify map optimization, camera pose estimation, and KeyFrame management.
34

Li, Changyang, and Lap-Fai Yu. "Generating Activity Snippets by Learning Human-Scene Interactions." ACM Transactions on Graphics 42, no. 4 (2023): 1–15. http://dx.doi.org/10.1145/3592096.

Abstract:
We present an approach to generate virtual activity snippets, which comprise sequenced keyframes of multi-character, multi-object interaction scenarios in 3D environments, by learning from recordings of human-scene interactions. The generation consists of two stages. First, we use a sequential deep graph generative model with a temporal module to iteratively generate keyframe descriptions, which represent abstract interactions using graphs, while preserving spatial-temporal relations through the activities. Second, we devise an optimization framework to instantiate the activity snippets in virtual 3D environments guided by the generated keyframe descriptions. Our approach optimizes the poses of character and object instances encoded by the graph nodes to satisfy the relations and constraints encoded by the graph edges. The instantiation process includes a coarse 2D optimization followed by a fine 3D optimization to effectively explore the complex solution space for placing and posing the instances. Through experiments and a perceptual study, we applied our approach to generate plausible activity snippets under different settings.
35

Savran Kızıltepe, Rukiye, John Q. Gan, and Juan José Escobar. "A Novel Keyframe Extraction Method for Video Classification Using Deep Neural Networks." Neural Computing & Applications 35, no. 34 (2023): 24513–24. https://doi.org/10.1007/s00521-021-06322-x.

Abstract:
Combining convolutional neural networks (CNNs) and recurrent neural networks (RNNs) produces a powerful architecture for video classification problems as spatial–temporal information can be processed simultaneously and effectively. Using transfer learning, this paper presents a comparative study to investigate how temporal information can be utilized to improve the performance of video classification when CNNs and RNNs are combined in various architectures. To enhance the performance of the identified architecture for effective combination of CNN and RNN, a novel action template-based keyframe extraction method is proposed by identifying the informative region of each frame and selecting keyframes based on the similarity between those regions. Extensive experiments on KTH and UCF-101 datasets with ConvLSTM-based video classifiers have been conducted. Experimental results are evaluated using one-way analysis of variance, which reveals the effectiveness of the proposed keyframe extraction method in the sense that it can significantly improve video classification accuracy.
APA, Harvard, Vancouver, ISO, and other styles
36

Mohammed, Suhaila N., and Alia K. Abdul Hassan. "The Effect of the Number of Key-Frames on the Facial Emotion Recognition Accuracy." Engineering and Technology Journal 39, no. 1B (2021): 89–100. http://dx.doi.org/10.30684/etj.v39i1b.1806.

Full text
Abstract:
Key-frame selection plays an important role in facial expression recognition systems. It helps in selecting the most representative frames that capture the different poses of the face. The effect of the number of selected keyframes has been studied in this paper to find its impact on the final accuracy of the emotion recognition system. Dynamic and static information is employed to select the most effective key-frames of the facial video with a short response time. Firstly, the absolute difference between successive frames is used to reduce the number of frames and select the candidate ones, which then contribute to the clustering process. The static-based information of the reduced set of frames is then given to the fuzzy C-means algorithm to select the best C frames. The selected keyframes are then fed to a graph mining-based facial emotion recognition system to select the most effective sub-graphs in the given set of keyframes. Different experiments have been conducted using the Surrey Audio-Visual Expressed Emotion (SAVEE) database, and the results show that the proposed method can effectively capture the keyframes that give the best accuracy, with a mean response time equal to 2.89 s.
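A minimal sketch of the two-stage pipeline described above: absolute frame differencing prunes candidates, then a small self-contained fuzzy C-means picks the final C keyframes by highest cluster membership. The `diff_thresh` value and the per-candidate feature vectors in `feats` (e.g. color histograms) are assumptions, not the paper's exact settings.

```python
import numpy as np

def candidate_frames(frames: np.ndarray, diff_thresh: float) -> list[int]:
    """Stage 1 (dynamic cue): keep frames whose mean absolute difference
    from the previous frame exceeds a threshold."""
    idx = [0]
    for i in range(1, len(frames)):
        if np.abs(frames[i].astype(float) - frames[i - 1].astype(float)).mean() > diff_thresh:
            idx.append(i)
    return idx

def fuzzy_c_means(X, c, m=2.0, iters=100, tol=1e-5, seed=0):
    """Stage 2 (static cue): minimal fuzzy C-means; returns centers and memberships."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)               # memberships sum to 1 per frame
    for _ in range(iters):
        Um = U ** m
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None] - centers[None], axis=2) + 1e-12
        inv = d ** (-2.0 / (m - 1.0))               # standard FCM membership update
        U_new = inv / inv.sum(axis=1, keepdims=True)
        if np.abs(U_new - U).max() < tol:
            return centers, U_new
        U = U_new
    return centers, U

def pick_keyframes(feats: np.ndarray, cand_idx: list[int], c: int) -> list[int]:
    """For each cluster, take the candidate frame with the highest membership."""
    _, U = fuzzy_c_means(feats, c)
    return sorted({cand_idx[int(np.argmax(U[:, k]))] for k in range(c)})
```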
APA, Harvard, Vancouver, ISO, and other styles
37

Sadiq, Bashir Olaniyi, Habeeb Bello-Salau, Latifat Abduraheem-Olaniyi, Bilyaminu Muhammed, and Sikiru Olayinka Zakariyya. "Towards Enhancing Keyframe Extraction Strategy for Summarizing Surveillance Video: An Implementation Study." Journal of ICT Research and Applications 16, no. 2 (2022): 167–83. http://dx.doi.org/10.5614/itbj.ict.res.appl.2022.16.2.5.

Full text
Abstract:
Large amounts of surveillance video data are recorded, containing many redundant video frames, which makes video browsing and retrieval difficult and increases bandwidth utilization, storage capacity, and time consumed. To reduce bandwidth utilization and storage requirements to the barest minimum, keyframe extraction strategies have been developed. These strategies extract unique keyframes while removing redundancies. Despite the achieved improvement in keyframe extraction processes, there still exists a significant number of redundant frames in summarized videos. With a view to addressing this issue, the current paper proposes an enhanced keyframe extraction strategy using k-means clustering and a statistical approach. Surveillance footage, movie clips, advertisements, and sports videos from a benchmark database as well as Compeng IP surveillance videos were used to evaluate the performance of the proposed method. In terms of compression ratio, the results showed that the proposed scheme outperformed existing schemes by 2.82%. This implies that the proposed scheme further removed redundant frames while retaining video quality. In terms of video playtime, there was an average reduction of 27.32%, thus making video content retrieval less cumbersome when compared with existing schemes. Implementation was done using MATLAB R2020b.
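The sketch below illustrates the general shape of such a scheme: k-means over per-frame color histograms picks one representative per cluster, and a simple statistical gap test then drops near-duplicate picks. Both the histogram features and the mean-minus-std redundancy test are illustrative stand-ins; the paper's statistical criterion is not specified in the abstract.

```python
import numpy as np
from sklearn.cluster import KMeans

def extract_keyframes(hists: np.ndarray, k: int) -> list[int]:
    """Cluster per-frame color histograms with k-means, take the frame nearest
    each centroid, then drop near-duplicates with a statistical distance test."""
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(hists)
    picks = []
    for c in range(k):
        members = np.where(km.labels_ == c)[0]
        d = np.linalg.norm(hists[members] - km.cluster_centers_[c], axis=1)
        picks.append(int(members[np.argmin(d)]))
    picks = sorted(set(picks))
    # illustrative redundancy check: keep a pick only when it is farther from
    # the previous pick than (mean - std) of consecutive pick distances
    gaps = [np.linalg.norm(hists[a] - hists[b]) for a, b in zip(picks, picks[1:])]
    if not gaps:
        return picks
    floor = max(np.mean(gaps) - np.std(gaps), 0.0)
    kept = [picks[0]]
    for b, g in zip(picks[1:], gaps):
        if g > floor:
            kept.append(b)
    return kept
```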
APA, Harvard, Vancouver, ISO, and other styles
38

Bi, Yanqing, Dong Li, and Yu Luo. "Combining Keyframes and Image Classification for Violent Behavior Recognition." Applied Sciences 12, no. 16 (2022): 8014. http://dx.doi.org/10.3390/app12168014.

Full text
Abstract:
Surveillance cameras are increasingly prevalent in public places, and security services urgently need to monitor violence in real time. However, the current violent-behavior-recognition models focus on spatiotemporal feature extraction, which has high hardware resource requirements and can be affected by numerous interference factors, such as background information and camera movement. Our experiments have found that violent and non-violent video frames can be classified by deep-learning models. Therefore, this paper proposes a keyframe-based violent-behavior-recognition scheme. Our scheme considers video frames as independent events and judges violent events based on whether the number of keyframes exceeds a given threshold, which reduces hardware requirements. Moreover, to overcome interference factors, we propose a new training method in which the background-removed and original image pair facilitates feature extraction of deep-learning models and does not add any complexity to the networks. Comprehensive experiments demonstrate that our scheme achieves state-of-the-art performance for the RLVS, Violent Flow, and Hockey Fights datasets, outperforming existing methods.
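The counting rule is simple enough to state directly. In the hypothetical sketch below, `frame_scores` are per-frame violence probabilities from any image classifier, and both thresholds are placeholders rather than the paper's tuned values.

```python
def is_violent_event(frame_scores: list[float],
                     frame_thresh: float = 0.5,
                     count_thresh: int = 10) -> bool:
    """Each frame is judged independently by an image classifier; the clip is
    flagged violent when enough frames (keyframes) exceed the score threshold."""
    keyframes = [s for s in frame_scores if s > frame_thresh]
    return len(keyframes) >= count_thresh
```

Treating frames as independent events avoids spatiotemporal feature extraction entirely, which is what keeps the hardware requirements low.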
APA, Harvard, Vancouver, ISO, and other styles
39

Kaur, Lakhwinder, Turki Aljrees, Ankit Kumar, et al. "Gated Recurrent Units and Recurrent Neural Network Based Multimodal Approach for Automatic Video Summarization." Traitement du Signal 40, no. 3 (2023): 1227–34. http://dx.doi.org/10.18280/ts.400340.

Full text
Abstract:
A typical video record aggregation system requires the concurrent performance of a large number of image processing tasks, including but not limited to image acquisition, pre-processing, segmentation, feature extraction, verification, and description. These tasks must be executed with utmost precision to ensure smooth system performance. Among these tasks, feature extraction and selection are the most critical. Feature extraction converts large-scale image data into smaller mathematical vectors, a process that requires great skill. Various feature extraction models are available, including wavelet, cosine, Fourier, histogram-based, and edge-based models. The key objective of any feature extraction model is to represent the image data with minimal attributes and no loss of information. In this study, we propose a novel feature-variance model that detects differences in video features and generates feature-reduced video frames. These frames are then fed into a GRU-based RNN model, which classifies them as either keyframes or non-keyframes. Keyframes are then extracted to create a summarized video, while non-keyframes are discarded. Various key-frame extraction models are also discussed, followed by a detailed analysis of the proposed summarization model and its results. Finally, we present some interesting observations about the proposed model and suggest ways to improve it.
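As a sketch of the feature-variance idea, the snippet below scores frames by how far their feature vectors sit from the sequence mean and keeps the most variable ones as keyframe candidates for the GRU-based classifier. The `keep_ratio` and the per-frame variance score are assumptions, not the authors' exact model.

```python
import numpy as np

def variance_reduced_frames(feats: np.ndarray, keep_ratio: float = 0.3) -> np.ndarray:
    """Score each frame by the variance of its feature vector around the
    sequence mean and keep only the most variable (most informative) frames.
    feats: (T, D) array of per-frame feature vectors."""
    mean = feats.mean(axis=0)
    scores = ((feats - mean) ** 2).mean(axis=1)   # per-frame feature variance
    k = max(1, int(len(feats) * keep_ratio))
    keep = np.sort(np.argsort(scores)[-k:])       # top-k, restored to temporal order
    return feats[keep]
```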
APA, Harvard, Vancouver, ISO, and other styles
40

Fang, Yichuan, Qingxuan Shi, and Zhen Yang. "Bidirectional Temporal Pose Matching for Tracking." Electronics 13, no. 2 (2024): 442. http://dx.doi.org/10.3390/electronics13020442.

Full text
Abstract:
Multi-person pose tracking is a challenging task. It requires identifying the human poses in each frame and matching them across time. This task still faces two main challenges. Firstly, sudden camera zooming and drastic pose changes between adjacent frames may result in mismatched poses between them. Secondly, the time relationships modeled by most existing methods provide insufficient information in scenarios with long-term occlusion. In this paper, to address the first challenge, we propagate the bounding boxes of the current frame to the previous frame for pose estimation, and match the estimated results with the previous ones, which we call the Backward Temporal Pose-Matching (BTPM) module. To solve the second challenge, we design an Association Across Multiple Frames (AAMF) module that utilizes long-term temporal relationships to supplement tracking information lost in the previous frames as a Re-identification (Re-id) technique. Specifically, we select keyframes with a fixed step size in the videos and label other frames as general frames. In the keyframes, we use the BTPM module and the AAMF module to perform tracking. In the general frames, we propagate poses in the previous frame to the current frame for pose estimation and association, which we call the Forward Temporal Pose-Matching (FTPM) module. If the pose association fails, the current frame will be set as a keyframe, and tracking will be re-performed. In the PoseTrack 2018 benchmark tests, our method shows significant improvements over the baseline methods, with improvements of 2.1 and 1.1 in mean Average Precision (mAP) and Multi-Object Tracking Accuracy (MOTA), respectively.
APA, Harvard, Vancouver, ISO, and other styles
41

Khan, Jalaluddin, Ghufran Ahmad Khan, Jian Ping Li, et al. "Secure Smart Healthcare Monitoring in Industrial Internet of Things (IIoT) Ecosystem with Cosine Function Hybrid Chaotic Map Encryption." Scientific Programming 2022 (March 29, 2022): 1–22. http://dx.doi.org/10.1155/2022/8853448.

Full text
Abstract:
Technological progress has given rise to a hybrid ecosystem around the Industrial Internet of Things (IIoT). Healthcare, in particular, is being broadly unified with the Internet of Things to build forthcoming industrial systems. Such systems can facilitate optimal patient monitoring, competent diagnosis, intensive care, and appropriate intervention against existing critical diseases. Given the scale of data theft and privacy leakage, preserving the security and privacy of patients' personal data has become a necessity in the digitized community. This article focuses on patient monitoring, perceptive keyframe extraction, and lightweight keyframe image encryption based on a cosine-function hybrid chaotic map. First, keyframe extraction is deployed to salvage the meaningful detected frames, autonomously transmitting an alert to the administration. Then, a lightweight cosine-function-based encryption is applied, keeping keyframes secure from the outside world or any adversary. The proposed methodology is validated throughout the IIoT ecosystem. The outcome shows minimal execution time, robustness, and a more cost-effective, secure parameter set than other (keyframe) image encryption methods. Furthermore, the methodology optimally reduces bandwidth, communication and transmission costs, and storage, and supports timely analysis of every activity, so that real patient data remains secure in the smart environment.
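A minimal sketch of chaotic keystream encryption of a keyframe. A plain logistic map stands in here for the paper's cosine-function hybrid chaotic map, whose exact form the abstract does not give; the keyframe is assumed to be a uint8 image, and XOR with the keystream makes the same call decrypt.

```python
import numpy as np

def chaotic_keystream(n: int, x0: float = 0.7, r: float = 3.99) -> np.ndarray:
    """Generate n pseudo-random bytes by iterating a chaotic map.
    (Logistic map shown for illustration; the paper uses a cosine-function
    hybrid chaotic map.)"""
    out = np.empty(n, dtype=np.uint8)
    x = x0
    for i in range(n):
        x = r * x * (1.0 - x)        # logistic iteration; x stays in (0, 1)
        out[i] = int(x * 256) & 0xFF
    return out

def encrypt_keyframe(img: np.ndarray, x0: float = 0.7) -> np.ndarray:
    """XOR the keyframe's bytes with the chaotic keystream; calling this
    again with the same key (x0) decrypts. img: uint8 array."""
    flat = img.reshape(-1)
    ks = chaotic_keystream(flat.size, x0)
    return (flat ^ ks).reshape(img.shape)
```

The initial condition x0 acts as the shared secret key: sensitivity to initial conditions means even a tiny key error produces an unrelated keystream.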
APA, Harvard, Vancouver, ISO, and other styles
42

Ma, Xiasheng, Ci Song, Yimin Ji, and Shanlin Zhong. "Related Keyframe Optimization Gaussian–Simultaneous Localization and Mapping: A 3D Gaussian Splatting-Based Simultaneous Localization and Mapping with Related Keyframe Optimization." Applied Sciences 15, no. 3 (2025): 1320. https://doi.org/10.3390/app15031320.

Full text
Abstract:
Simultaneous localization and mapping (SLAM) is the basis for intelligent robots to explore the world. As a promising method for 3D reconstruction, 3D Gaussian splatting (3DGS) integrated with SLAM systems has shown significant potential. However, due to environmental uncertainties, errors in the tracking process with 3D Gaussians can negatively impact SLAM systems. This paper introduces a novel dense RGB-D SLAM system based on 3DGS that refines Gaussians through sub-Gaussians in the camera coordinate system. Additionally, we propose an algorithm to select keyframes closely related to the current frame, optimizing the scene map and pose of the current keyframe. This approach effectively enhances both the tracking and mapping performance. Experiments on high-quality synthetic scenes (Replica dataset) and low-quality real-world scenes (TUM-RGBD and ScanNet datasets) demonstrate that our system achieves competitive performance in tracking and mapping.
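The related-keyframe idea can be sketched as a covisibility ranking: score every stored keyframe by how many Gaussians (map points) it shares with the current frame, and co-optimize the scene map and poses with the top few. The set-of-IDs representation and the `top_k` cutoff are assumptions for illustration, not this paper's published criterion.

```python
import numpy as np

def related_keyframes(current_obs: set[int], kf_obs: list[set[int]],
                      top_k: int = 5) -> list[int]:
    """Rank existing keyframes by how many Gaussians they share with the
    current frame; the most related ones are selected for joint optimization
    of the scene map and the current keyframe's pose."""
    overlap = np.array([len(current_obs & obs) for obs in kf_obs])
    order = np.argsort(overlap)[::-1][:top_k]
    return [int(i) for i in order if overlap[i] > 0]
```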
APA, Harvard, Vancouver, ISO, and other styles
43

Lei, Ting, Xiao-Feng Liu, Guo-Ping Cai, Yun-Meng Liu, and Pan Liu. "Pose Estimation of a Noncooperative Target Based on Monocular Visual SLAM." International Journal of Aerospace Engineering 2019 (December 7, 2019): 1–14. http://dx.doi.org/10.1155/2019/9086891.

Full text
Abstract:
This paper estimates the pose of a noncooperative space target utilizing a direct method of monocular visual simultaneous localization and mapping (SLAM). A Large Scale Direct SLAM (LSD-SLAM) algorithm for pose estimation based on the photometric residual of pixel intensities is presented to overcome the limitations of existing feature-based on-orbit pose estimation methods. Firstly, new sequence images of the on-orbit target are continuously inputted, and the pose of each current frame is calculated by minimizing the photometric residual of pixel intensities. Secondly, frames are classified as keyframes or normal frames according to the pose relationship, and these frames are used to optimize the local map points. After that, the optimized local map points are added to the back-end map. Finally, the poses of keyframes are further optimized in the back-end thread based on the map points and the photometric residual between keyframes. Numerical simulations and experiments are carried out to prove the validity of the proposed algorithm, and the results elucidate its effectiveness in estimating the pose of the noncooperative target.
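The direct-alignment objective minimized per frame can be written as follows; the notation (ω for the projective warp, D_ref for inverse depth, the δ-subscripted norm for the Huber penalty) follows common LSD-SLAM expositions rather than this paper's exact symbols:

```latex
E(\xi) \;=\; \sum_{p \,\in\, \Omega_{\mathrm{ref}}}
  \bigl\lVert\, I_{\mathrm{ref}}(p)
  \;-\; I\bigl(\omega(p,\, D_{\mathrm{ref}}(p),\, \xi)\bigr) \,\bigr\rVert_{\delta}
```

The pose ξ of the current frame I is found by minimizing this photometric residual against the reference keyframe over all pixels p with valid inverse depth.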
APA, Harvard, Vancouver, ISO, and other styles
44

Chen, Keju, Yun Zhang, Li Zhong, and Yongguo Liu. "Fast Tongue Detection Based on Lightweight Model and Deep Feature Propagation." Electronics 14, no. 7 (2025): 1457. https://doi.org/10.3390/electronics14071457.

Full text
Abstract:
While existing tongue detection methods have achieved good accuracy, the problems of low detection speed and excessive noise in the background area still exist. To address these problems, a fast tongue detection model based on a lightweight model and deep feature propagation (TD-DFP) is proposed. Firstly, a color channel is added to the RGB tongue image to introduce more prominent tongue features. To reduce computational complexity, keyframes are selected through inter-frame differencing, while optical flow maps are used to align features between non-keyframes and keyframes. Secondly, a convolutional neural network with a feature pyramid structure is designed to extract multi-scale features, and object detection heads based on depth-wise convolutions are adopted to achieve real-time tongue region detection. In addition, a knowledge distillation module is introduced to improve training performance during the training phase. TD-DFP achieved a mean average precision (mAP) of 82.8% and 61.88 frames per second (FPS) on the tongue dataset. The experimental results indicate that TD-DFP achieves efficient, accurate, and real-time tongue detection.
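Feature propagation between keyframes and non-keyframes can be sketched with dense optical flow and backward warping. Farneback flow here is a generic stand-in for whatever flow estimator the authors use, and `key_feat` is assumed to be a float32 map at image resolution with at most four channels (a limit of cv2.remap):

```python
import cv2
import numpy as np

def propagate_features(key_feat: np.ndarray, key_gray: np.ndarray,
                       cur_gray: np.ndarray) -> np.ndarray:
    """Warp keyframe feature maps to a non-keyframe using dense optical flow,
    so the detector head can skip re-running the backbone on this frame.
    key_gray / cur_gray: 8-bit single-channel images of equal size."""
    flow = cv2.calcOpticalFlowFarneback(cur_gray, key_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    h, w = cur_gray.shape
    gx, gy = np.meshgrid(np.arange(w), np.arange(h))
    map_x = (gx + flow[..., 0]).astype(np.float32)  # where each current pixel
    map_y = (gy + flow[..., 1]).astype(np.float32)  # comes from in the keyframe
    return cv2.remap(key_feat, map_x, map_y, cv2.INTER_LINEAR)
```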
APA, Harvard, Vancouver, ISO, and other styles
45

Li, Ronghao, Pengqi Gao, Xiangyuan Cai, et al. "A Real-Time Incremental Video Mosaic Framework for UAV Remote Sensing." Remote Sensing 15, no. 8 (2023): 2127. http://dx.doi.org/10.3390/rs15082127.

Full text
Abstract:
Unmanned aerial vehicles (UAVs) are becoming increasingly popular in various fields such as agriculture, forest protection, and resource exploration, due to their ability to capture high-resolution images quickly and efficiently at low altitudes. However, real-time image mosaicking of UAV image sequences, especially during long multi-strip flights, remains challenging. In this paper, a real-time incremental UAV image mosaicking framework is proposed, which only uses the UAV image sequence and does not rely on global positioning system (GPS) data, ground control points (GCPs), or other auxiliary information. Our framework aims to reduce spatial distortion, speed up the mosaicking process, and output a high-quality panorama. To achieve this goal, we employ several strategies. First, the framework estimates the approximate position of each newly added frame and selects keyframes to improve efficiency. Then, the matching relationship between keyframes and other frames is obtained by using the estimated position. After that, a new optimization method based on minimizing weighted reprojection errors is adopted to precisely compute the position of the current frame, so as to reduce the deformation caused by cumulative errors. Finally, a weighted partition fusion method based on the Laplacian pyramid is used to fuse and update the local image in real time to achieve the best mosaic result. We have carried out a series of experiments which show that our system can output a high-quality panorama in real time. The proposed keyframe selection and local optimization strategies minimize cumulative errors, and the image fusion strategy is highly robust and effectively improves panorama quality.
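The final fusion step can be illustrated with classic Burt-Adelson Laplacian-pyramid blending, a plausible reading of the "weighted partition fusion method based on the Laplacian pyramid"; single-channel float32 tiles and a per-pixel weight mask are assumptions made for brevity:

```python
import cv2
import numpy as np

def laplacian_blend(a: np.ndarray, b: np.ndarray, mask: np.ndarray,
                    levels: int = 5) -> np.ndarray:
    """Seam-hiding fusion of two aligned mosaic tiles (float32, grayscale)
    with per-pixel weights `mask` in [0, 1]: low-frequency bands blend over
    wide transitions, high-frequency bands over narrow ones."""
    ga, gb, gm = [a], [b], [mask]
    for _ in range(levels):                          # Gaussian pyramids
        ga.append(cv2.pyrDown(ga[-1]))
        gb.append(cv2.pyrDown(gb[-1]))
        gm.append(cv2.pyrDown(gm[-1]))
    out = gm[-1] * ga[-1] + (1 - gm[-1]) * gb[-1]    # blend the coarsest level
    for i in range(levels - 1, -1, -1):              # add blended Laplacian bands
        size = (ga[i].shape[1], ga[i].shape[0])
        la = ga[i] - cv2.pyrUp(ga[i + 1], dstsize=size)
        lb = gb[i] - cv2.pyrUp(gb[i + 1], dstsize=size)
        out = cv2.pyrUp(out, dstsize=size) + gm[i] * la + (1 - gm[i]) * lb
    return out
```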
APA, Harvard, Vancouver, ISO, and other styles
46

Wu, Hui, Toan T. Huynh, and Richard Souvenir. "Phase-aware echocardiogram stabilization using keyframes." Medical Image Analysis 35 (January 2017): 172–80. http://dx.doi.org/10.1016/j.media.2016.06.039.

Full text
APA, Harvard, Vancouver, ISO, and other styles
47

Iyer, Hemalata, and Caitlain Devereaux Lewis. "Prioritization strategies for video storyboard keyframes." Journal of the American Society for Information Science and Technology 58, no. 5 (2007): 629–44. http://dx.doi.org/10.1002/asi.20554.

Full text
APA, Harvard, Vancouver, ISO, and other styles
48

Halfon, Efraim. "Data Animator — Software that Visualizes Data as Computer-Generated Animation on Personal Computers: an Application to Hamilton Harbour." Water Quality Research Journal 31, no. 3 (1996): 609–22. http://dx.doi.org/10.2166/wqrj.1996.034.

Full text
Abstract:
Data Animator, V1.0, is a scientific visualization package for microcomputers. Its main purpose is to generate two-dimensional animations from any data set collected over time. Geographical references such as a shore outline and/or bathymetry information may be added for additional clarity. Visualization of data as animations greatly simplifies the interpretation of field measurements. Data Animator is designed (but not restricted) to display data collected in aquatic environments (lakes, rivers, estuaries, oceans, etc.) in a clear, concise way, using colour to represent ranges of data values. Data sets can also be displayed as static images (keyframes). All of Data Animator's options can be accessed through a graphical user interface (GUI) that lets the user choose viewpoint, fonts, colour palette, data, and keyframes; point-and-click mouse operations manipulate many features with immediate on-screen feedback. Animations are generated by defining keyframes of known data, each located at a specific time. The program can then interpolate over time, between keyframes, to create smoothly animated transitions (in-between frames). Two types of graphs can be rendered with Data Animator. Plane-type graphs are horizontal slices at a depth specified by the user. Transect-type graphs are vertical slices along a straight line defined by the user. Data Animator can make use of both shore outline information and three-dimensional bathymetry information, which allows for the generation of realistic-looking graphs that follow the shape of the aquatic environment. Animations can be displayed on a computer monitor or transferred to video tape. pH data from Hamilton Harbour have been visualized and the results are discussed.
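The in-betweening step amounts to interpolating between the two keyframe data grids that bracket the requested time; a minimal linear version (Data Animator's actual interpolation scheme is not specified in the abstract) might look like this:

```python
import numpy as np

def inbetween(key_times: list[float], key_grids: list[np.ndarray],
              t: float) -> np.ndarray:
    """Linearly interpolate between the two keyframe data grids that bracket
    time t, producing the in-between frame of the animation."""
    i = int(np.searchsorted(key_times, t, side="right")) - 1
    i = max(0, min(i, len(key_times) - 2))           # clamp to a valid interval
    t0, t1 = key_times[i], key_times[i + 1]
    w = (t - t0) / (t1 - t0)                         # fractional position in interval
    return (1 - w) * key_grids[i] + w * key_grids[i + 1]
```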
APA, Harvard, Vancouver, ISO, and other styles
49

Hu, Xiayun, Xiaobin Hu, Jingxian Li, and Kun You. "Generative Adversarial Networks for Video Summarization Based on Key-frame Selection." Information Technology and Control 52, no. 1 (2023): 185–98. http://dx.doi.org/10.5755/j01.itc.52.1.32278.

Full text
Abstract:
Video summarization based on generative adversarial networks (GANs) has been shown to produce more realistic results with ease. However, most summary videos are composed of multiple key components. If the selection of some video frames changes during the training process, the information carried by these frames may not be reasonably reflected in the discrimination results. In this paper, we propose a video summarization method based on selecting keyframes with GANs. The novelty of the proposed method is that the discriminator not only assesses the completeness of the video but also takes into account the value of the candidate keyframes, enabling keyframes to influence the resulting value. Given that GANs are mainly designed to generate continuous real values, it is generally challenging to directly generate discrete symbol sequences during the summarization process. However, if the generated sample is based on discrete symbols, a slight guidance change of the discrimination network may be meaningless. To better exploit the advantages of GANs, the study also adopts a GAN-based video summarization optimization method under a collaborative reinforcement learning strategy. Experimental results show that the proposed method achieves a significant summarization effect compared with existing cutting-edge methods.
APA, Harvard, Vancouver, ISO, and other styles
50

Brodt, Kirill, and Mikhail Bessmeltsev. "Skeleton-Driven Inbetweening of Bitmap Character Drawings." ACM Transactions on Graphics 43, no. 6 (2024): 1–19. http://dx.doi.org/10.1145/3687955.

Full text
Abstract:
One of the primary reasons for the high cost of traditional animation is the inbetweening process, where artists manually draw each intermediate frame necessary for smooth motion. Making this process more efficient has been at the core of computer graphics research for years, yet the industry has adopted very few solutions. Most existing solutions either require vector input or resort to tight inbetweening; often, they attempt to fully automate the process. In industry, however, keyframes are often spaced far apart, drawn in raster format, and contain occlusions. Moreover, inbetweening is fundamentally an artistic process, so the artist should maintain high-level control over it. We address these issues by proposing a novel inbetweening system for bitmap character drawings, supporting both tight and far inbetweening. In our setup, the artist can control motion by animating a skeleton between the keyframe poses. Our system then performs skeleton-based deformation of the bitmap drawings into the same pose and employs discrete optimization and deep learning to blend the deformed images. Besides the skeleton and the two drawn bitmap keyframes, we require very little annotation. However, deforming drawings with occlusions is complex, as it requires a piecewise smooth deformation field. To address this, we observe that this deformation field is smooth when the drawing is lifted into 3D. Our system therefore optimizes the topology of a 2.5D partially layered template that we use to lift the drawing into 3D and obtain the final piecewise-smooth deformation, effectively resolving occlusions. We validate our system through a series of animations, qualitative and quantitative comparisons, and user studies, demonstrating that our approach consistently outperforms the state of the art and that our results are consistent with viewers' perception. Code and data for our paper are available at http://www-labs.iro.umontreal.ca/~bmpix/inbetweening/.
APA, Harvard, Vancouver, ISO, and other styles