Academic literature on the topic 'Deep Video Representations'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the lists of relevant articles, books, theses, conference papers, and other scholarly sources on the topic 'Deep Video Representations.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate a bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever these are available in the metadata.

Journal articles on the topic "Deep Video Representations"

1

Feichtenhofer, Christoph, Axel Pinz, Richard P. Wildes, and Andrew Zisserman. "Deep Insights into Convolutional Networks for Video Recognition." International Journal of Computer Vision 128, no. 2 (2019): 420–37. http://dx.doi.org/10.1007/s11263-019-01225-w.

Full text
Abstract:
As the success of deep models has led to their deployment in all areas of computer vision, it is increasingly important to understand how these representations work and what they are capturing. In this paper, we shed light on deep spatiotemporal representations by visualizing the internal representation of models that have been trained to recognize actions in video. We visualize multiple two-stream architectures to show that local detectors for appearance and motion objects arise to form distributed representations for recognizing human actions. Key observations include the following. …
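
The two-stream design this paper analyzes is easiest to see in code. Below is a minimal PyTorch sketch; the layer sizes, the 10-channel stacked-flow input (5 frames × x/y displacement), and the late score fusion are generic illustrative assumptions, not the paper's exact models:

    import torch
    import torch.nn as nn

    class TwoStreamNet(nn.Module):
        """Minimal two-stream action recognizer: one CNN for RGB
        appearance, one for stacked optical flow, fused by averaging
        class scores (late fusion)."""
        def __init__(self, num_classes=101):
            super().__init__()
            self.rgb_stream = self._make_stream(3, num_classes)
            self.flow_stream = self._make_stream(10, num_classes)  # 5 flow frames x (dx, dy)

        @staticmethod
        def _make_stream(in_channels, num_classes):
            return nn.Sequential(
                nn.Conv2d(in_channels, 32, 7, stride=2, padding=3), nn.ReLU(),
                nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                nn.Linear(64, num_classes),
            )

        def forward(self, rgb, flow):
            # Late fusion: average the per-stream class logits.
            return (self.rgb_stream(rgb) + self.flow_stream(flow)) / 2

    logits = TwoStreamNet()(torch.randn(2, 3, 112, 112), torch.randn(2, 10, 112, 112))
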
2

Pandeya, Yagya Raj, Bhuwan Bhattarai, and Joonwhoan Lee. "Deep-Learning-Based Multimodal Emotion Classification for Music Videos." Sensors 21, no. 14 (2021): 4927. http://dx.doi.org/10.3390/s21144927.

Full text
Abstract:
Music videos contain a great deal of visual and acoustic information. Each information source within a music video influences the emotions conveyed through the audio and video, suggesting that only a multimodal approach is capable of achieving efficient affective computing. This paper presents an affective computing system that relies on music, video, and facial expression cues, making it useful for emotional analysis. We applied the audio–video information exchange and boosting methods to regularize the training process and reduced the computational costs by using a separable convolution strategy. …
3

Ljubešić, Nikola. "“Deep lexicography” – Fad or Opportunity?" Rasprave Instituta za hrvatski jezik i jezikoslovlje 46, no. 2 (2020): 839–52. http://dx.doi.org/10.31724/rihjj.46.2.21.

Full text
Abstract:
In recent years, we are witnessing staggering improvements in various semantic data processing tasks due to the developments in the area of deep learning, ranging from image and video processing to speech processing, and natural language understanding. In this paper, we discuss the opportunities and challenges that these developments pose for the area of electronic lexicography. We primarily focus on the concept of representation learning of the basic elements of language, namely words, and the applicability of these word representations to lexicography. We first discuss well-known approaches …
4

Kumar, Vidit, Vikas Tripathi, and Bhaskar Pant. "Learning Unsupervised Visual Representations using 3D Convolutional Autoencoder with Temporal Contrastive Modeling for Video Retrieval." International Journal of Mathematical, Engineering and Management Sciences 7, no. 2 (2022): 272–87. http://dx.doi.org/10.33889/ijmems.2022.7.2.018.

Full text
Abstract:
The rapid growth of tag-free user-generated videos (on the Internet), surgical recorded videos, and surveillance videos has created the need for effective content-based video retrieval systems. Earlier methods for video representation were hand-crafted, and hardly performed well on video retrieval tasks. Subsequently, deep learning methods have successfully demonstrated their effectiveness in both image and video-related tasks, but at the cost of creating massively labeled datasets. Thus, the economic solution is to use freely available unlabeled web videos for representation learning. …
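
As a rough illustration of the 3D convolutional autoencoder idea described above, here is a toy PyTorch sketch; the layer sizes and the 8-frame clip shape are arbitrary assumptions, and the paper's temporal contrastive term is only indicated in a comment:

    import torch
    import torch.nn as nn

    class Conv3dAutoencoder(nn.Module):
        """Toy 3D convolutional autoencoder: the encoder output serves as
        the clip-level representation; the decoder reconstructs the input
        clip (B, C, T, H, W) as a self-supervised objective."""
        def __init__(self):
            super().__init__()
            self.encoder = nn.Sequential(
                nn.Conv3d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv3d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            )
            self.decoder = nn.Sequential(
                nn.ConvTranspose3d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
                nn.ConvTranspose3d(16, 3, 4, stride=2, padding=1),
            )

        def forward(self, clip):
            z = self.encoder(clip)
            return self.decoder(z), z

    clip = torch.randn(2, 3, 8, 64, 64)           # two 8-frame RGB clips
    recon, z = Conv3dAutoencoder()(clip)
    loss = nn.functional.mse_loss(recon, clip)    # reconstruction term;
    # a contrastive term would additionally pull representations of
    # temporally adjacent clips together and push other videos apart.
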
5

Vihlman, Mikko, and Arto Visala. "Optical Flow in Deep Visual Tracking." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 07 (2020): 12112–19. http://dx.doi.org/10.1609/aaai.v34i07.6890.

Full text
Abstract:
Single-target tracking of generic objects is a difficult task since a trained tracker is given information present only in the first frame of a video. In recent years, increasingly many trackers have been based on deep neural networks that learn generic features relevant for tracking. This paper argues that deep architectures are often fit to learn implicit representations of optical flow. Optical flow is intuitively useful for tracking, but most deep trackers must learn it implicitly. This paper is among the first to study the role of optical flow in deep visual tracking. The architecture of …
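
The explicit optical flow this paper contrasts with learned, implicit flow can be computed classically. A short OpenCV sketch using the standard Farnebäck algorithm follows; the video path and parameter values are placeholders:

    import cv2
    import numpy as np

    cap = cv2.VideoCapture("video.mp4")  # placeholder path
    ok, prev = cap.read()
    prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)

    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        # Dense flow: flow[y, x] = (dx, dy) displacement per pixel.
        flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        magnitude = np.linalg.norm(flow, axis=2)
        print("mean motion magnitude:", magnitude.mean())
        prev_gray = gray
    cap.release()
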
6

Rouast, Philipp V., and Marc T. P. Adam. "Learning Deep Representations for Video-Based Intake Gesture Detection." IEEE Journal of Biomedical and Health Informatics 24, no. 6 (2020): 1727–37. http://dx.doi.org/10.1109/jbhi.2019.2942845.

Full text
7

Li, Jialu, Aishwarya Padmakumar, Gaurav Sukhatme, and Mohit Bansal. "VLN-Video: Utilizing Driving Videos for Outdoor Vision-and-Language Navigation." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 17 (2024): 18517–26. http://dx.doi.org/10.1609/aaai.v38i17.29813.

Full text
Abstract:
Outdoor Vision-and-Language Navigation (VLN) requires an agent to navigate through realistic 3D outdoor environments based on natural language instructions. The performance of existing VLN methods is limited by insufficient diversity in navigation environments and limited training data. To address these issues, we propose VLN-Video, which utilizes the diverse outdoor environments present in driving videos in multiple cities in the U.S. augmented with automatically generated navigation instructions and actions to improve outdoor VLN performance. VLN-Video combines the best of intuitive classical …
8

Hu, Yueyue, Shiliang Sun, Xin Xu, and Jing Zhao. "Multi-View Deep Attention Network for Reinforcement Learning (Student Abstract)." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 10 (2020): 13811–12. http://dx.doi.org/10.1609/aaai.v34i10.7177.

Full text
Abstract:
The representation approximated by a single deep network is usually limited for reinforcement learning agents. We propose a novel multi-view deep attention network (MvDAN), which introduces multi-view representation learning into the reinforcement learning task for the first time. The proposed model approximates a set of strategies from multiple representations and combines these strategies based on attention mechanisms to provide a comprehensive strategy for a single agent. Experimental results on eight Atari video games show that the MvDAN achieves more competitive performance than single-view …
9

Dong, Zhen, Chenchen Jing, Mingtao Pei, and Yunde Jia. "Deep CNN based binary hash video representations for face retrieval." Pattern Recognition 81 (September 2018): 357–69. http://dx.doi.org/10.1016/j.patcog.2018.04.014.

Full text
10

Psallidas, Theodoros, and Evaggelos Spyrou. "Video Summarization Based on Feature Fusion and Data Augmentation." Computers 12, no. 9 (2023): 186. http://dx.doi.org/10.3390/computers12090186.

Full text
Abstract:
During the last few years, several technological advances have led to an increase in the creation and consumption of audiovisual multimedia content. Users are overexposed to videos via several social media or video sharing websites and mobile phone applications. For efficient browsing, searching, and navigation across several multimedia collections and repositories, e.g., for finding videos that are relevant to a particular topic or interest, this ever-increasing content should be efficiently described by informative yet concise content representations. A common solution to this problem is the …

Dissertations / Theses on the topic "Deep Video Representations"

1

Yang, Yang. "Learning Hierarchical Representations for Video Analysis Using Deep Learning." Doctoral diss., University of Central Florida, 2013. http://digital.library.ucf.edu/cdm/ref/collection/ETD/id/5892.

Full text
Abstract:
With the exponential growth of digital data, video content analysis (e.g., action, event recognition) has been drawing increasing attention from computer vision researchers. Effective modeling of the objects, scenes, and motions is critical for visual understanding. Recently there has been a growing interest in bio-inspired deep learning models, which have shown impressive results in speech and object recognition. The deep learning models are formed by the composition of multiple non-linear transformations of the data, with the goal of yielding more abstract and ultimately more useful representations. …
2

Sudhakaran, Swathikiran. "Deep Neural Architectures for Video Representation Learning." Doctoral thesis, Università degli studi di Trento, 2019. https://hdl.handle.net/11572/369191.

Full text
Abstract:
Automated analysis of videos for content understanding is one of the most challenging and well-researched areas in computer vision and multimedia. This thesis addresses the problem of video content understanding in the context of action recognition. The major challenge faced by this research problem is the variations of the spatio-temporal patterns that constitute each action category and the difficulty in generating a succinct representation encapsulating these patterns. This thesis considers two important aspects of videos for addressing this problem: (1) a video is a sequence of images with …
3

Sudhakaran, Swathikiran. "Deep Neural Architectures for Video Representation Learning." Doctoral thesis, University of Trento, 2019. http://eprints-phd.biblio.unitn.it/3731/1/swathi_thesis_rev1.pdf.

Full text
Abstract:
Automated analysis of videos for content understanding is one of the most challenging and well-researched areas in computer vision and multimedia. This thesis addresses the problem of video content understanding in the context of action recognition. The major challenge faced by this research problem is the variations of the spatio-temporal patterns that constitute each action category and the difficulty in generating a succinct representation encapsulating these patterns. This thesis considers two important aspects of videos for addressing this problem: (1) a video is a sequence of images with …
4

Sun, Shuyang. "Designing Motion Representation in Videos." Thesis, The University of Sydney, 2018. http://hdl.handle.net/2123/19724.

Full text
Abstract:
Motion representation plays a vital role in the vision-based human action recognition in videos. Generally, the information of a video could be divided into spatial information and temporal information. While the spatial information could be easily described by the RGB images, the design of the motion representation is yet a challenging problem. In order to design a motion representation that is efficient and effective, we design the feature according to two principles. First, to guarantee the robustness, the temporal information should be highly related to the informative modalities, e.g., the …
5

Mazari, Ahmed. "Apprentissage profond pour la reconnaissance d’actions en vidéos" [Deep learning for action recognition in videos]. Electronic thesis or dissertation, Sorbonne université, 2020. http://www.theses.fr/2020SORUS171.

Full text
Abstract:
Nowadays, video content is ubiquitous thanks to the Internet and smartphones, as well as social media. Many everyday applications, such as video surveillance and video content description, as well as visual scene understanding, require sophisticated technologies to process video data. It is becoming necessary to develop automatic means of analyzing and interpreting the large quantity of available video data. In this thesis, we are interested in action recognition in videos, i.e., in the problem of …
6

"Video2Vec: Learning Semantic Spatio-Temporal Embedding for Video Representations." Master's thesis, 2016. http://hdl.handle.net/2286/R.I.40765.

Full text
Abstract:
High-level inference tasks in video applications such as recognition, video retrieval, and zero-shot classification have become an active research area in recent years. One fundamental requirement for such applications is to extract high-quality features that maintain high-level information in the videos. Many video feature extraction algorithms have been proposed, such as STIP, HOG3D, and Dense Trajectories. These algorithms are often referred to as “handcrafted” features, as they were deliberately designed based on some reasonable considerations. However, these algorithms may fail …
7

Khanuja, Gagandeep Singh. "A Study of Real Time Search in Flood Scenes from UAV Videos Using Deep Learning Techniques." Thesis, 2019.

Find full text
Abstract:
Following a natural disaster, one of the most important facets that influence a person's chances of survival/being found is the time within which they are rescued. Traditional means of search operations involving dogs, ground robots, and humanitarian intervention are time-intensive and can be a major bottleneck in search operations. The main aim of these operations is to rescue victims without critical delay, in the shortest time possible, which can be realized in real time by using UAVs. With advancements in computational devices and the ability to learn from complex data, deep learning can be …
8

Souček, Tomáš. "Detekce střihů a vyhledávání známých scén ve videu s pomocí metod hlubokého učení" [Shot transition detection and known-scene search in video using deep learning methods]. Master's thesis, 2020. http://www.nusl.cz/ntk/nusl-434967.

Full text
Abstract:
Video retrieval represents a challenging problem with many caveats and sub-problems. This thesis focuses on two of these sub-problems, namely shot transition detection and text-based search. In the case of shot detection, many solutions have been proposed over the last decades. Recently, deep learning-based approaches improved the accuracy of shot transition detection using 3D convolutional architectures and artificially created training data, but one hundred percent accuracy is still an unreachable ideal. In this thesis we present a deep network for shot transition detection, TransNet V2, that …

Books on the topic "Deep Video Representations"

1

Aguayo, Angela J. Documentary Resistance. Oxford University Press, 2019. http://dx.doi.org/10.1093/oso/9780190676216.001.0001.

Full text
Abstract:
The potential of documentary moving images to foster democratic exchange has been percolating within media production culture for the last century, and now, with mobile cameras at our fingertips and broadcasts circulating through unpredictable social networks, the documentary impulse is coming into its own as a political force of social change. The exploding reach and power of audio and video are multiplying documentary modes of communication. Once considered an outsider media practice, documentary is finding mass appeal in the allure of moving images, collecting participatory audiences that …
2

Anderson, Crystal S. Soul in Seoul. University Press of Mississippi, 2020. http://dx.doi.org/10.14325/mississippi/9781496830098.001.0001.

Full text
Abstract:
Soul in Seoul: African American Popular Music and K-pop examines how K-pop cites musical and performative elements of Black popular music culture as well as the ways that fans outside of Korea understand these citations. K-pop represents a hybridized mode of Korean popular music that emerged in the 1990s with global aspirations. Its hybridity combines musical elements from Korean and foreign cultures, particularly rhythm and blues-based genres (R&B) of African American popular music. Korean pop, R&B, and hip-hop solo artists and groups engage in citational practices by simultaneously …

Book chapters on the topic "Deep Video Representations"

1

Loban, Rhett. "Designing to produce deep representations." In Embedding Culture into Video Games and Game Design. Chapman and Hall/CRC, 2023. http://dx.doi.org/10.1201/9781003276289-10.

Full text
2

Yao, Yuan, Zhiyuan Liu, Yankai Lin, and Maosong Sun. "Cross-Modal Representation Learning." In Representation Learning for Natural Language Processing. Springer Nature Singapore, 2023. http://dx.doi.org/10.1007/978-981-99-1600-9_7.

Full text
Abstract:
Cross-modal representation learning is an essential part of representation learning, which aims to learn semantic representations for different modalities including text, audio, image, and video, etc., and their connections. In this chapter, we introduce the development of cross-modal representation learning from shallow to deep, and from respective to unified, in terms of model architectures and learning mechanisms for different modalities and tasks. After that, we review how cross-modal capabilities can contribute to complex real-world applications.
3

Mao, Feng, Xiang Wu, Hui Xue, and Rong Zhang. "Hierarchical Video Frame Sequence Representation with Deep Convolutional Graph Network." In Lecture Notes in Computer Science. Springer International Publishing, 2019. http://dx.doi.org/10.1007/978-3-030-11018-5_24.

Full text
4

Becerra-Riera, Fabiola, Annette Morales-González, and Heydi Méndez-Vázquez. "Exploring Local Deep Representations for Facial Gender Classification in Videos." In Progress in Artificial Intelligence and Pattern Recognition. Springer International Publishing, 2018. http://dx.doi.org/10.1007/978-3-030-01132-1_12.

Full text
5

Zhao, Kemeng, Liangrui Peng, Ning Ding, Gang Yao, Pei Tang, and Shengjin Wang. "Deep Representation Learning for License Plate Recognition in Low Quality Video Images." In Advances in Visual Computing. Springer Nature Switzerland, 2023. http://dx.doi.org/10.1007/978-3-031-47966-3_16.

Full text
6

Chen, Yixiong, Chunhui Zhang, Li Liu, et al. "USCL: Pretraining Deep Ultrasound Image Diagnosis Model Through Video Contrastive Representation Learning." In Medical Image Computing and Computer Assisted Intervention – MICCAI 2021. Springer International Publishing, 2021. http://dx.doi.org/10.1007/978-3-030-87237-3_60.

Full text
7

Dhurgadevi, M., D. Vimal Kumar, R. Senthilkumar, and K. Gunasekaran. "Detection of Video Anomaly in Public With Deep Learning Algorithm." In Advances in Psychology, Mental Health, and Behavioral Studies. IGI Global, 2024. http://dx.doi.org/10.4018/979-8-3693-4143-8.ch004.

Full text
Abstract:
For traffic control and public safety, predicting the movement of people is crucial. The presented scheme entails the development of a wider network that can better satisfy created synthetic images by connecting spatial representations to temporal ones. The authors exclusively use the frames from those occurrences to create the dense optical flow for their corresponding normal events. In order to eliminate false-positive detection findings, they determine the local pixel reconstruction error. This particle prediction model and a likelihood model for giving these particles weights are both suggested. These models effectively use the variable-sized cell structure to produce sceneries with variable-sized sub-regions. It also successfully extracts and utilizes the video frame's size, motion, and position information. On the UCSD and LIVE datasets, the proposed framework is evaluated against the most recent algorithms reported in the literature. With a significantly shorter processing time, the suggested technique surpasses state-of-the-art techniques in terms of equal error rate.
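
As a generic sketch of the local reconstruction-error scoring mentioned in the abstract, the pattern below scores each frame by its worst locally reconstructed region. The autoencoder, block size, and thresholding scheme are illustrative assumptions, not the chapter's actual model:

    import torch

    def anomaly_scores(frames, autoencoder, block=16):
        """`frames` is (N, C, H, W); `autoencoder` is any model trained
        to reconstruct normal frames (a stand-in for the chapter's model)."""
        with torch.no_grad():
            recon = autoencoder(frames)
        # Per-pixel squared reconstruction error, averaged over channels.
        err = (frames - recon).pow(2).mean(dim=1, keepdim=True)
        # Average the error over block x block regions, then take the
        # worst block as the frame's anomaly score.
        block_err = torch.nn.functional.avg_pool2d(err, block)
        return block_err.flatten(1).max(dim=1).values

    # Usage: scores = anomaly_scores(batch, model); frames whose score
    # exceeds a threshold calibrated on normal footage are flagged.
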
8

Asma, Stephen T. "Drama in the Diorama: The Confederation & Art and Science." In Stuffed Animals & Pickled Heads. Oxford University Press, New York, NY, 2001. http://dx.doi.org/10.1093/oso/9780195130508.003.0007.

Full text
Abstract:
The museums that we've studied throughout this journey reveal the tremendous diversity of goals and motives for collecting and displaying elements of the natural world. Yet underneath all these various constructions of nature, there has been a continuous dialogue between image-making activities and knowledge-producing activities. Unlike texts, natural history museums are inherently aesthetic representations of science in particular and conceptual ideas in general. The fact that a roulette wheel at the Field could touch the central nerves of our deep metaphysical convictions is an indication of a museum's epistemic potential. After spending long stretches in many natural history museums, one begins to see that a display's potential for education and transformation is largely a function of its artistic, nondiscursive character. Three-dimensional representations of nature (dioramas), two-dimensional and three-dimensional representations of concepts (such as the roulette wheel), and visual images generally are not just candy coatings on the real educational process of textual information transmission. This chapter explores how and why visual communication works on museum visitors. And this requires an examination of the more general issue of how images themselves can be pedagogical, an issue that extends from da Vinci's anatomy drawings to the latest video edutainment technology. These issues lead to a survey of some of the most recent trends in museology, followed by some reflections on the museum at the millennium.
9

Verma, Gyanendra K. "Emotions Modelling in 3D Space." In Multimodal Affective Computing: Affective Information Representation, Modelling, and Analysis. Bentham Science Publishers, 2023. http://dx.doi.org/10.2174/9789815124453123010013.

Full text
Abstract:
In this study, we have discussed emotion representation in two- and three-dimensional space. The three-dimensional space is based on the three emotion primitives, i.e., valence, arousal, and dominance. The multimodal cues used in this study are EEG, physiological signals, and video (under limitations). Due to the limited emotional content in videos from the DEAP database, we have considered only three classes of emotions, i.e., happy, sad, and terrible. The wavelet transform, a classical transform, was employed for multi-resolution analysis of signals to extract features. We have evaluated the proposed emotion model with the standard multimodal DEAP dataset. The experimental results show that SVM and MLP can predict emotions from single and multimodal cues.
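
A minimal sketch of the wavelet-feature-plus-classifier pipeline the abstract describes, using PyWavelets and scikit-learn on random stand-in data; the db4 wavelet, the energy statistics, and the toy labels (0/1/2 standing in for happy/sad/terrible) are assumptions, not the chapter's exact setup:

    import numpy as np
    import pywt
    from sklearn.svm import SVC

    def wavelet_features(signal, wavelet="db4", level=4):
        """Multi-resolution features for a 1-D EEG/physiological channel:
        simple energy statistics of each wavelet sub-band."""
        coeffs = pywt.wavedec(signal, wavelet, level=level)
        return np.array([np.mean(np.abs(c)) for c in coeffs] +
                        [np.std(c) for c in coeffs])

    # Toy training run on random data standing in for DEAP recordings.
    X = np.stack([wavelet_features(np.random.randn(512)) for _ in range(60)])
    y = np.random.randint(0, 3, size=60)
    clf = SVC(kernel="rbf").fit(X, y)
    print(clf.predict(X[:5]))
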
10

Nandal, Priyanka. "Motion Imitation for Monocular Videos." In Examining the Impact of Deep Learning and IoT on Multi-Industry Applications. IGI Global, 2021. http://dx.doi.org/10.4018/978-1-7998-7511-6.ch008.

Full text
Abstract:
This work presents a simple method for motion transfer (i.e., given a source video of a subject [person] performing some movements or in motion, that movement/motion is transferred to an amateur target in a different motion). The pose is used as an intermediate representation to perform this translation. To transfer the motion of the source subject to the target subject, the pose is extracted from the source subject, and then the target subject is generated by applying the learned pose-to-appearance mapping. To perform this translation, the video is considered as a set of images consisting of all the frames. Generative adversarial networks (GANs) are used to transfer the motion from the source subject to the target subject. GANs are an evolving field of deep learning.

Conference papers on the topic "Deep Video Representations"

1

Morere, Olivier, Hanlin Goh, Antoine Veillard, Vijay Chandrasekhar, and Jie Lin. "Co-regularized deep representations for video summarization." In 2015 IEEE International Conference on Image Processing (ICIP). IEEE, 2015. http://dx.doi.org/10.1109/icip.2015.7351387.

Full text
2

Yu, Feiwu, Xinxiao Wu, Yuchao Sun, and Lixin Duan. "Exploiting Images for Video Recognition with Hierarchical Generative Adversarial Networks." In Twenty-Seventh International Joint Conference on Artificial Intelligence (IJCAI-18). International Joint Conferences on Artificial Intelligence Organization, 2018. http://dx.doi.org/10.24963/ijcai.2018/154.

Full text
Abstract:
Existing deep learning methods of video recognition usually require a large number of labeled videos for training. But for a new task, videos are often unlabeled and it is also time-consuming and labor-intensive to annotate them. Instead of human annotation, we try to make use of existing fully labeled images to help recognize those videos. However, due to the problem of domain shifts and heterogeneous feature representations, the performance of classifiers trained on images may be dramatically degraded for video recognition tasks. In this paper, we propose a novel method, called Hierarchical …
3

Pernici, Federico, Federico Bartoli, Matteo Bruni, and Alberto Del Bimbo. "Memory Based Online Learning of Deep Representations from Video Streams." In 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2018. http://dx.doi.org/10.1109/cvpr.2018.00247.

Full text
4

Jung, Ilchae, Minji Kim, Eunhyeok Park, and Bohyung Han. "Online Hybrid Lightweight Representations Learning: Its Application to Visual Tracking." In Thirty-First International Joint Conference on Artificial Intelligence (IJCAI-22). International Joint Conferences on Artificial Intelligence Organization, 2022. http://dx.doi.org/10.24963/ijcai.2022/140.

Full text
Abstract:
This paper presents a novel hybrid representation learning framework for streaming data, where an image frame in a video is modeled by an ensemble of two distinct deep neural networks; one is a low-bit quantized network and the other is a lightweight full-precision network. The former learns coarse primary information with low cost while the latter conveys residual information for high fidelity to original representations. The proposed parallel architecture is effective to maintain complementary information since fixed-point arithmetic can be utilized in the quantized network and the lightweight …
5

Garcia-Gonzalez, Jorge, Rafael M. Luque-Baena, Juan M. Ortiz-de-Lazcano-Lobato, and Ezequiel Lopez-Rubio. "Moving Object Detection in Noisy Video Sequences Using Deep Convolutional Disentangled Representations." In 2022 IEEE International Conference on Image Processing (ICIP). IEEE, 2022. http://dx.doi.org/10.1109/icip46576.2022.9897305.

Full text
6

Parchami, Mostafa, Saman Bashbaghi, Eric Granger, and Saif Sayed. "Using deep autoencoders to learn robust domain-invariant representations for still-to-video face recognition." In 2017 14th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS). IEEE, 2017. http://dx.doi.org/10.1109/avss.2017.8078553.

Full text
7

Bueno-Benito, Elena, Biel Tura, and Mariella Dimiccoli. "Leveraging Triplet Loss for Unsupervised Action Segmentation." In LatinX in AI at Computer Vision and Pattern Recognition Conference 2023. Journal of LatinX in AI Research, 2023. http://dx.doi.org/10.52591/lxai202306185.

Full text
Abstract:
In this paper, we propose a novel fully unsupervised framework that learns action representations suitable for the action segmentation task from the single input video itself, without requiring any training data. Our method is a deep metric learning approach rooted in a shallow network with a triplet loss operating on similarity distributions and a novel triplet selection strategy that effectively models temporal and semantic priors to discover actions in the new representational space. Under these circumstances, we successfully recover temporal boundaries in the learned action representations …
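
A minimal sketch of a frame-level triplet loss of the kind this paper builds on; the shallow embedding net, the 2048-dim input features, and the naive "adjacent frame = positive, distant frame = negative" selection are illustrative stand-ins for the paper's similarity-distribution-based strategy:

    import torch
    import torch.nn as nn

    # Shallow embedding network over per-frame features of one video.
    embed = nn.Sequential(nn.Linear(2048, 256), nn.ReLU(), nn.Linear(256, 64))
    triplet = nn.TripletMarginLoss(margin=1.0)

    features = torch.randn(100, 2048)             # 100 frames, 2048-dim each
    anchor_idx = torch.arange(1, 99)
    anchor = embed(features[anchor_idx])
    positive = embed(features[anchor_idx + 1])            # temporally adjacent frame
    negative = embed(features[(anchor_idx + 50) % 100])   # temporally distant frame
    # Pull anchors toward nearby frames, push them from distant ones.
    loss = triplet(anchor, positive, negative)
    loss.backward()
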
8

Kich, Victor Augusto, Junior Costa de Jesus, Ricardo Bedin Grando, Alisson Henrique Kolling, Gabriel Vinícius Heisler, and Rodrigo da Silva Guerra. "Deep Reinforcement Learning Using a Low-Dimensional Observation Filter for Visual Complex Video Game Playing." In Anais Estendidos do Simpósio Brasileiro de Games e Entretenimento Digital. Sociedade Brasileira de Computação, 2021. http://dx.doi.org/10.5753/sbgames_estendido.2021.19659.

Full text
Abstract:
Deep Reinforcement Learning (DRL) has produced great achievements since it was proposed, including the possibility of processing raw vision input data. However, training an agent to perform tasks based on image feedback remains a challenge. It requires the processing of large amounts of data from high-dimensional observation spaces, frame by frame, and the agent's actions are computed according to deep neural network policies, end-to-end. Image pre-processing is an effective way of reducing these high-dimensional spaces, eliminating unnecessary information present in the scene, supporting the …
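
At its simplest, the low-dimensional observation filter the abstract refers to is grayscale-downscale-normalize preprocessing. A short sketch with common Atari-style defaults follows; the 84×84 target size is a conventional assumption, not necessarily the paper's choice:

    import cv2
    import numpy as np

    def observation_filter(frame, size=(84, 84)):
        """Shrink a raw BGR game frame into a low-dimensional observation:
        grayscale, downscale, and normalize to [0, 1]."""
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        small = cv2.resize(gray, size, interpolation=cv2.INTER_AREA)
        return small.astype(np.float32) / 255.0

    obs = observation_filter(np.random.randint(0, 256, (210, 160, 3), dtype=np.uint8))
    print(obs.shape)  # (84, 84)
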
9

Fan, Tingyu, Linyao Gao, Yiling Xu, Zhu Li, and Dong Wang. "D-DPCC: Deep Dynamic Point Cloud Compression via 3D Motion Prediction." In Thirty-First International Joint Conference on Artificial Intelligence (IJCAI-22). International Joint Conferences on Artificial Intelligence Organization, 2022. http://dx.doi.org/10.24963/ijcai.2022/126.

Full text
Abstract:
The non-uniformly distributed nature of the 3D Dynamic Point Cloud (DPC) brings significant challenges to its high-efficient inter-frame compression. This paper proposes a novel 3D sparse convolution-based Deep Dynamic Point Cloud Compression (D-DPCC) network to compensate and compress the DPC geometry with 3D motion estimation and motion compensation in the feature space. In the proposed D-DPCC network, we design a Multi-scale Motion Fusion (MMF) module to accurately estimate the 3D optical flow between the feature representations of adjacent point cloud frames. Specifically, we utilize a 3D …
10

Li, Yang, Kan Li, and Xinxin Wang. "Deeply-Supervised CNN Model for Action Recognition with Trainable Feature Aggregation." In Twenty-Seventh International Joint Conference on Artificial Intelligence (IJCAI-18). International Joint Conferences on Artificial Intelligence Organization, 2018. http://dx.doi.org/10.24963/ijcai.2018/112.

Full text
Abstract:
In this paper, we propose a deeply-supervised CNN model for action recognition that fully exploits powerful hierarchical features of CNNs. In this model, we build multi-level video representations by applying our proposed aggregation module at different convolutional layers. Moreover, we train this model in a deep supervision manner, which brings improvement in both performance and efficiency. Meanwhile, in order to capture the temporal structure as well as preserve more details about actions, we propose a trainable aggregation module. It models the temporal evolution of each spatial location …