Academic literature on the topic 'Large video dataset'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles

Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Large video dataset.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate a bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever these are available in the metadata.

Journal articles on the topic "Large video dataset"

1. Yu, Zhou, Dejing Xu, Jun Yu, et al. "ActivityNet-QA: A Dataset for Understanding Complex Web Videos via Question Answering." Proceedings of the AAAI Conference on Artificial Intelligence 33 (July 17, 2019): 9127–34. http://dx.doi.org/10.1609/aaai.v33i01.33019127.

Abstract:
Recent developments in modeling language and vision have been successfully applied to image question answering. It is both crucial and natural to extend this research direction to the video domain for video question answering (VideoQA). Compared to the image domain, where large-scale, fully annotated benchmark datasets exist, VideoQA datasets are small in scale or automatically generated, which restricts their applicability in practice. Here we introduce ActivityNet-QA, a fully annotated, large-scale VideoQA dataset. The dataset consists of 58,000 QA pairs on 5,800 complex web videos derived from the popular ActivityNet dataset. We present a statistical analysis of our ActivityNet-QA dataset and conduct extensive experiments on it by comparing existing VideoQA baselines. Moreover, we explore various video representation strategies to improve VideoQA performance, especially for long videos.

2. Chen, Hanqing, Chunyan Hu, Feifei Lee, et al. "A Supervised Video Hashing Method Based on a Deep 3D Convolutional Neural Network for Large-Scale Video Retrieval." Sensors 21, no. 9 (2021): 3094. http://dx.doi.org/10.3390/s21093094.

Abstract:
Recently, with the popularization of camera tools such as mobile phones and the rise of various short-video platforms, large numbers of videos are being uploaded to the Internet at all times, making a video retrieval system with fast retrieval speed and high precision very necessary. Content-based video retrieval (CBVR) has therefore aroused the interest of many researchers. A typical CBVR system contains two essential parts: video feature extraction and similarity comparison. Feature extraction from video is very challenging; previous video retrieval methods mostly extract features from single video frames, losing the temporal information in the videos. Hashing methods are extensively used in multimedia information retrieval because of their retrieval efficiency, but most of them are currently applied only to image retrieval. To solve these problems in video retrieval, we build an end-to-end framework called deep supervised video hashing (DSVH), which employs a 3D convolutional neural network (CNN) to obtain spatio-temporal features of videos and then trains a set of hash functions by supervised hashing to transfer the video features into a binary space and obtain compact binary codes of videos. Finally, we use a triplet loss for network training. We conduct extensive experiments on three public video datasets, UCF-101, JHMDB, and HMDB-51, and the results show that the proposed method has advantages over many state-of-the-art video retrieval methods. Compared with the DVH method, the mAP on the UCF-101 dataset improves by 9.3%, and even the smallest improvement, on the JHMDB dataset, is 0.3%. We also demonstrate the stability of the algorithm on the HMDB-51 dataset.
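
To make the recipe in this abstract concrete, here is a minimal, hypothetical PyTorch sketch of supervised video hashing: a small 3D CNN extracts spatio-temporal features, a tanh hash head maps them toward binary codes, and a triplet loss pulls same-class videos together. The architecture, layer sizes, and names are illustrative stand-ins, not the authors' DSVH network.

```python
import torch
import torch.nn as nn

class VideoHashNet(nn.Module):
    """Toy 3D-CNN video hashing model (illustrative only)."""

    def __init__(self, code_bits: int = 64):
        super().__init__()
        # A tiny 3D CNN backbone; the paper uses a much deeper network.
        self.backbone = nn.Sequential(
            nn.Conv3d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d(2),
            nn.Conv3d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),
        )
        # tanh keeps activations near [-1, 1], so sign() at retrieval time
        # yields compact binary codes.
        self.hash_head = nn.Sequential(nn.Flatten(), nn.Linear(32, code_bits), nn.Tanh())

    def forward(self, clip: torch.Tensor) -> torch.Tensor:
        # clip: (batch, channels, frames, height, width)
        return self.hash_head(self.backbone(clip))

model = VideoHashNet(code_bits=64)
triplet_loss = nn.TripletMarginLoss(margin=1.0)

# One hypothetical training step on random stand-in clips.
anchor = torch.randn(2, 3, 16, 64, 64)    # anchor clips
positive = torch.randn(2, 3, 16, 64, 64)  # same class as anchor
negative = torch.randn(2, 3, 16, 64, 64)  # different class
loss = triplet_loss(model(anchor), model(positive), model(negative))
loss.backward()

# At retrieval time, binarize the real-valued outputs into hash codes.
with torch.no_grad():
    codes = torch.sign(model(anchor))
```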

3. Ghorbani, Saeed, Kimia Mahdaviani, Anne Thaler, et al. "MoVi: A large multi-purpose human motion and video dataset." PLOS ONE 16, no. 6 (2021): e0253157. http://dx.doi.org/10.1371/journal.pone.0253157.

Abstract:
Large, high-quality datasets of human body shape and kinematics lay the foundation for modelling and simulation approaches in computer vision, computer graphics, and biomechanics. Creating datasets that combine naturalistic recordings with high-accuracy ground-truth data about body shape and pose is challenging because different motion recording systems are optimized for one or the other. We address this issue in our dataset by using different hardware systems to record partially overlapping information, with synchronized data that lend themselves to transfer learning. This multimodal dataset contains 9 hours of optical motion capture data, 17 hours of video data from 4 different points of view recorded by stationary and hand-held cameras, and 6.6 hours of inertial measurement unit (IMU) data recorded from 60 female and 30 male actors performing a collection of 21 everyday actions and sports movements. The processed motion capture data is also available as realistic 3D human meshes. We anticipate use of this dataset for research on human pose estimation, action recognition, motion modelling, gait analysis, and body shape reconstruction.

4. Monfort, Mathew, Bolei Zhou, Sarah Bargal, et al. "A Large Scale Video Dataset for Event Recognition." Journal of Vision 18, no. 10 (2018): 753. http://dx.doi.org/10.1167/18.10.753.

5. Pang, Bo, Kaiwen Zha, Yifan Zhang, and Cewu Lu. "Further Understanding Videos through Adverbs: A New Video Task." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 07 (2020): 11823–30. http://dx.doi.org/10.1609/aaai.v34i07.6855.

Abstract:
Video understanding is a research hotspot of computer vision, and significant progress has recently been made on video action recognition. However, the semantic information contained in actions is not rich enough to build powerful video understanding models. This paper first introduces a new video semantics: the Behavior Adverb (BA), a more expressive and more difficult semantics covering subtle and inherent characteristics of human action behavior. To exhaustively decode this semantics, we construct the Videos with Action and Adverb Dataset (VAAD), a large-scale dataset with a semantically complete set of BAs. The dataset will be released to the public with this paper. We benchmark several representative video understanding methods (originally for action recognition) on BA and action recognition. The results show that the BA recognition task is more challenging than conventional action recognition. Accordingly, we propose the BA Understanding Network (BAUN) to solve this problem, and the experiments reveal that our BAUN is better suited to BA recognition (11% better than I3D). Furthermore, we find that these two semantics (action and BA) can propel each other toward better performance, improving action recognition results by 3.4% on average on three standard action recognition datasets (UCF-101, HMDB-51, Kinetics).

6. Jia, Jinlu, Zhenyi Lai, Yurong Qian, and Ziqiang Yao. "Aerial Video Trackers Review." Entropy 22, no. 12 (2020): 1358. http://dx.doi.org/10.3390/e22121358.

Abstract:
Target tracking technology based on aerial videos is widely used in many fields; however, it faces challenges such as image jitter, target blur, high data dimensionality, and large changes in target scale. In this paper, the research status of aerial video tracking is summarized, along with the characteristics, background complexity, and tracking diversity of aerial video targets. Based on these findings, the key technologies related to tracking are elaborated according to target type, number of targets, and applicable scene system. Tracking algorithms are classified by target type, and deep-learning-based target tracking algorithms are classified by network structure. Commonly used aerial photography datasets are described, and the accuracies of commonly used target tracking methods are evaluated on an aerial photography dataset, UAV123, and a long-video dataset, UAV20L. Potential problems are discussed, and possible future research directions and corresponding development trends in this field are analyzed and summarized.

7. Yang, Tao, Jing Li, Jingyi Yu, Sibing Wang, and Yanning Zhang. "Diverse Scene Stitching from a Large-Scale Aerial Video Dataset." Remote Sensing 7, no. 6 (2015): 6932–49. http://dx.doi.org/10.3390/rs70606932.

8. Tiotsop, Lohic Fotio, Antonio Servetti, and Enrico Masala. "Investigating Prediction Accuracy of Full Reference Objective Video Quality Measures through the ITS4S Dataset." Electronic Imaging 2020, no. 11 (2020): 93–1. http://dx.doi.org/10.2352/issn.2470-1173.2020.11.hvei-093.

Abstract:
Large subjectively annotated datasets are crucial to the development and testing of objective video quality measures (VQMs). In this work we focus on the recently released ITS4S dataset. Relying on statistical tools, we show that the content of the dataset is rather heterogeneous from the point of view of quality assessment. This diversity naturally makes the dataset a worthy asset for validating the accuracy of VQMs. In particular, we study the ability of VQMs to model the reduction or increase in the visibility of distortion due to the spatial activity in the content. The study reveals that VQMs are likely to overestimate the perceived quality of processed video sequences whose source is characterized by few spatial details. We then propose an approach that models the impact of spatial activity on distortion visibility when objectively assessing the visual quality of a content. The effectiveness of the proposal is validated on the ITS4S dataset as well as on the Netflix public dataset.
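
As an aside on how the "prediction accuracy" of an objective VQM is conventionally quantified, the sketch below correlates objective scores with subjective mean opinion scores (MOS) using the standard Pearson (PLCC) and Spearman (SROCC) statistics. All numbers are invented for illustration.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

mos = np.array([4.5, 3.9, 2.1, 1.8, 3.2, 4.0])              # subjective ratings
vqm_score = np.array([0.92, 0.81, 0.40, 0.35, 0.66, 0.85])  # objective metric output

plcc, _ = pearsonr(vqm_score, mos)    # linear correlation: prediction accuracy
srocc, _ = spearmanr(vqm_score, mos)  # rank correlation: prediction monotonicity
print(f"PLCC={plcc:.3f}  SROCC={srocc:.3f}")
```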

9. Hemalatha, C. Sweetlin, Vignesh Sankaran, Vaidehi V, et al. "Symmetric Uncertainty Based Search Space Reduction for Fast Face Recognition." International Journal of Intelligent Information Technologies 14, no. 4 (2018): 77–97. http://dx.doi.org/10.4018/ijiit.2018100105.

Abstract:
Face recognition from a large video database involves substantial search time. This article proposes a symmetric uncertainty based search space reduction (SUSSR) methodology that facilitates faster face recognition in video, making it viable for real-time surveillance and authentication applications. The proposed methodology employs symmetric uncertainty based feature subset selection to obtain significant features. Fuzzy C-Means clustering is then applied to restrict the search to the nearest possible cluster, speeding up the recognition process. A Kullback–Leibler divergence based similarity measure is employed to recognize the query face in video by matching the query frame against the stored features in the database. The proposed search space reduction methodology is tested on benchmark video face datasets, namely FJU and YouTube Celebrities, and on synthetic datasets, namely MIT-Dataset-I and MIT-Dataset-II. Experimental results demonstrate the effectiveness of the proposed methodology, with a 10% increase in recognition accuracy and a 35% reduction in recognition time.
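
For illustration, the symmetric uncertainty score at the heart of this feature selection step is SU(X, Y) = 2 · I(X; Y) / (H(X) + H(Y)). The following sketch computes it for discrete (or pre-binned) feature and label arrays; the variable names are hypothetical.

```python
import numpy as np

def entropy(values: np.ndarray) -> float:
    # Shannon entropy of a discrete sample, in bits.
    _, counts = np.unique(values, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def symmetric_uncertainty(x: np.ndarray, y: np.ndarray) -> float:
    # I(X; Y) = H(X) + H(Y) - H(X, Y), with the joint built from paired symbols.
    joint = np.array([f"{a}|{b}" for a, b in zip(x, y)])
    mi = entropy(x) + entropy(y) - entropy(joint)
    denom = entropy(x) + entropy(y)
    return 0.0 if denom == 0 else 2.0 * mi / denom

x = np.array([0, 0, 1, 1, 2, 2])  # a candidate feature
y = np.array([0, 0, 1, 1, 1, 1])  # class labels
print(symmetric_uncertainty(x, y))  # ~0.73; 1.0 would mean X and Y determine each other
```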

10. Wang, Haiqiang, Ioannis Katsavounidis, Jiantong Zhou, et al. "VideoSet: A large-scale compressed video quality dataset based on JND measurement." Journal of Visual Communication and Image Representation 46 (July 2017): 292–302. http://dx.doi.org/10.1016/j.jvcir.2017.04.009.


Dissertations / Theses on the topic "Large video dataset"

1. Åkerlund, Rasmus. "Real-time localization of balls and hands in videos of juggling using a convolutional neural network." Thesis, Linnéuniversitetet, Institutionen för datavetenskap och medieteknik (DM), 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:lnu:diva-81204.

Abstract:
Juggling can be both a recreational activity that provides a wide variety of challenges to participants and an art form that can be performed on stage. Non-learning-based computer vision techniques, depth sensors, and accelerometers have been used in the past to augment these activities. These solutions either require specialized hardware or work only in a very limited set of environments. In this project, a large video dataset of 54,000 annotated frames of juggling was created, and a convolutional neural network was successfully trained that could locate the balls and hands with high accuracy in a variety of environments. The network was sufficiently lightweight to provide real-time inference on CPUs. In addition, the locations of the balls and hands were recorded for thirty-six common juggling patterns, and small neural networks were trained that could categorize them almost perfectly. By building on the publicly available code, models, and datasets that this project has produced, jugglers will be able to create interactive juggling games for beginners and novel audio-visual enhancements for live performances.
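
As a hedged illustration of the general approach (not the network from the thesis), a lightweight fully convolutional model can map each frame to per-class heatmaps, one channel for balls and one for hands, whose peaks give the locations:

```python
import torch
import torch.nn as nn

class TinyLocalizer(nn.Module):
    """Toy fully convolutional localizer (illustrative only)."""

    def __init__(self, num_classes: int = 2):  # channel 0 = balls, 1 = hands
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(16, num_classes, kernel_size=1),  # per-class heatmap logits
        )

    def forward(self, frame: torch.Tensor) -> torch.Tensor:
        return self.net(frame)  # (batch, num_classes, H, W)

model = TinyLocalizer()
frame = torch.randn(1, 3, 128, 128)  # stand-in for a video frame
heatmaps = model(frame)

# The most likely ball position is the peak of the ball heatmap.
flat = int(heatmaps[0, 0].argmax())
y, x = divmod(flat, heatmaps.shape[-1])
print(f"ball peak at (x={x}, y={y})")
```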

2. Karpenko, Alexandre. "50,000 Tiny Videos: A Large Dataset for Non-parametric Content-based Retrieval and Recognition." Thesis, 2009. http://hdl.handle.net/1807/17690.

Abstract:
This work extends the tiny image data-mining techniques developed by Torralba et al. to videos. A large dataset of over 50,000 videos was collected from YouTube. This is the largest user-labeled research database of videos available to date. We demonstrate that a large dataset of tiny videos achieves high classification precision in a variety of content-based retrieval and recognition tasks using very simple similarity metrics. Content-based copy detection (CBCD) is evaluated on a standardized dataset, and the results are applied to related video retrieval within tiny videos. We use our similarity metrics to improve text-only video retrieval results. Finally, we apply our large labeled video dataset to various classification tasks. We show that tiny videos are better suited for classifying activities than tiny images. Furthermore, we demonstrate that classification can be improved by combining the tiny images and tiny videos datasets.
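
The "very simple similarity metrics" mentioned above can be pictured with the sketch below: each clip is reduced to a tiny, normalized spatio-temporal array and compared by sum of squared differences, with nearest neighbours driving retrieval. The sizes and helper names are illustrative assumptions, not the thesis code.

```python
import numpy as np

def tiny_representation(clip: np.ndarray, size: int = 16, frames: int = 8) -> np.ndarray:
    # clip: (T, H, W) grayscale. Crude nearest-neighbour resize by index sampling.
    t_idx = np.linspace(0, clip.shape[0] - 1, frames).astype(int)
    h_idx = np.linspace(0, clip.shape[1] - 1, size).astype(int)
    w_idx = np.linspace(0, clip.shape[2] - 1, size).astype(int)
    tiny = clip[np.ix_(t_idx, h_idx, w_idx)].astype(np.float32)
    return (tiny - tiny.mean()) / (tiny.std() + 1e-8)  # zero-mean, unit-variance

def ssd(a: np.ndarray, b: np.ndarray) -> float:
    # Sum of squared differences: smaller means more similar.
    return float(np.sum((a - b) ** 2))

query = tiny_representation(np.random.rand(30, 120, 160))
database = [tiny_representation(np.random.rand(30, 120, 160)) for _ in range(5)]
nearest = min(range(len(database)), key=lambda i: ssd(query, database[i]))
print("nearest neighbour:", nearest)
```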

3. Gebali, Aleya. "Detection of salient events in large datasets of underwater video." Thesis, 2012. http://hdl.handle.net/1828/4156.

Abstract:
NEPTUNE Canada possesses a large collection of video data for monitoring marine life. Such data is important for marine biologists, who can observe species in their natural habitat on a 24/7 basis. It is counterproductive for researchers to manually search for the events of interest (EOI) in a large database. Our study aims to perform automatic detection of EOI, defined as animal motion. The output of this approach is a summary video clip of the original video file that contains all salient events with their associated start and end frames. Our work is based on Laptev's [1] spatio-temporal interest point detection method. Interest points in the 3D spatio-temporal domain (x, y, t) require frame values in local spatio-temporal volumes to have large variations along all three dimensions. These local intensity variations are captured in the magnitude of the spatio-temporal derivatives. We report experimental results on video summarization using a database of videos from NEPTUNE Canada. The effect of several parameters on the performance of the proposed approach is studied in detail.
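
As a greatly simplified stand-in for the detector used in the thesis, the sketch below scores each frame transition by the mean magnitude of the temporal intensity derivative and keeps those above a threshold; Laptev's full method additionally requires large variation along the two spatial dimensions. Shapes and thresholds are illustrative.

```python
import numpy as np

def salient_transitions(video: np.ndarray, threshold: float) -> np.ndarray:
    # video: (T, H, W) grayscale intensities as floats.
    dt = np.abs(np.diff(video, axis=0))  # |dI/dt| between consecutive frames
    score = dt.mean(axis=(1, 2))         # mean temporal change per transition
    return np.flatnonzero(score > threshold)

rng = np.random.default_rng(0)
video = rng.normal(0.0, 1.0, size=(100, 60, 80))  # static scene plus sensor noise
video[40:60] += 50.0                              # synthetic "event" in frames 40-59
print(salient_transitions(video, threshold=10.0))  # -> [39 59]: event onset and offset
```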

Book chapters on the topic "Large video dataset"

1. Lei, Jie, Licheng Yu, Tamara L. Berg, and Mohit Bansal. "TVR: A Large-Scale Dataset for Video-Subtitle Moment Retrieval." In Computer Vision – ECCV 2020. Springer International Publishing, 2020. http://dx.doi.org/10.1007/978-3-030-58589-1_27.

2. Ma, Yiting, Xuejin Chen, Kai Cheng, Yang Li, and Bin Sun. "LDPolypVideo Benchmark: A Large-Scale Colonoscopy Video Dataset of Diverse Polyps." In Medical Image Computing and Computer Assisted Intervention – MICCAI 2021. Springer International Publishing, 2021. http://dx.doi.org/10.1007/978-3-030-87240-3_37.

3. Cojocea, Eduard, and Traian Rebedea. "Exploring a Large Dataset of Educational Videos Using Object Detection Analysis." In Ludic, Co-design and Tools Supporting Smart Learning Ecosystems and Smart Education. Springer Singapore, 2021. http://dx.doi.org/10.1007/978-981-16-3930-2_17.

4. Lin, Xiang, Brett Cowan, and Alistair Young. "Automated Detection of the Left Ventricle from 4D MR Images: Validation Using Large Clinical Datasets." In Advances in Image and Video Technology. Springer Berlin Heidelberg, 2006. http://dx.doi.org/10.1007/11949534_22.

5. Cojocea, Eduard, and Traian Rebedea. "Exploratory Analysis of a Large Dataset of Educational Videos: Preliminary Results Using People Tracking." In Ludic, Co-design and Tools Supporting Smart Learning Ecosystems and Smart Education. Springer Singapore, 2020. http://dx.doi.org/10.1007/978-981-15-7383-5_18.

6. Zucco, Chiara, Barbara Calabrese, and Mario Cannataro. "Emotion Mining: from Unimodal to Multimodal Approaches." In Lecture Notes in Computer Science. Springer International Publishing, 2021. http://dx.doi.org/10.1007/978-3-030-82427-3_11.

Abstract:
In the last decade, Sentiment Analysis and Affective Computing have found applications in different domains. In particular, the interest in extracting emotions in healthcare is demonstrated by various applications encompassing patient monitoring and adverse-event prediction. Thanks to the availability of large datasets, most of which are extracted from social media platforms, several techniques for extracting emotion and opinion from different modalities have been proposed, using both unimodal and multimodal approaches. After introducing the basic concepts related to emotion theories, mainly borrowed from the social sciences, the present work reviews three basic modalities used in emotion recognition, i.e. textual, audio, and video, presenting for each of these i) some basic methodologies, ii) some of the datasets widely used for training supervised algorithms, and iii) a brief discussion of some deep learning architectures. Furthermore, the paper outlines the challenges of, and existing resources for, multimodal emotion recognition, which may improve performance by combining at least two unimodal approaches.

7. Khaire, Pushpajit A., and Roshan R. Kotkondawar. "Measures of Image and Video Segmentation." In Computer Vision. IGI Global, 2018. http://dx.doi.org/10.4018/978-1-5225-5204-8.ch049.

Abstract:
Study of video and image segmentation is currently limited by the lack of evaluation metrics and benchmark datasets that cover the large variety of sub-problems appearing in image and video segmentation. This chapter provides an analysis of evaluation metrics and datasets for image and video segmentation methods. The emphasis is on wide-ranging datasets and robust metrics that can be used for evaluation without biasing the results. The introductory section discusses the traditional image and video segmentation methods available; the importance of, and need for, measures, metrics, and datasets to evaluate segmentation algorithms are discussed in the next section. The main focus of the chapter is the measures, metrics, and datasets available for evaluating segmentation techniques for both image and video. The goal is to provide details about a set of impartial datasets and evaluation metrics, and to leave the final assessment of the evaluation process to the understanding of the reader.
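
One of the most widely used measures covered by surveys like this chapter is the Jaccard index (intersection over union) between a predicted and a ground-truth binary mask. A minimal sketch, applicable per image or per video frame:

```python
import numpy as np

def jaccard(pred: np.ndarray, gt: np.ndarray) -> float:
    # Intersection over union of two binary masks.
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return 1.0 if union == 0 else float(inter / union)

pred = np.zeros((10, 10)); pred[2:7, 2:7] = 1  # predicted region
gt = np.zeros((10, 10)); gt[3:8, 3:8] = 1      # ground-truth region
print(f"IoU = {jaccard(pred, gt):.3f}")        # 16 / 34 ≈ 0.471
```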

8. Khaire, Pushpajit A., and Roshan R. Kotkondawar. "Measures of Image and Video Segmentation." In Applied Video Processing in Surveillance and Monitoring Systems. IGI Global, 2017. http://dx.doi.org/10.4018/978-1-5225-1022-2.ch002.

Abstract:
Study of video and image segmentation is currently limited by the lack of evaluation metrics and benchmark datasets that cover the large variety of sub-problems appearing in image and video segmentation. This chapter provides an analysis of evaluation metrics and datasets for image and video segmentation methods. The emphasis is on wide-ranging datasets and robust metrics that can be used for evaluation without biasing the results. The introductory section discusses the traditional image and video segmentation methods available; the importance of, and need for, measures, metrics, and datasets to evaluate segmentation algorithms are discussed in the next section. The main focus of the chapter is the measures, metrics, and datasets available for evaluating segmentation techniques for both image and video. The goal is to provide details about a set of impartial datasets and evaluation metrics, and to leave the final assessment of the evaluation process to the understanding of the reader.

9. Ning, Huazhong, Junxian Wang, Xu Liu, and Ying Shan. "Content and Attention Aware Overlay for Online Video Advertising." In Advances in Multimedia and Interactive Technologies. IGI Global, 2011. http://dx.doi.org/10.4018/978-1-60960-189-8.ch007.

Abstract:
The recent proliferation of online video advertising brings new opportunities and challenges to the multimedia community. A successful online video advertising system is expected to have the following essential features: effective targeting, scalability, non-intrusiveness, and attractiveness. While scalable systems with targeting capability are emerging, few have achieved the goal of being both non-intrusive and attractive. To our knowledge, this work is the first attempt to generate video overlay ads that balance these two conflicting characteristics. We achieve this goal by jointly optimizing a non-intrusiveness metric and a set of metrics associated with video ad templates designed by UI experts. The resulting system dynamically creates a video overlay ad that effectively attracts user attention at the least intrusive spatio-temporal spots of a video clip. The system is also designed to enable a scalable business model with effective targeting capabilities, and it will later be tested with live traffic on a major video publisher site. In this work, we conducted intensive experiments and user studies on samples from a large-scale video dataset. The results demonstrate the effectiveness of our approach.

10. Seal, Ayan, Debotosh Bhattacharjee, Mita Nasipuri, and Dipak Kumar Basu. "Thermal Human Face Recognition for Biometric Security System." In Computer Vision. IGI Global, 2018. http://dx.doi.org/10.4018/978-1-5225-5204-8.ch078.

Abstract:
Automatic face recognition has been comprehensively studied for more than four decades, since face recognition of individuals has many applications, particularly in human-machine interaction and security. Although face recognition systems have achieved a significant level of maturity, with some realistic achievements, face recognition still remains challenging due to the large variation in face images. Face recognition techniques can generally be divided into three categories based on the face image acquisition methodology: methods that work on intensity images, those that deal with video sequences, and those that require other sensory data (such as 3D or infrared imagery). Researchers are using thermal infrared images for face recognition, since thermal infrared images have some advantages over 2D intensity images. In this chapter, an overview of some well-known techniques for face recognition using thermal infrared faces is given, and some of the drawbacks and benefits of each of these methods are discussed. The chapter covers some of the most recent algorithms developed for this purpose and tries to give a brief idea of the state of the art of face recognition technology. The authors propose one approach for evaluating the performance of face recognition algorithms using thermal infrared images. They also report the results of several classifiers on a benchmark dataset (the Terravic Facial Infrared Database).

Conference papers on the topic "Large video dataset"

1. Materzynska, Joanna, Guillaume Berger, Ingo Bax, and Roland Memisevic. "The Jester Dataset: A Large-Scale Video Dataset of Human Gestures." In 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW). IEEE, 2019. http://dx.doi.org/10.1109/iccvw.2019.00349.

2. Tanaka, Tsunehiko, and Edgar Simo-Serra. "LoL-V2T: Large-Scale Esports Video Description Dataset." In 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). IEEE, 2021. http://dx.doi.org/10.1109/cvprw53098.2021.00513.

3. Karpenko, Alexandre, and Parham Aarabi. "Tiny Videos: A Large Dataset for Image and Video Frame Categorization." In 2009 11th IEEE International Symposium on Multimedia. IEEE, 2009. http://dx.doi.org/10.1109/ism.2009.74.

4. Xu, Jun, Tao Mei, Ting Yao, and Yong Rui. "MSR-VTT: A Large Video Description Dataset for Bridging Video and Language." In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2016. http://dx.doi.org/10.1109/cvpr.2016.571.

5. Jiang, Qing-Yuan, Yi He, Gen Li, Jian Lin, Lei Li, and Wu-Jun Li. "SVD: A Large-Scale Short Video Dataset for Near-Duplicate Video Retrieval." In 2019 IEEE/CVF International Conference on Computer Vision (ICCV). IEEE, 2019. http://dx.doi.org/10.1109/iccv.2019.00538.

6. Wali, Ali, and Adel M. Alimi. "Incremental Learning Approach for Events Detection from Large Video Dataset." In 2010 7th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS). IEEE, 2010. http://dx.doi.org/10.1109/avss.2010.54.

7. Miao, Yinjun, Chao Wang, Peng Cui, Lifen Sun, Pin Tao, and Shiqiang Yang. "HFAG: Hierarchical Frame Affinity Group for video retrieval on very large video dataset." In 2010 17th IEEE International Conference on Image Processing (ICIP 2010). IEEE, 2010. http://dx.doi.org/10.1109/icip.2010.5654073.

8. Tang, Yansong, Dajun Ding, Yongming Rao, et al. "COIN: A Large-Scale Dataset for Comprehensive Instructional Video Analysis." In 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2019. http://dx.doi.org/10.1109/cvpr.2019.00130.

9. Liu, Jingzhou, Wenhu Chen, Yu Cheng, et al. "Violin: A Large-Scale Dataset for Video-and-Language Inference." In 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2020. http://dx.doi.org/10.1109/cvpr42600.2020.01091.

10. Monfort, Mathew, Kandan Ramakrishnan, Dan Gutfreund, and Aude Oliva. "A Large Scale Multi-Label Action Dataset for Video Understanding." In 2018 Conference on Cognitive Computational Neuroscience. Cognitive Computational Neuroscience, 2018. http://dx.doi.org/10.32470/ccn.2018.1137-0.
