
Dissertations / Theses on the topic 'Videos'


Consult the top 50 dissertations / theses for your research on the topic 'Videos.'


You can also download the full text of each academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse dissertations / theses on a wide variety of disciplines and organise your bibliography correctly.

1

Lindskog, Eric, and Jesper Wrang. "Design of video players for branched videos." Thesis, Linköpings universitet, Institutionen för datavetenskap, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-148592.

Abstract:
Interactive branched video allows viewers to make decisions while watching that affect the playback path of the video and potentially the outcome of the story. This type of video introduces new design challenges, for example displaying the playback progress, the structure of the branched video, and the choices that viewers can make. In this thesis we test three implementations of working video players with different types of playback bars: one fully visible with no moving parts, one that zooms into the currently watched section of the video, and one that leverages a fisheye distortion. A number of usability tests were carried out using surveys complemented with observations made during the tests. Based on these user tests we concluded that the implementation with a zoomed-in playback bar was the easiest to understand, while the fisheye effect received mixed responses, ranging from distracting and annoying to interesting and clear. With this feedback a new set of implementations was created and solutions for each component of the video player were identified. These new implementations support more general solutions for the shape of the branch segments and for the position of the choices for upcoming branches. The new implementations have not gone through any testing, but we expect that future work can further explore this subject with the help of our code and suggestions.
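The fisheye playback bar described in this abstract can be pictured as a monotonic remapping of timeline positions that stretches the region around the playhead. The sketch below is an illustrative reconstruction of that idea, not code from the thesis; the function name and all parameters are hypothetical.

```python
import numpy as np

def fisheye_positions(n_samples, focus, magnification=3.0, radius=0.08):
    """Map n_samples evenly spaced timeline points in [0, 1] to screen
    positions in [0, 1], magnifying the region within `radius` of `focus`
    (the current playhead position)."""
    t = np.linspace(0.0, 1.0, n_samples)
    # Local "ink density": magnified near the playhead, 1.0 elsewhere.
    density = np.where(np.abs(t - focus) < radius, magnification, 1.0)
    # Integrating the density gives a monotonic timeline-to-screen mapping.
    screen = np.cumsum(density)
    screen -= screen[0]
    return screen / screen[-1]   # normalize back into [0, 1]
```

Drawing the branch segments at `fisheye_positions(...)` instead of at their raw timeline positions yields the magnified-around-the-playhead effect the abstract mentions.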
2

Ogata, Atsushi. "Meditative videos." Thesis, Massachusetts Institute of Technology, 1988. http://hdl.handle.net/1721.1/78990.

Abstract:
My intention is to provide "meditative" moments to all of us who must struggle with the fast pace of the modern world. These "meditative" moments are both calming and engaging. They resemble the moment of "satori," or "opening of mind," in Zen. Zen, embedded in the culture of Japan, is closely related to my work and sensibility. In realizing my intention, I have chosen the medium of video. Video can reach a wide audience through broadcasting and home videos. Its photographic ability allows us to directly record and celebrate our natural environment. As video is an experience in time, it can create quiet soothing sounds and slow subtle movements. It allows us time to "tune in" to the rhythm of the piece. In my video work, I depict basic natural elements such as light, water, and clouds, and their relationship to animate beings.
3

Stewart, Richard Christopher. "Effective audio for music videos: the production of an instructional video outlining audio production techniques for amateur music videos." Kutztown University, 1996. http://www.kutztown.edu/library/services/remote_access.asp. Remote access available to Kutztown University faculty, staff, and students only.

4

Sedlařík, Vladimír. "Informační strategie firmy." Master's thesis, Vysoké učení technické v Brně. Fakulta podnikatelská, 2012. http://www.nusl.cz/ntk/nusl-223526.

Abstract:
This thesis analyzes the YouTube service and describes its main deficiencies. Based on theoretical methods and analyses, its main goal is to design a service that will solve the main YouTube problems, build a company around this service and introduce this service to the market. This service will not replace YouTube, but it will supplement it. Further, this work will suggest a possible structure, strategy and information strategy of this new company and its estimated financial results in the first few years.
5

Lindgren, Björn. "Erfarenheter och åsikter om videos : Instrumentlärare om videos som undervisningsmaterial." Thesis, Linnéuniversitetet, Institutionen för musik och bild (MB), 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:lnu:diva-65800.

Abstract:
In a study whose purpose is to highlight instrument teachers' experiences of and opinions about videos as teaching material, five informants were interviewed. After analysis, the collected material presents the informants' experiences and opinions through Koehler and Mishra's (2009) TPaCK model. The analysis brings out the informants' differing willingness, interest and competence when it comes to involving videos, and their experiences of and opinions about involving videos in their teaching. The results present a shared view of video as a visual resource where the viewer can both see and hear. According to the informants, the foremost property of videos is their ability to act as a possible extension of the lesson. The discussion shows, however, that videos are not a self-evident part of teaching; their use depends on the instrument teacher's interest, competence and willingness to involve videos in their teaching.
6

Chen, Juan. "Content-based Digital Video Processing. Digital Videos Segmentation, Retrieval and Interpretation." Thesis, University of Bradford, 2009. http://hdl.handle.net/10454/4256.

Abstract:
Recent research approaches in semantics-based video content analysis require shot boundary detection as the first step to divide video sequences into sections. Furthermore, with the advances in networking and computing capability, efficient retrieval of multimedia data has become an important issue. Content-based retrieval technologies have been widely implemented to protect intellectual property rights (IPR). In addition, automatic recognition of highlights from videos is a fundamental and challenging problem for content-based indexing and retrieval applications. In this thesis, a paradigm is proposed to segment, retrieve and interpret digital videos. Five algorithms are presented to solve the video segmentation task. Firstly, a simple shot cut detection algorithm is designed for real-time implementation. Secondly, a systematic method is proposed for shot detection using content-based rules and a finite state machine (FSM). Thirdly, shot detection is implemented using local and global indicators. Fourthly, a context awareness approach is proposed to detect shot boundaries. Fifthly, a fuzzy logic method is implemented for shot detection. Furthermore, a novel analysis approach is presented for the detection of video copies. It is robust to complicated distortions and capable of locating copied segments inside original videos. Then, objects and events are extracted from MPEG sequences for video highlights indexing and retrieval. Finally, a human fighting detection algorithm is proposed for movie annotation.
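The "simple shot cut detection algorithm ... for real-time implementation" mentioned first is, in its classic textbook form, a comparison of colour histograms of consecutive frames that flags large jumps as hard cuts. Below is a generic OpenCV sketch of that baseline, not the thesis's own algorithm; the threshold value is an illustrative assumption.

```python
import cv2

def detect_shot_cuts(path, threshold=0.5):
    """Flag frames whose colour-histogram distance from the previous
    frame exceeds `threshold` (a crude hard-cut detector)."""
    cap = cv2.VideoCapture(path)
    cuts, prev_hist, idx = [], None, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        hist = cv2.calcHist([frame], [0, 1, 2], None, [8, 8, 8],
                            [0, 256, 0, 256, 0, 256])
        hist = cv2.normalize(hist, None).flatten()
        if prev_hist is not None:
            # Bhattacharyya distance: 0 = identical histograms, 1 = disjoint.
            d = cv2.compareHist(prev_hist, hist, cv2.HISTCMP_BHATTACHARYYA)
            if d > threshold:
                cuts.append(idx)
        prev_hist, idx = hist, idx + 1
    cap.release()
    return cuts
```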
7

Potapov, Danila. "Supervised Learning Approaches for Automatic Structuring of Videos." Thesis, Université Grenoble Alpes (ComUE), 2015. http://www.theses.fr/2015GREAM023/document.

Abstract:
Automatic interpretation and understanding of videos still remains at the frontier of computer vision. The core challenge is to lift the expressive power of the current visual features (as well as features from other modalities, such as audio or text) to be able to automatically recognize typical video sections, with low temporal saliency yet high semantic expression. Examples of such long events include video sections where someone is fishing (TRECVID Multimedia Event Detection), or where the hero argues with a villain in a Hollywood action movie (Inria Action Movies). In this manuscript, we present several contributions towards this goal, focusing on three video analysis tasks: summarization, classification, and localisation. First, we propose an automatic video summarization method, yielding a short and highly informative video summary of potentially long videos, tailored for specified categories of videos. We also introduce a new dataset for evaluation of video summarization methods, called MED-Summaries, which contains complete importance-scoring annotations of the videos, along with a complete set of evaluation tools. Second, we introduce a new dataset, called Inria Action Movies, consisting of long movies annotated with non-exclusive semantic categories (called beat-categories), whose definition is broad enough to cover most of the movie footage. Categories such as "pursuit" or "romance" in action movies are examples of beat-categories. We propose an approach for localizing beat-events based on classifying shots into beat-categories and learning the temporal constraints between shots. Third, we overview the Inria event classification system developed within the TRECVID Multimedia Event Detection competition and highlight the contributions made during the work on this thesis from 2011 to 2014.
8

Liu, Yunjun. "Creating animated mosaic videos." Thesis, McGill University, 2005. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=84053.

Abstract:
Animated mosaics are a traditional form of stop-motion animation created by arranging and rearranging small objects or tiles from frame to frame. While this animation style is uniquely compelling, the traditional process of manually placing and then moving tiles in each frame is time-consuming and laborious. Recent work has proposed algorithms for static mosaics, but generating temporally coherent mosaic animations has remained open. This thesis presents several contributions to the animated mosaic problem in the context of a larger system for creating mosaic animations. Specifically, this thesis describes contributions for enabling an animator to relatively quickly and easily specify the desired animation using Scalable Vector Graphics (SVG), as well as an initial exploration of 3D packing for mosaic animations based on a successful 2D packing approach.
9

Cui, Yingnan. "On learning from videos." Thesis, Massachusetts Institute of Technology, 2018. http://hdl.handle.net/1721.1/120233.

Abstract:
The robot phone disassembly task is difficult in many ways: it requires high precision and high speed, and should generalize to all types of cell phones. Previous work on robot learning from demonstration is hardly applicable due to the complexity of teaching, the huge amounts of data, and the difficulty of generalization. To tackle these problems, we try to learn from videos and extract useful information for the robot. To reduce the amount of data we need to process, we generate a mask for the video and observe only the region of interest. Inspired by the idea that a spatio-temporal interest point (STIP) detector may give meaningful points, such as the contact point between the tool and the part, we design a new method of detecting STIPs based on optical flow. We also design a new descriptor by modifying the histogram of optical flow. The STIP detector and descriptor together ensure that the features are invariant to scale, rotation and noise. Using the modified histogram-of-optical-flow descriptor, we show that even without considering the raw pixels of the original video, we can achieve good classification results.
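A plain histogram-of-optical-flow descriptor, the starting point that this thesis modifies, can be sketched in a few lines: estimate dense flow, then bin flow directions weighted by magnitude. The Farneback parameters below are common defaults, not the thesis's settings.

```python
import cv2
import numpy as np

def hof_descriptor(prev_gray, gray, bins=8):
    """Histogram of optical flow for one frame pair: bin flow vectors by
    direction, weight by magnitude, then L1-normalize."""
    flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    mag, ang = cv2.cartToPolar(flow[..., 0], flow[..., 1])  # ang in [0, 2*pi)
    hist, _ = np.histogram(ang, bins=bins, range=(0, 2 * np.pi), weights=mag)
    return hist / (hist.sum() + 1e-9)
```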
10

Touliatou, Georgia. "Diegetic stories in a video mediation : a narrative analysis of four videos." Thesis, University of Surrey, 2003. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.397132.

11

Karlsson, Simon Gustav. "Subgenres in Swedish music videos - A neo-formalistic analysis of fifteen music videos." Thesis, Linköpings universitet, Institutionen för teknik och naturvetenskap, 2007. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-96274.

Abstract:
This thesis is an analysis of fifteen Swedish music videos. The aim is to examine similarities and differences in the storytelling, styles and techniques used. The thesis also compares the videos under study to see whether there are common points of contact in their composition, and whether the construction of a video is governed by the music genre it represents. Many previous analyses in the field start from the music video's role in society, whether political, economic or historical, and concentrate on the meaning of the music video's message. In contrast to these starting points, this thesis examines the construction of the films from a neo-formalistic perspective; that is, it builds on theories drawn from cognitive and perceptual psychology that have subsequently been applied to film. These theories break a film down into its components to see how it is constructed and how it tells its story. The components range from form, via style and technique, to music. The result is an analysis of fifteen videos based on these theories, ending in a summary where differences and similarities can be seen. Among other things, most of the videos turn out to have a narrative structure, to feature the artist as the main subject, and to work with different techniques and themes to frame that subject. There are many similarities and parallels across the videos, but also a good number of differences. The clearest factor binding the videos together, alongside the narrative structure, is the style of the video. If one uses the term music style instead of music genre, the connection becomes easier to understand. The study shows that the style of the video and the style of the music, above all within the hip hop genre, largely go hand in hand. This connection is clear and shows that hip hop is not only a music genre but can, by extension, also be seen as a subgenre of the Swedish music video genre. The thesis also shows a tendency for the music itself to go unused in the music videos: it is primarily the style, the narrative structure and the functions of different techniques that carry the video forward, rather than the music. This could be explained by the directors coming from a classical film background rather than a background as musicians. For future researchers, this thesis could form the basis of a more extensive analysis of more videos, or of an analysis from an international perspective. To anchor the connection between music video storytelling and feature film more firmly, a study could be made of directors who have worked on both music videos and feature films; it would then be possible to examine more reliably whether there are parallels in narrative form and style between the two types of film.
12

Berigny Wall, Caitilin de. "Documentary transforms into video installation via the processes of intertextuality and detournement." Canberra: University of Canberra, 2006. http://erl.canberra.edu.au/public/adt-AUC20070723.103335/index.html.

13

Anegekuh, Louis. "Video content-based QoE prediction for HEVC encoded videos delivered over IP networks." Thesis, University of Plymouth, 2015. http://hdl.handle.net/10026.1/3377.

Abstract:
The recently released High Efficiency Video Coding (HEVC) standard, which halves the transmission bandwidth requirement of encoded video for almost the same quality when compared to H.264/AVC, and the availability of increased network bandwidth (e.g. from 2 Mbps for 3G networks to almost 100 Mbps for 4G/LTE) have led to the proliferation of video streaming services. Based on these major innovations, the prevalence and diversity of video applications are set to increase over the coming years. However, the popularity and success of current and future video applications will depend on the perceived quality of experience (QoE) of end users. How to measure or predict the QoE of delivered services becomes an important and inevitable task for both service and network providers. Video quality can be measured either subjectively or objectively. Subjective quality measurement is the most reliable method of determining the quality of multimedia applications because of its direct link to users' experience. However, this approach is time consuming and expensive, and hence the need for an objective method that can produce results that are comparable with those of subjective testing. In general, video quality is impacted by impairments caused by the encoder and the transmission network. However, videos encoded and transmitted over an error-prone network have different quality measurements even under the same encoder setting and network quality of service (NQoS). This indicates that, in addition to encoder settings and network impairment, there may be other key parameters that impact video quality. In this project, it is hypothesised that video content type is one of the key parameters that may impact the quality of streamed videos. Based on this assertion, parameters related to video content type are extracted and used to develop a single metric that quantifies the content type of different video sequences. The proposed content type metric is then used together with encoding parameter settings and NQoS to develop content-based video quality models that estimate the quality of different video sequences delivered over IP-based networks. This project led to the following main contributions: (1) A new metric for quantifying video content type based on the spatiotemporal features extracted from the encoded bitstream. (2) The development of a novel subjective test approach for video streaming services. (3) New content-based video quality prediction models for predicting the QoE of video sequences delivered over IP-based networks. The models have been evaluated using subjective and objective methods.
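The spatiotemporal features behind such content-type metrics are often grounded in the SI and TI indicators of ITU-T Rec. P.910, computed from raw pixels; the thesis itself extracts related features from the encoded bitstream, which the following pixel-domain sketch does not attempt.

```python
import cv2
import numpy as np

def spatial_information(gray):
    """Per-frame SI (ITU-T P.910): std-dev of the Sobel-filtered luminance.
    P.910 reports the maximum of this value over the whole sequence."""
    sx = cv2.Sobel(gray, cv2.CV_64F, 1, 0)
    sy = cv2.Sobel(gray, cv2.CV_64F, 0, 1)
    return np.sqrt(sx ** 2 + sy ** 2).std()

def temporal_information(prev_gray, gray):
    """Per-frame TI (ITU-T P.910): std-dev of the luminance frame difference."""
    return (gray.astype(np.float64) - prev_gray.astype(np.float64)).std()
```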
14

Dye, Brigham R. "Reliability of Pre-Service Teachers Coding of Teaching Videos Using Video-Annotation Tools." BYU ScholarsArchive, 2007. https://scholarsarchive.byu.edu/etd/990.

Abstract:
Teacher education programs that aspire to helping pre-service teachers develop expertise must help students engage in deliberate practice along dimensions of teaching expertise. However, field teaching experiences often lack the quantity and quality of feedback that is needed to help students engage in meaningful teaching practice. The limited availability of supervising teachers makes it difficult to personally observe and evaluate each student teacher's field teaching performances. Furthermore, when a supervising teacher debriefs such an observation, the supervising teacher and student may struggle to communicate meaningfully about the teaching performance. This is because the student teacher and supervisor often have very different perceptions of the same teaching performance. Video analysis tools show promise for improving the quality of feedback student teachers receive on their teaching performance by providing a common reference for evaluative debriefing and allowing students to generate their own feedback by coding videos of their own teaching. This study investigates the reliability of pre-service teacher coding using a video analysis tool. This study found that students were moderately reliable coders when coding video of an expert teacher (49%-68%). However, when the reliability of student coding of their own teaching videos was audited, students showed a high degree of accuracy (91%). These contrasting findings suggest that coding reliability scores may not be simple indicators of student understanding of the teaching competencies represented by a coding scheme. Instead, reliability scores may also be subject to the influence of extraneous factors. For example, reliability scores in this study were influenced by differences in the technical aspects of how students implemented the coding system. Furthermore, reliability scores were influenced by how coding proficiency was measured. Because this study also suggests that students can be taught to improve their coding reliability, further research may improve reliability scores, and make them a more valid reflection of student understanding of teaching competency, by training students about the technical aspects of implementing a coding system.
15

Naji, Yassine. "Abnormal events detection in videos." Electronic Thesis or Diss., université Paris-Saclay, 2025. http://www.theses.fr/2025UPASG008.

Abstract:
The detection of abnormal events in videos is a challenging task due to the wide variety of possible anomalies, the limited availability of labeled anomaly data for model training, and the contextual nature of normality. Moreover, normal and abnormal data can exhibit significant intra-class variability, often making their distinction difficult. An additional challenge lies in the lack of explainability in deep learning-based anomaly detection methods which, although effective, often remain opaque. These factors make anomaly detection in videos an open research problem. To address the imbalance between the abundance of normal data and the scarcity of anomalous data, video anomaly detection is usually approached using the "One-Class" learning paradigm. In this approach, models learn a distribution of normal data and identify anomalies as outliers relative to this learned distribution. In this thesis, we introduce approaches to better represent the diversity of normal data. Additionally, we propose a method that not only detects anomalies but also provides explanations for them. Finally, we present new metrics to evaluate the explainability performance of anomaly detection models.
16

Rossi, Silvia. "Content characterization of 3D Videos." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2016. http://amslaurea.unibo.it/10192/.

Abstract:
The main aim of this thesis is the search for a characterization of 3D video content. In a first analysis, the spatial and temporal complexity of 3D content was studied following the conventional techniques applied to 2D video. In particular, Spatial Information (SI) and Temporal Information (TI) are the two indicators used in the 3D characterization of spatial and temporal content. To present a complete description of 3D video, characterization in terms of depth must also be considered. In this regard, new depth indicators have been proposed based on statistical evaluations of the histograms of depth maps. The first depth indicator is based on the mean and standard deviation of the data distribution in the depth map. Another metric proposed in this work estimates depth by computing the entropy of the depth map. Finally, the fourth implemented algorithm jointly applies a thresholding technique and analyses the residual histogram values by computing the Kurtosis index. The proposed algorithms were tested by comparing the metrics proposed in this work with previous ones, and also against subjective test results. The experimental results show the effectiveness of the proposed solutions in evaluating depth in 3D videos. Finally, one of the new indicators was applied to a database of 3D videos to complete the characterization of 3D content.
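The four depth indicators can be approximated directly from the depth map and its histogram. A rough numpy/scipy sketch follows; the exact binning and the thresholding step applied before the Kurtosis index in the thesis are simplified away here.

```python
import numpy as np
from scipy.stats import entropy, kurtosis

def depth_indicators(depth_map, bins=256):
    """Histogram-based depth indicators in the spirit of the thesis:
    mean/std of the depth distribution, its entropy, and its kurtosis."""
    d = depth_map.ravel().astype(np.float64)
    hist, _ = np.histogram(d, bins=bins, density=True)
    return {
        "mean": d.mean(),                 # first indicator: mean depth
        "std": d.std(),                   # spread of the depth distribution
        "entropy": entropy(hist + 1e-12), # Shannon entropy of the histogram
        "kurtosis": kurtosis(d),          # peakedness of the depth values
    }
```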
17

Krause, Uwe. "Videos related to the maps." Universität Potsdam, 2012. http://opus.kobv.de/ubp/volltexte/2013/6574/.

18

Chaptini, Bassam H. "Intelligent segmentation of lecture videos." Thesis, Massachusetts Institute of Technology, 2002. http://hdl.handle.net/1721.1/84314.

19

Wang, Ami M. "Lifecycle of viral YouTube videos." Thesis, Massachusetts Institute of Technology, 2014. http://hdl.handle.net/1721.1/97377.

Abstract:
YouTube was founded in 2005 as a video-sharing website. Today, it's a powerhouse social media platform where users can upload, view, comment on, and share content. For many, it's the first site visited when looking for songs, music videos, TV shows, or just general entertainment. Along with the sharing potential provided by social media like Twitter, Facebook, Tumblr, and more, YouTube videos have the potential to spread like wildfire. A term that has been coined to describe such videos is "viral videos." This comes from the scientific definition of viral, which involves the contagious nature of the spread of a virus. Virality on the Internet is not a new concept. Back when email was the hottest new technology, chain e-mails spreading hoaxes and scams were widely shared by emailing back and forth. As the Internet aged, however, new forms of virality have evolved. This thesis looks at a series of 20 viral videos as case studies and analyzes their growth over time via the Lifecycle Theory. Analyzing viral videos in this manner aids a deeper understanding of the human consciousness's affinity for content, the sociology of online sharing, and the context of today's media culture. This thesis proposes that the phenomenon of virality supports the claim of the Internet as heterotopia.
20

Kandakatla, Rajeshwari. "Identifying Offensive Videos on YouTube." Wright State University / OhioLINK, 2016. http://rave.ohiolink.edu/etdc/view?acc_num=wright1484751212961772.

21

Sun, Shuyang. "Designing Motion Representation in Videos." Thesis, The University of Sydney, 2018. http://hdl.handle.net/2123/19724.

Abstract:
Motion representation plays a vital role in vision-based human action recognition in videos. Generally, the information in a video can be divided into spatial information and temporal information. While the spatial information is easily described by RGB images, the design of the motion representation remains a challenging problem. In order to design a motion representation that is efficient and effective, we design the feature according to two principles. First, to guarantee robustness, the temporal information should be highly related to the informative modalities, e.g., the optical flow. Second, only basic operations should be applied so that the computational cost of extracting the temporal information remains affordable. Based on these principles, we introduce a novel compact motion representation for video action recognition, named Optical Flow guided Feature (OFF), which enables the network to distil temporal information through a fast and robust approach. The OFF is derived from the definition of optical flow and is orthogonal to the optical flow. The derivation also provides theoretical support for using the difference between two frames. By directly calculating pixel-wise spatio-temporal gradients of the deep feature maps, the OFF can be embedded in any existing CNN-based video action recognition framework with only a slight additional cost, enabling the CNN to extract spatio-temporal information. This simple but powerful idea is validated by experimental results. The network with OFF fed only by RGB inputs achieves a competitive accuracy of 93.3% on UCF-101, which is comparable with the result obtained by two streams (RGB and optical flow), but is 15 times faster in speed. Experimental results also show that OFF is complementary to other motion modalities such as optical flow. When the proposed method is plugged into the state-of-the-art video action recognition framework, it achieves 96.0% and 74.2% accuracy on UCF-101 and HMDB-51 respectively.
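The derivation the abstract alludes to follows from the brightness-constancy assumption behind optical flow; substituting a deep feature map for the image then motivates the OFF and its orthogonality to the flow. In sketch form (notation ours, not copied from the thesis):

```latex
% Brightness constancy for a frame I(x, y, t):
%   I(x, y, t) = I(x + \Delta x, \; y + \Delta y, \; t + \Delta t).
% A first-order Taylor expansion, divided by \Delta t, gives the
% optical-flow constraint:
\[
\frac{\partial I}{\partial x} v_x
  + \frac{\partial I}{\partial y} v_y
  + \frac{\partial I}{\partial t} = 0 ,
\qquad (v_x, v_y) \text{ the optical flow.}
\]
% Replacing the image I with a deep feature map f motivates
\[
\mathrm{OFF}(f) = \left[ \frac{\partial f}{\partial x},\;
                          \frac{\partial f}{\partial y},\;
                          f(t) - f(t - \Delta t) \right],
\qquad \mathrm{OFF}(f) \cdot (v_x, v_y, 1)^{\top} \approx 0 ,
\]
% i.e. the OFF vector is (approximately) orthogonal to the optical flow,
% which is why frame differences of feature maps carry flow-like motion
% information at negligible extra cost.
```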
22

Fan, Quanfu. "Matching Slides to Presentation Videos." Diss., The University of Arizona, 2008. http://hdl.handle.net/10150/195757.

Abstract:
Video streaming is becoming a major channel for distance learning (or e-learning). A tremendous number of videos for educational purposes are captured and archived in various e-learning systems today throughout schools, corporations and over the Internet. However, making information searchable and browsable, and presenting results optimally for a wide range of users and systems, remains a challenge. In this work two core algorithms have been developed to support effective browsing and searching of educational videos. The first is a fully automatic approach that recognizes slides in the video with high accuracy. Built upon SIFT (scale-invariant feature transform) keypoint matching using RANSAC (random sample consensus), the approach is independent of capture systems and can handle a variety of videos with different styles and plentiful ambiguities. In particular, we propose a multi-phase matching pipeline that incrementally identifies slides, from the easy ones to the difficult ones. We achieve further robustness by using the matching confidence as part of a dynamic Hidden Markov model (HMM) that integrates temporal information, taking camera operations into account as well. The second algorithm locates slides in the video. We develop a non-linear optimization method (bundle adjustment) to accurately estimate the projective transformations (homographies) between slides and video frames. Different from estimating a homography from a single image, our method solves a set of homographies jointly in a frame sequence that is related to a single slide. These two algorithms open up a series of possibilities for making the video content more searchable, browsable and understandable, thus greatly enriching the user's learning experience. Their usefulness has been demonstrated in the SLIC (Semantically Linking Instructional Content) system, which aims to turn simple video content into a fully interactive learning experience for students and scholars.
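The core of the first algorithm, SIFT keypoint matching followed by RANSAC homography estimation, maps onto standard OpenCV calls. This is a generic sketch of that pipeline; the thresholds are illustrative, and the thesis's multi-phase pipeline and HMM smoothing are omitted.

```python
import cv2
import numpy as np

def match_slide_to_frame(slide_gray, frame_gray, min_inliers=15):
    """Match a slide image to a video frame with SIFT keypoints and a
    RANSAC-estimated homography; returns (H, inlier_count) or (None, 0)."""
    sift = cv2.SIFT_create()
    k1, d1 = sift.detectAndCompute(slide_gray, None)
    k2, d2 = sift.detectAndCompute(frame_gray, None)
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    # Lowe's ratio test keeps only distinctive correspondences.
    good = [m for m, n in matcher.knnMatch(d1, d2, k=2)
            if m.distance < 0.75 * n.distance]
    if len(good) < min_inliers:
        return None, 0
    src = np.float32([k1[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([k2[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    H, mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    inliers = int(mask.sum()) if mask is not None else 0
    return (H, inliers) if inliers >= min_inliers else (None, 0)
```

The inlier count doubles as the "matching confidence" that a temporal model like the thesis's HMM can then integrate across frames.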
23

Estrada, Rayna Allison. "Appropriate exercise videos for adolescents." CSUSB ScholarWorks, 2003. https://scholarworks.lib.csusb.edu/etd-project/2165.

Abstract:
The purpose of this project was to review literature for appropriate elements that should make up an adolescent exercise video. Methods consisted of gathering research from twenty-three publications in books and professional journal articles. A review of the literature was examined to create chapters of information and a checklist pertaining to what makes up an appropriate adolescent exercise video.
24

Dye, Brigham R. "Reliability of pre-service teachers' coding of teaching videos using a video-analysis tool." Diss., Brigham Young University, 2007. http://contentdm.lib.byu.edu/ETD/image/etd2020.pdf.

25

Juul, Lisa. "Examining 360° storytelling in immersive music videos." Thesis, Malmö universitet, Fakulteten för teknik och samhälle (TS), 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:mau:diva-20828.

Abstract:
Music videos invite the viewer to an enhanced experience of a song, and by combining them with 360°, a new dimension of immersion emerges. However, a new wave of complex narrative and user interface unfolds when intertwining 360° and the contemporary way of telling stories in music videos. This thesis used an experimental mixed-method research design, focusing on collecting, analyzing, and mixing both quantitative and qualitative data in a series of studies. A survey was first conducted to get an overview of consensus with respect to music videos, VR, and 360°. The majority of the respondents had tried VR, and 40% stated that they felt immersed while trying it. Around 18% argued it was experience-dependent and 42% did not feel immersed at all. The survey was followed by experiments showing two 360° music videos with different storytelling techniques. After the participants had seen the videos, they discussed the experience in focus groups in a semi-structured interview. The results were then coded and benchmarked against theory, which led to six key 360° storytelling guidelines. All three focus groups concluded that 360° music videos enable a deeper level of immersion. However, when combining novelty and a sometimes overwhelming visual experience, 360° music videos can distract the audience if not told right. The guidelines discuss the purpose of a music video, how the technology affects the experience, whether the medium is passive or active, and how different types of interaction can be used as a storytelling means. They also discuss ways to pedagogically intertwine audio and visuals. Additionally, the guidelines include discussions of how different cues and POVs can be utilized to ensure that the filmmakers' and viewers' experiences are somewhat aligned; they also tackle the fear of missing out, and finally compare 360° and traditional music videos. Conclusively, the research shows that storytelling in a 360° sphere will entail a journey of trial and error, and that the audience has scattered preferences for which narrative styles they find work and which do not.
26

Portocarrero, Rodriguez Marco Antonio. "Diseño de la arquitectura de transformada discreta directa e inversa del coseno para un decodificador HEVC." Bachelor's thesis, Pontificia Universidad Católica del Perú, 2018. http://tesis.pucp.edu.pe/repositorio/handle/123456789/13002.

Abstract:
The use of high-resolution video is commonplace today, owing to portable devices capable of playing and creating video sequences in HD or at higher resolutions such as 4K or 8K. However, because higher-resolution video sequences can occupy large amounts of memory, they cannot be stored without first undergoing compression. Specialized organizations such as the ITU-T Video Coding Experts Group and the ISO/IEC Moving Picture Experts Group have been responsible for developing video coding standards. Thus, to improve video transmission and reach ever higher resolutions, the HEVC (H.265) coding standard was developed as the successor to H.264/AVC. This thesis focuses on the forward and inverse Discrete Cosine Transform (DCT and IDCT) module, which is part of the HEVC standard; its function is to compute the frequency-domain coefficients of sample blocks so that they can be quantized and their number reduced. The architecture was designed taking into account the pixel-processing throughput required by the standard, the operating frequency of the circuit, and the amount of logic resources used. The architecture was described in Verilog HDL and synthesized for Zynq-7000 devices from Xilinx. Functional verification of the circuit was carried out using testbenches in ModelSim. To verify the behaviour of the designed architecture, MATLAB was used to obtain the expected results, which were compared with those obtained in the functional simulation of the circuit. The maximum operating frequency, found through synthesis of the architecture, reached 135 MHz, equivalent to processing video sequences at 4K resolution (3840x2160 pixels) at 65 fps.
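The transform this module implements is the HEVC integer approximation of the DCT. As a rough functional model of the 4-point case (Python rather than the thesis's synthesizable Verilog, with the inter-stage bit-shift scaling simplified to one final division):

```python
import numpy as np

# 4-point integer core transform matrix from the HEVC specification,
# an integer approximation of the DCT-II basis.
M4 = np.array([[64,  64,  64,  64],
               [83,  36, -36, -83],
               [64, -64, -64,  64],
               [36, -83,  83, -36]], dtype=np.int64)

def forward_dct4x4(block):
    """2-D forward transform of a 4x4 residual block: rows, then columns."""
    return M4 @ block.astype(np.int64) @ M4.T

def inverse_dct4x4(coeffs):
    """Approximate inverse: M4's rows are mutually orthogonal with squared
    norms of 16384 or 16370, so dividing by 16384**2 after the transposed
    stages roughly recovers the residual (the codec uses exact bit shifts)."""
    return (M4.T @ coeffs.astype(np.int64) @ M4) // (16384 * 16384)
```

A quick round trip (`inverse_dct4x4(forward_dct4x4(x))`) recovers a 4x4 residual block up to the small rounding this simplification introduces.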
27

Wang, Yi. "Design and Evaluation of Contextualized Video Interfaces." Diss., Virginia Tech, 2010. http://hdl.handle.net/10919/28798.

Abstract:
Videos have been increasingly used in multiple applications, including surveillance, teleconferencing, learning and experience sharing. Since a video captures a scene from a particular viewpoint, it can often be understood better if presented within a larger spatial context. We call such interactive visualizations that combine videos with their spatial context "Contextualized Videos". Over recent years, multiple innovative Contextualized Video interfaces have been proposed to take advantage of the latest computer graphics and video processing technologies. These interfaces opened a huge design space with numerous design possibilities, each with its own benefits and limitations. To avoid a piecemeal understanding of the design space, this dissertation systematically designs and evaluates Contextualized Video interfaces based on a taxonomy of tasks that can potentially benefit from Contextualized Videos. This dissertation first formalizes a design space. New designs are created incrementally along the four major dimensions of the design space. These designs are then empirically compared through a series of controlled experiments using multiple tasks. The tasks are carefully selected from a task taxonomy, which helps to avoid a piecemeal understanding of the effect of the designs. Our design practices and empirical evaluations result in a set of design guidelines on how to choose proper designs according to the characteristics of the tasks and the users. Finally, we demonstrate how to apply the design guidelines to prototype a complex interface for a specific video surveillance application.
28

Dalal, Navneet. "Finding People in Images and Videos." PhD thesis, Grenoble INPG, 2006. http://tel.archives-ouvertes.fr/tel-00390303.

Abstract:
This thesis proposes a solution for detecting people and object classes in images and videos. The main goal is to develop robust and discriminative representations of visual forms that make it possible to decide whether an object of the class appears in an image region. Decisions are based on high-dimensional vectors of visual descriptors extracted from the regions. To compare different sets of descriptors objectively, we learn a decision rule for each set with a linear support vector machine algorithm. Entirely data-driven, our approach relies on low-level appearance and motion descriptors without using an explicit model of the object to be detected. In most cases we focus on the detection of people, a difficult, frequent, and particularly interesting class for applications such as film and video analysis, pedestrian detection for driver assistance, and surveillance. However, our method makes no strong assumption about the class to be recognized, and it also gives satisfactory results for other classes such as cars, motorcycles, cows, and sheep. We make four main contributions to the field of visual recognition. First, we present visual descriptors for object detection in static images: grids of Histograms of Oriented Gradients (HOG). The histograms are evaluated on a grid of spatial blocks, with strong local normalization. This structure ensures both a good characterization of the object's local visual shape and robustness to small variations in position, spatial orientation, local illumination, and colour. We show that the combination of lightly smoothed gradients, fine quantization of orientation and relatively coarse quantization of space, strong intensity normalization, and an advanced method for re-training on difficult cases reduces the false positive rate by one to two orders of magnitude compared with previous methods. Second, in order to detect people in videos, we propose several motion descriptors based on optical flow. These descriptors are incorporated into the previous approach. Analogous to the static HOG, they replace static image gradients with spatial differences of dense optical flow. Using differences minimizes the influence of camera and background motion on the detections. We evaluate several variants of this approach, which encode either motion boundaries or the relative motion of pairs of adjacent regions. Incorporating motion reduces the false positive rate by an order of magnitude compared with the previous approach. Third, we propose a general method for combining multiple detections based on the mean shift algorithm for estimating kernel-based density maxima. The approach takes into account the number, confidence, and relative scale of the detections. Finally, we present work in progress on building a person detector from several part detectors, namely the face, head, torso, and legs.
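The first contribution, the HOG descriptor, has since become a library staple, so its canonical configuration (9 orientation bins, 8x8-pixel cells, 2x2-cell blocks, L2-Hys block normalization) can be reproduced directly; the image file name below is a placeholder.

```python
from skimage import color, io
from skimage.feature import hog

# HOG descriptor with the pedestrian-detection parameters this thesis
# popularized (Dalal & Triggs, CVPR 2005).
image = color.rgb2gray(io.imread("pedestrian.png"))   # hypothetical input file
descriptor = hog(image,
                 orientations=9,
                 pixels_per_cell=(8, 8),
                 cells_per_block=(2, 2),
                 block_norm="L2-Hys")
# `descriptor` is the flattened feature vector a linear SVM would score.
```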
29

Wang, Ping. "Social game retrieval from unstructured videos." Diss., Georgia Institute of Technology, 2010. http://hdl.handle.net/1853/34673.

Abstract:
Parent-child social games, such as peek-a-boo and patty-cake, are a key element of an infant's earliest social interactions. The analysis of children's behaviors in social games based on video recordings provides a means for psychologists to study their social and cognitive development. However, the current practice in the use of video for behavioral research is extremely labor-intensive, involving many hours spent extracting and coding relevant video clips from a large corpus. From the standpoint of computer vision, such real-world video collections pose significant challenges in the automatic analysis of behavior, such as cluttered backgrounds, the effect of varying camera angles, clothing, subject appearance and lighting. These observations motivate my thesis work - automatic retrieval of social games from unstructured videos. The goal of this work is both to help accelerate the research progress in behavioral science and to take the initial steps towards the analysis of natural human interactions in natural settings. Social games are characterized by repetitions of turn-taking interactions between the parent and the child, with variations that are recognizable by both of them. I developed a computational model for social games that exploits the temporal structure over a long time-scale window as quasi-periodic patterns in a time series. I presented an unsupervised algorithm that mines the quasi-periodic patterns from videos. The algorithm consists of two functional modules: converting image sequences into discrete symbolic sequences and mining quasi-periodic patterns from the symbolic sequences. When this technique is applied to video of social games, the extracted quasi-periodic patterns often correspond to meaningful stages of the games. The retrieval performance on unstructured, lab-recorded videos and real-world family movies is promising. Building on this work, I developed a new feature extraction algorithm for social game categorization. Given a quasi-periodic pattern representation, my method automatically selects the most relevant space-time interest points to construct the feature representation. Our experiments demonstrate very promising classification performance on social games collected from YouTube. In addition, the method can also be used to categorize TV videos of sports rallies, demonstrating the generality of this approach. In order to support and encourage more research on human behavior analysis in realistic contexts, a video database of realistic child play in natural settings has been collected and is published on our project website (http://www.cc.gatech.edu/cpl/projects/socialgames), along with annotations. The unsupervised quasi-periodic pattern mining method represents a substantial generalization of conventional periodic motion analysis. Its generality is evaluated by retrieving motions of a range of quasi-periodicity from unstructured videos. The performance was compared with that of a periodic motion detection method based on motion self-similarity. Our method demonstrates superior retrieval performance with a 100% precision when the recall is up to 92.04%, with much fewer parameters than that of the other method.
30

Erdem, Elif. "Constructing Panoramic Scenes From Aerial Videos." Master's thesis, METU, 2007. http://etd.lib.metu.edu.tr/upload/12609083/index.pdf.

Abstract:
In this thesis, we address the problem of panoramic scene construction, in which a single image covering the entire visible area of the scene is constructed from an aerial video. In the literature, several algorithms have been developed for constructing the panoramic scene of a video sequence. These algorithms can be categorized as feature-based and featureless algorithms. In this thesis, we concentrate on the feature-based algorithms and compare them on aerial videos. The comparison is performed on video sequences captured by non-stationary cameras whose optical axes need not coincide. In addition, the matching and tracking performances of the algorithms are analyzed separately, their advantages and disadvantages are presented, and several modifications are proposed.
31

Sivic, Josef. "Efficient visual search of images and videos." Thesis, University of Oxford, 2006. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.436952.

32

Dubba, Krishna Sandeep Reddy. "Learning relational event models from videos." Thesis, University of Leeds, 2012. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.590428.

Abstract:
Learning event models from videos has applications ranging from abnormal event detection to content-based video retrieval. When multiple agents are involved in the events, characterizing events naturally suggests encoding interactions as relations. This can be realized by tracking the objects using computer vision algorithms and encoding the interactions using qualitative spatial and temporal relations. Learning event models from this kind of relational spatio-temporal data is particularly challenging because of the presence of multiple objects, uncertainty from the tracking, and especially the time component, as this increases the size of the relational data (the number of temporal relational facts is quadratically proportional to the number of intervals present). Relational learning techniques such as Inductive Logic Programming (ILP) hold promise for building models from this kind of data, but have not been successfully applied to the very large datasets which result from video data. In this thesis, we present a novel supervised learning framework to learn relational event models from large video datasets (several million frames) using ILP. Efficiency is achieved via the learning from interpretations setting and using a typing system that exploits the type hierarchy of objects in a domain. We also present a type-refining operator and prove that it is optimal. Positive and negative examples are extracted using domain experts' minimal event annotations (termed deictic supervision), which are used for learning relational event models. These models can be used for recognizing events from unseen videos. If the input data is from sensors, it is prone to noise; to handle this, we present extensions to the original framework by integrating abduction, as well as extending the framework based on Markov Logic Networks to obtain robust probabilistic models that improve the event recognition performance. The experimental results on video data from two challenging real-world domains (an airport domain, which has events such as loading, unloading, passenger-bridge parking etc., and a verbs domain, which has verbs like exchange, pick-up etc.) suggest that the techniques are suitable for real-world scenarios.
33

Raza, Syed H. "Temporally consistent semantic segmentation in videos." Diss., Georgia Institute of Technology, 2014. http://hdl.handle.net/1853/53455.

Abstract:
The objective of this thesis research is to develop algorithms for temporally consistent semantic segmentation in videos. Though many different forms of semantic segmentation exist, this research is focused on the problem of temporally consistent holistic scene understanding in outdoor videos. Holistic scene understanding requires an understanding of many individual aspects of the scene, including 3D layout, objects present, occlusion boundaries, and depth. Such a description of a dynamic scene would be useful for many robotic applications, including object reasoning, 3D perception, video analysis, video coding, segmentation, navigation and activity recognition. Scene understanding has been studied with great success for still images. However, scene understanding in videos requires additional approaches to account for temporal variation, dynamic information, and exploiting causality. As a first step, image-based scene understanding methods can be directly applied to individual video frames to generate a description of the scene. However, these methods do not exploit temporal information across neighboring frames. Further, lacking temporal consistency, image-based methods can produce temporally inconsistent labels across frames. This inconsistency can impact performance, as scene labels suddenly change between frames. The objective of this study is to develop temporally consistent scene-descriptive algorithms by processing videos efficiently, exploiting causality and data redundancy, and catering for scene dynamics. Specifically, we achieve our research objectives by (1) extracting geometric context from videos to give the broad 3D structure of the scene with all objects present, (2) detecting occlusion boundaries in videos due to depth discontinuity, and (3) estimating depth in videos by combining monocular and motion features with semantic features and occlusion boundaries.
34

Mehran, Ramin. "Analysis of behaviors in crowd videos." Doctoral diss., University of Central Florida, 2011. http://digital.library.ucf.edu/cdm/ref/collection/ETD/id/4801.

Abstract:
In this dissertation, we address the problem of discovery and representation of group activity of humans and objects in a variety of scenarios commonly encountered in vision applications. The overarching goal is to devise a discriminative representation of human motion in social settings, which captures a wide variety of human activities observable in video sequences. Such motion emerges from the collective behavior of individuals and their interactions and is a significant source of information typically employed for applications such as event detection, behavior recognition, and activity recognition. We present new representations of human group motion for static cameras, and propose algorithms for their application to a variety of problems. We first propose a method to model and learn the scene activity of a crowd using the Social Force Model, for the first time in the computer vision community. We present a method to densely estimate the interaction forces between people in a crowd, observed by a static camera. Latent Dirichlet Allocation (LDA) is used to learn the model of the normal activities over extended periods of time. Randomly selected spatio-temporal volumes of interaction forces are used to learn the model of normal behavior of the scene. The model encodes the latent topics of social interaction forces in the scene for normal behaviors. We classify a short video sequence of n frames as normal or abnormal by using the learnt model. Once a sequence of frames is classified as abnormal, the regions of anomalies in the abnormal frames are localized using the magnitude of interaction forces. The representation and estimation framework proposed above, however, has a few limitations. This algorithm uses a global estimation of the interaction forces within the crowd and is therefore incapable of identifying different groups of objects based on motion or behavior in the scene. Although the algorithm is capable of learning the normal behavior and detecting abnormality, it cannot capture the dynamics of different behaviors. To overcome these limitations, we then propose a method based on the Lagrangian framework for fluid dynamics, by introducing a streakline representation of flow. Streaklines are traced in a fluid flow by injecting color material, such as smoke or dye, which is transported with the flow and used for visualization. In the context of computer vision, streaklines may be used in a similar way to transport information about a scene, and they are obtained by repeatedly initializing a fixed grid of particles at each frame, then moving both current and past particles using optical flow. Streaklines are the locus of points that connect particles which originated from the same initial position. This approach is advantageous over the previous representations in two aspects: first, its rich representation captures the dynamics of the crowd and changes in space and time in the scene where the optical flow representation is not enough, and second, this model is capable of discovering groups of similar behavior within a crowd scene by performing motion segmentation. We propose a method to distinguish different group behaviors such as divergent/convergent motion and lanes using this framework. Finally, we introduce flow potentials as a discriminative feature to recognize crowd behaviors in a scene. Results of extensive experiments are presented for multiple real-life crowd sequences involving pedestrian and vehicular traffic.
The proposed method exploits optical flow as the low-level feature and performs integration and clustering to obtain coherent group motion patterns. However, we observe that in crowd video sequences, as in a variety of other vision applications, the co-occurrence and inter-relation of motion patterns are the main characteristics of group behaviors. In other words, the group behavior of objects is a mixture of individual actions or behaviors in a specific geometrical layout and temporal order. We therefore propose a new representation for group behaviors of humans based on the inter-relation of motion patterns in a scene. The representation is a bag of visual phrases of spatio-temporal visual words, and we present a method to match the high-order spatial layout of visual words that preserves their geometry under similarity transformations. To perform the experiments we collected a dataset of group choreography performances from the YouTube website; the dataset currently contains four categories of group dances.
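As a rough illustration of the streakline construction described in this abstract (a minimal sketch under assumed inputs, not the author's implementation), the following Python fragment repeatedly seeds a fixed grid of particles and advects both current and past particles with per-frame dense optical flow fields; the particles seeded at the same grid position form one streakline.

    import numpy as np

    def streaklines(flows, grid_step=10):
        # flows: list of (H, W, 2) dense optical flow fields, one per frame
        # (assumed precomputed with any dense optical flow estimator).
        H, W, _ = flows[0].shape
        seeds = [(x, y) for y in range(0, H, grid_step)
                        for x in range(0, W, grid_step)]
        particles = {s: [] for s in seeds}   # live particles per seed point
        for flow in flows:
            for s in seeds:
                particles[s].append(np.array(s, dtype=float))  # re-seed each frame
                moved = []
                for p in particles[s]:       # advect current AND past particles
                    x = int(np.clip(p[0], 0, W - 1))
                    y = int(np.clip(p[1], 0, H - 1))
                    moved.append(p + flow[y, x])
                particles[s] = moved
        # The positions in particles[s] trace the streakline seeded at s.
        return particles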
ID: 031001560; System requirements: World Wide Web browser and PDF reader; Mode of access: World Wide Web; Title from PDF title page (viewed August 26, 2013); Thesis (Ph.D.)--University of Central Florida, 2011; Includes bibliographical references (p. 100-104).
Ph.D.
Doctorate
Electrical Engineering and Computer Science
Engineering and Computer Science
Electrical Engineering
APA, Harvard, Vancouver, ISO, and other styles
35

Pusiol, Guido Thomas. "Découvertes d'activités humaines dans des videos." Phd thesis, Université Nice Sophia Antipolis, 2012. http://tel.archives-ouvertes.fr/tel-00944617.

Full text
Abstract:
The objective of this thesis is to propose a complete framework for the automatic discovery, modelling and recognition of activities from videos. The framework takes perceptual information (i.e., trajectories) as input and produces a semantic recognition of activities. It works in five steps: 1) The video is divided into several parts in order to recognize activities; we propose different techniques to extract perceptual features from this segmentation and build sets of perceptual features capable of describing activities over short periods of time. 2) We propose to learn the contextual information of the video; we build scene models by learning the relevant perceptual features, and the final model contains the regions of the scene that are of interest for describing semantic actions (i.e., regions where interactions occur). 3) We propose to reduce the gap between low-level visual information and semantic interpretation by building an intermediate level composed of primitive events. The proposed representation of these primitive events describes the interesting motions in the scene, obtained by abstracting the perceptual features using the contextual information of the scene in an unsupervised manner. 4) We recognize composed activities with a path-recognition method. We also propose a generic method for modelling composed activities; the models are built as flexible probabilistic sets that are easy to update. 5) We propose an activity-recognition method that deterministically searches for occurrences of the modelled activities in new datasets. The semantics are generated in interaction with the user. The whole approach has been evaluated on real datasets from the monitoring of people in an apartment and of elderly people in a hospital. This work has also been evaluated on other types of applications, such as sleep monitoring.
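To make step 3 of this pipeline concrete, here is a toy Python sketch of abstracting trajectories into primitive events through learned scene regions; the use of k-means and the transition-based event encoding are illustrative assumptions, not the thesis's exact model.

    import numpy as np
    from sklearn.cluster import KMeans

    def learn_scene_regions(trajectories, n_regions=8):
        # Pool the points of all training trajectories and cluster them
        # into scene regions where interactions tend to occur.
        points = np.vstack(trajectories)            # each trajectory: (T, 2)
        return KMeans(n_clusters=n_regions, n_init=10).fit(points)

    def primitive_events(trajectory, regions):
        # Abstract one trajectory into region-to-region transitions,
        # e.g. [(3, 5), (5, 1)]: moved from region 3 to 5, then 5 to 1.
        labels = regions.predict(trajectory)
        starts = labels[np.r_[True, labels[1:] != labels[:-1]]]
        return list(zip(starts[:-1], starts[1:]))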
APA, Harvard, Vancouver, ISO, and other styles
36

Fernandez, Arguedas Virginia. "Automatic object classification for surveillance videos." Thesis, Queen Mary, University of London, 2012. http://qmro.qmul.ac.uk/xmlui/handle/123456789/3354.

Full text
Abstract:
The recent popularity of surveillance video systems, especially in urban scenarios, demands the development of visual techniques for monitoring purposes. A primary step towards intelligent surveillance video systems consists of automatic object classification, which still remains an open research problem and the keystone for the development of more specific applications. Typically, object representation is based on inherent visual features. However, psychological studies have demonstrated that human beings routinely categorise objects according to their behaviour. The gap in understanding between the features automatically extracted by a computer, such as appearance-based features, and the concepts unconsciously perceived by human beings but unattainable for machines, such as behaviour features, is commonly known as the semantic gap. Consequently, this thesis proposes to narrow the semantic gap and bring together machine and human understanding towards object classification. A Surveillance Media Management framework is proposed to automatically detect and classify objects by analysing both the physical properties inherent in their appearance (machine understanding) and the behaviour patterns which require a higher level of understanding (human understanding). Finally, a probabilistic multimodal fusion algorithm bridges the gap, performing automatic classification that considers both machine and human understanding. The performance of the proposed framework has been thoroughly evaluated on outdoor surveillance datasets. The experiments conducted demonstrate that the combination of machine and human understanding substantially enhances object classification performance, and that the inclusion of human reasoning and understanding provides the essential information to bridge the semantic gap towards smart surveillance video systems.
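The final fusion step can be pictured along these lines; this is a minimal sketch assuming each modality outputs a class posterior, and the simple convex weighting is an illustrative stand-in for the thesis's probabilistic multimodal fusion algorithm.

    import numpy as np

    def fuse_modalities(p_appearance, p_behaviour, w=0.5):
        # Combine per-class posteriors from an appearance-based classifier
        # (machine understanding) and a behaviour-based classifier
        # (human-level concepts) into one classification decision.
        p = w * p_appearance + (1.0 - w) * p_behaviour
        return p / p.sum()                      # renormalise

    p_app = np.array([0.6, 0.3, 0.1])           # e.g. {person, car, bike}
    p_beh = np.array([0.2, 0.7, 0.1])
    print(fuse_modalities(p_app, p_beh))        # fused posterior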
APA, Harvard, Vancouver, ISO, and other styles
37

Ross, Candace Cheronda. "Grounded semantic parsing using captioned videos." Thesis, Massachusetts Institute of Technology, 2018. http://hdl.handle.net/1721.1/118036.

Full text
Abstract:
Thesis: S.M., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2018.
Cataloged from PDF version of thesis.
Includes bibliographical references (pages 45-47).
We develop a semantic parser that is trained in a grounded setting using pairs of videos captioned with sentences. This setting is both data-efficient, requiring little annotation, and far more similar to the experience of children, who observe their environment and listen to speakers. The semantic parser recovers the meaning of English sentences despite not having access to any annotated sentences, and despite the ambiguity inherent in vision, where a sentence may refer to any combination of objects, object properties, relations or actions taken by any agent in a video. We introduce a new corpus for grounded language acquisition. Learning to understand language, i.e., to turn sentences into logical forms, using captioned video will significantly expand the range of data that parsers can be trained on, lower the effort of training a semantic parser, and ultimately lead to a better understanding of child language acquisition.
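One way to picture the grounding signal, purely for illustration (the predicate format and the detection set are hypothetical, not the corpus's actual representation), is to score a candidate logical form by how many of its predicates are supported by what is seen in the video:

    def grounding_score(logical_form, video_facts):
        # logical_form: predicates such as ("pick_up", "person", "ball").
        # video_facts: grounded facts produced by (assumed) vision modules.
        # Training prefers logical forms whose predicates match the video.
        supported = sum(1 for pred in logical_form if pred in video_facts)
        return supported / max(len(logical_form), 1)

    facts = {("pick_up", "person", "ball"), ("red", "ball")}
    print(grounding_score([("pick_up", "person", "ball")], facts))  # 1.0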
by Candace Cheronda Ross.
S.M.
APA, Harvard, Vancouver, ISO, and other styles
38

Sotomaior, Gabriel de Barcelos 1982. "Auto-representação em videos na internet." [s.n.], 2008. http://repositorio.unicamp.br/jspui/handle/REPOSIP/284043.

Full text
Abstract:
Advisor: Marcius Cesar Soares Freire
Master's dissertation - Universidade Estadual de Campinas, Instituto de Artes
What happens when we turn the camera on ourselves? This work studies the phenomenon of self-representation in internet videos. The research reflects on processes of subjectivation and on the performative acts of subjects who represent themselves using new technologies, especially the internet. I intend to understand the consequences for the transformation of the audiovisual, observing some possible tendencies within contemporary culture. With these questions in mind, I analysed different kinds of internet videos, together with the hypertextual environment in which these works are embedded. The work points to the importance of the protagonism of new individuals in a much more multiple, diverse and "under construction" scenery, but it questions the ideology of a "saviour" technology that, by itself, would bring about the great transformations society needs.
Master's
Master in Multimedia
APA, Harvard, Vancouver, ISO, and other styles
39

Duraivelan, Shreenivasan. "Group Trajectory Analysis in Sport Videos." University of Dayton / OhioLINK, 2021. http://rave.ohiolink.edu/etdc/view?acc_num=dayton1619636056814278.

Full text
APA, Harvard, Vancouver, ISO, and other styles
40

Mahfoudi, Gaël. "Authentication of Digital Images and Videos." Thesis, Troyes, 2021. http://www.theses.fr/2021TROY0043.

Full text
Abstract:
Digital media are part of our day-to-day lives. After years of photojournalism, we have become used to considering them an objective testimony of the truth. But image and video retouching software is becoming ever more powerful and easier to use, allowing counterfeiters to produce highly realistic image forgeries. Consequently, the authenticity of digital media can no longer be taken for granted. Recent Anti-Money Laundering (AML) regulation introduced the notion of Know Your Customer (KYC), which requires financial institutions to verify the identity of their customers. Many institutions prefer to perform this verification remotely, relying on a Remote Identity Verification (RIV) system. Such a system relies heavily on both digital images and videos, so the authentication of those media is essential. This thesis focuses on the authentication of images and videos in the context of a RIV system. After formally defining a RIV system, we study the various attacks that a counterfeiter may perform against it, and attempt to understand the challenges of each of those threats in order to propose relevant solutions. Our approaches are based on both image processing methods and statistical tests. We also propose new datasets to encourage research on specific challenges that are not yet well studied.
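As a flavour of the kind of statistical check involved, here is a generic noise-residual consistency test in Python; it is not one of the methods proposed in the thesis, just a common baseline idea: spliced regions often carry noise statistics that differ from the rest of the image.

    import numpy as np
    from scipy.ndimage import median_filter

    def residual_inconsistency(gray, region_mask):
        # High-frequency residual = image minus a denoised version of itself.
        img = gray.astype(float)
        residual = img - median_filter(img, size=3)
        inside = residual[region_mask].std()     # suspect region
        outside = residual[~region_mask].std()   # rest of the image
        return abs(inside - outside) / (outside + 1e-8)  # large => suspicious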
APA, Harvard, Vancouver, ISO, and other styles
41

LAAKE, REBECCA A. "DEPICTION OF SEXUALITY IN MUSIC VIDEOS." University of Cincinnati / OhioLINK, 2005. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1104784016.

Full text
APA, Harvard, Vancouver, ISO, and other styles
42

El, Ghazouani Anas. "Spatial Immersion Dimensions in 360º Videos." Thesis, Södertörns högskola, Medieteknik, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:sh:diva-32972.

Full text
Abstract:
360º videos have emerged as a technology that provides new possibilities for filmmakers and users alike. This research study looks at 360º videos and the level of spatial immersion that users can achieve while viewing them in different contexts. A number of studies have looked at immersion in virtual environments; however, the same does not apply to 360º videos. The paper introduces related work in the areas of 360º video as well as immersion and spatial immersion in virtual reality environments in order to provide a background for the research question. The research question is addressed by showing test subjects five different videos set in different locations, interviewing them, and asking them to take part in a questionnaire. The study analyses the findings that emerge from the interviews and questionnaire in relation to the spatial immersion dimensions presented in the background literature. Among the study's findings is that the potential movements and actions that users feel they can perform in the virtual environment are a significant factor when it comes to achieving spatial immersion. The study also concludes that movement is another factor that helps users achieve spatial immersion.
APA, Harvard, Vancouver, ISO, and other styles
43

Dias, Moreira De Souza Fillipe. "Semantic Description of Activities in Videos." Scholar Commons, 2017. http://scholarcommons.usf.edu/etd/6649.

Full text
Abstract:
Description of human activities in videos results not only in detection of actions and objects but also in identification of their active semantic relationships in the scene. Towards this broader goal, we present a combinatorial approach that assumes the availability of algorithms for detecting and labeling objects and actions, albeit with some errors. Given these uncertain labels and detected objects, we link them into interpretative structures using domain knowledge encoded with concepts of Grenander's general pattern theory. Here a semantic video description is built using basic units, termed generators, that represent labels of objects or actions. These generators have multiple out-bonds, each associated with either a type of domain semantics, spatial constraints, temporal constraints or image/video evidence. Generators combine with each other, according to a set of pre-defined combination rules that capture domain semantics, to form larger connected structures known as configurations, which here are used to represent video descriptions. This framework offers a powerful representational scheme through its flexibility in spanning a space of interpretative structures (configurations) of varying sizes and structural complexity. We impose a probability distribution on the configuration space, with inferences generated using a Markov Chain Monte Carlo-based simulated annealing algorithm. The primary advantage of the approach is that it handles known computer vision challenges – appearance variability, errors in object label annotation, object clutter, simultaneous events, temporal dependency encoding, etc. – without the need for an exponentially large (labeled) training data set.
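The inference step can be sketched as generic Metropolis-style simulated annealing over candidate configurations; the energy and proposal functions below are placeholders for the pattern-theoretic moves the dissertation actually defines.

    import math, random

    def anneal(initial, energy, propose, t0=1.0, cooling=0.995, steps=5000):
        # Sample configurations from a distribution proportional to exp(-E/T),
        # lowering T so the chain settles into a low-energy video description.
        current, e_cur = initial, energy(initial)
        t = t0
        for _ in range(steps):
            cand = propose(current)              # local edit of the configuration
            e_cand = energy(cand)
            if e_cand < e_cur or random.random() < math.exp((e_cur - e_cand) / t):
                current, e_cur = cand, e_cand    # Metropolis acceptance
            t *= cooling
        return current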
APA, Harvard, Vancouver, ISO, and other styles
44

Wang, Dongang. "Action Recognition in Multi-view Videos." Thesis, The University of Sydney, 2018. http://hdl.handle.net/2123/19740.

Full text
Abstract:
A long-lasting goal in the field of artificial intelligence is to develop agents that can perceive and understand the rich visual world around us. With the improvement in deep learning and neural networks, many previous difficulties in computer vision have been resolved; for example, accuracy in image classification has even exceeded human performance on the ImageNet challenge. However, some problems remain open, such as action recognition and its application to multi-view videos. Building on a large body of previous work, in this thesis we propose a new Dividing and Aggregating Network (DA-Net) to address the problem of action recognition in multi-view videos. First, DA-Net learns view-independent representations shared by all views at lower layers and one view-specific representation for each view at higher layers. We then train view-specific action classifiers based on the view-specific representation for each view, and a view classifier based on the shared representation at the lower layers. The view classifier predicts how likely each video belongs to each view. Finally, the predicted view probabilities from multiple views are used as the weights when fusing the prediction scores of the view-specific action classifiers. We also propose a new approach based on the conditional random field (CRF) formulation to pass messages among the view-specific representations from different branches so that they help each other. Comprehensive experiments on three benchmark datasets clearly demonstrate the effectiveness of the proposed DA-Net for multi-view action recognition. We also conduct an ablation study, which indicates that the three proposed modules provide steady improvements in prediction accuracy.
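The fusion rule at the end of DA-Net is easy to write down; the sketch below (with made-up inputs) weights each view-specific classifier's action scores by the predicted view probabilities.

    import numpy as np

    def fuse_views(view_probs, view_action_scores):
        # view_probs:         (V,)   probability the video comes from each view
        # view_action_scores: (V, C) scores from each view-specific classifier
        # returns:            (C,)   fused action scores
        return view_probs @ view_action_scores

    probs = np.array([0.7, 0.2, 0.1])        # 3 camera views
    scores = np.random.rand(3, 10)           # 10 action classes
    print(fuse_views(probs, scores).shape)   # (10,)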
APA, Harvard, Vancouver, ISO, and other styles
45

Sharma, Nabin. "Multi-lingual Text Processing from Videos." Thesis, Griffith University, 2015. http://hdl.handle.net/10072/367489.

Full text
Abstract:
Advances in digital technology have produced low-priced, highly portable imaging devices such as digital cameras attached to mobile phones, camcorders, PDAs, etc. These devices can be used to capture videos and images with ease, which can then be shared through the internet and other communication media. In the commercial domain, cameras are used to create news, advertisement videos and other material for information communication. The use of multiple languages to create information for targeted audiences is quite common in countries with multiple official languages. Transmission of news, advertisement videos and images across various communication channels has created large databases of videos, and these are growing exponentially. Effective management of such databases requires proper indexing for the retrieval of relevant information. Text information is dominant in most of these videos and images and can be used as keywords for retrieval of relevant videos and images. Automatic annotation of videos and images to extract keywords requires the text to be converted to an editable form. This thesis addresses the problem of multi-lingual text processing from video frames, which involves text detection, word segmentation, script identification and text recognition. Additionally, text frame classification is required to avoid processing video frames which do not contain text information. A new multi-lingual video word dataset was created and published as part of the current research. The dataset comprises words in ten scripts, namely English (Roman), Hindi (Devanagari), Bengali (Bangla), Arabic, Oriya, Gujarati, Punjabi, Kannada, Tamil and Telugu, and was created to facilitate future research on multi-lingual text recognition.
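The processing chain described here can be pictured as a simple pipeline; every stage function below is a hypothetical placeholder for the corresponding detector or recogniser developed in the thesis.

    def process_frame(frame, detect_text, segment_words,
                      identify_script, recognizers):
        # detection -> word segmentation -> script identification
        # -> script-specific recognition, yielding indexable keywords.
        results = []
        for region in detect_text(frame):
            for word_img in segment_words(region):
                script = identify_script(word_img)    # e.g. "Devanagari"
                text = recognizers[script](word_img)  # route to the right OCR
                results.append((script, text))
        return results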
Thesis (PhD Doctorate)
Doctor of Philosophy (PhD)
School of Information and Communication Technology.
Science, Environment, Engineering and Technology
APA, Harvard, Vancouver, ISO, and other styles
46

Kapoor, Aditi. "Saliency detection in images and videos." Thesis, IIT Delhi, 2017. http://localhost:8080/xmlui/handle/12345678/7234.

Full text
APA, Harvard, Vancouver, ISO, and other styles
47

Houten, Ynze van. "Searching for videos the structure of video interaction in the framework of information foraging theory /." Enschede : University of Twente [Host], 2009. http://doc.utwente.nl/60628.

Full text
APA, Harvard, Vancouver, ISO, and other styles
48

Bai, Yannan. "Video analytics system for surveillance videos." Thesis, 2018. https://hdl.handle.net/2144/30739.

Full text
Abstract:
Developing an intelligent inspection system that can enhance public safety is challenging. An efficient video analytics system can help monitor unusual events and mitigate possible damage or loss. This thesis aims to analyze surveillance video data, report abnormal activities and retrieve the corresponding video clips. The surveillance video dataset used in this thesis is derived from the ALERT Dataset, a collection of surveillance videos recorded at airport security checkpoints. The video analytics system in this thesis can be thought of as a pipelined process: the system takes the surveillance video as input and passes it through a series of processing stages such as object detection, multi-object tracking, person-bin association and re-identification. In the end, we obtain trajectories of passengers and baggage in the surveillance videos. Abnormal events, such as taking away another person's belongings, are detected and trigger an alarm automatically. The system can also retrieve the corresponding video clips based on a user-defined query.
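A minimal sketch of the pipeline described above; all stage functions and the track attributes are assumptions standing in for the thesis's actual modules.

    def analyze(frames, detect, track, associate, reidentify):
        # detection -> multi-object tracking -> person-bin association
        # -> re-identification -> rule-based abnormal event alarms.
        tracks = track(detect(f) for f in frames)
        pairs = reidentify(associate(tracks))     # (person, bin) track pairs
        alarms = []
        for person, bin_track in pairs:
            if bin_track.taken_by is not None and bin_track.taken_by != person.id:
                alarms.append(("theft_suspected", bin_track))  # someone else's bag
        return alarms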
APA, Harvard, Vancouver, ISO, and other styles
49

SINGHAL, AKSHAT. "DETECTING FAKE VIDEOS." Thesis, 2019. http://dspace.dtu.ac.in:8080/jspui/handle/repository/16585.

Full text
Abstract:
As World Wide Web usage continues to grow, people all over the world rely on it more and more every day, for social networking, online payments, entertainment such as watching videos and sharing content, and educational as well as professional purposes. One common use is watching videos on the web or receiving update feeds in the form of videos from social networking websites. This work is an effort towards helping users identify the content of such videos as fake or real, alerting them before they believe false information that might lead to unwanted outcomes, such as financial loss if the video concerns the share market, and helping to identify rumours circulating on the web. For the above stated aim, an application has been developed in Python. The application follows supervised learning, with a training dataset of 574 videos containing both fake and real examples. The technique takes into consideration the audio component of the video in addition to the visual component. The accuracy percentages of the subsequent tests using LSTM, CNN and Naive Bayes models are also reported.
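Schematically, the audio-plus-video idea amounts to late fusion of per-modality features; the feature extractors and the trained classifier below are stand-ins (the thesis compares LSTM, CNN and Naive Bayes models).

    import numpy as np

    def predict_fake(video_clip, audio_track,
                     video_features, audio_features, classifier):
        # Extract features from each modality, concatenate, and score
        # with a trained classifier; returns the probability of 'fake'.
        x = np.concatenate([video_features(video_clip),
                            audio_features(audio_track)])
        return classifier.predict_proba(x.reshape(1, -1))[0, 1]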
APA, Harvard, Vancouver, ISO, and other styles
50

Lu, Yi-Chun, and 魯怡君. "Video Summarization for Multi-intensity Illuminated Infrared Videos." Thesis, 2013. http://ndltd.ncl.edu.tw/handle/37670354304076414799.

Full text
Abstract:
Master's
National Chiao Tung University
Institute of Multimedia Engineering
101
In nighttime video surveillance, proper illumination plays a key role in image quality. With ordinary IR-illuminators of fixed intensity, faraway objects are often hard to identify due to insufficient illumination, while nearby objects may suffer from over-exposure, resulting in image foreground/background of poor quality. In this thesis we propose a novel video summarization method which utilizes a multi-intensity IR-illuminator to generate images of human activities at different illumination levels. First, a GMM-based foreground extraction procedure is applied to the images acquired under each illumination level. After assessing the quality of the outcome of this procedure, the system then selects the visually most plausible foreground regions from the different illumination levels to generate a new set of input data. Finally, an automatic video summarization method identifies key frames in these data and merges them with a preselected representation of the still background. The result is a reasonable video summary of the moving foreground, which is generally unachievable for nighttime surveillance videos.
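The per-level foreground extraction can be sketched with OpenCV's MOG2 background model; the contrast-based quality score used to pick the best-exposed illumination level is an illustrative assumption, not the thesis's exact criterion.

    import cv2

    def make_models(n_levels):
        # One GMM background model per IR illumination level.
        return [cv2.createBackgroundSubtractorMOG2(detectShadows=False)
                for _ in range(n_levels)]

    def best_foreground(frames_per_level, models):
        # frames_per_level: one frame per illumination level at this time step.
        # Keep the mask from the level whose foreground looks best.
        best, best_score = None, -1.0
        for frame, model in zip(frames_per_level, models):
            mask = model.apply(frame)
            fg = frame[mask > 0]
            score = float(fg.std()) if fg.size else 0.0   # crude quality measure
            if score > best_score:
                best, best_score = mask, score
        return best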
APA, Harvard, Vancouver, ISO, and other styles