Dissertations / Theses on the topic 'Computer vision and multimedia computation'
Consult the top 47 dissertations / theses for your research on the topic 'Computer vision and multimedia computation.'
You can also download the full text of each publication as a PDF and read its abstract online whenever the metadata makes it available.
Browse dissertations / theses on a wide variety of disciplines and organise your bibliography correctly.
Gong, Shaogang. "Parallel computation of visual motion." Thesis, University of Oxford, 1989. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.238149.
Full text
Gavin, Andrew S. (Andrew Scott). "Low computation vision-based navigation for mobile robots." Thesis, Massachusetts Institute of Technology, 1994. http://hdl.handle.net/1721.1/38006.
Full text
Bryant, Bobby. "A computer-based multimedia prototype for night vision goggles." Monterey, Calif. : Springfield, Va. : Naval Postgraduate School ; Available from National Technical Information Service, 1994. http://handle.dtic.mil/100.2/ADA286208.
Full text
Thesis advisor(s): Kishore Sengupta, Alice Crawford. "September 1994." Bibliography: p. 35. Also available online.
Bryant, Bobby. "A computer-based multimedia prototype for night vision goggles." Thesis, Monterey, California. Naval Postgraduate School, 1994. http://hdl.handle.net/10945/30923.
Full text
Sahiner, Ali Vahit. "A computation model for parallelism : self-adapting parallel servers." Thesis, University of Westminster, 1991. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.305872.
Full text
Liu, Jianguo, and 劉建國. "Fast computation of moments with applications to transforms." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 1996. http://hub.hku.hk/bib/B31235086.
Full text
Liu, Jianguo. "Fast computation of moments with applications to transforms." Hong Kong : University of Hong Kong, 1996. http://sunzi.lib.hku.hk/hkuto/record.jsp?B17664986.
Full text
Battiti, Roberto (advisor: Geoffrey C. Fox). "Multiscale methods, parallel computation, and neural networks for real-time computer vision." Diss., Pasadena, Calif. : California Institute of Technology, 1990. http://resolver.caltech.edu/CaltechETD:etd-06072007-074441.
Full text
Hsiao, Hsu-Feng. "Multimedia streaming congestion control over heterogeneous networks : from distributed computation and end-to-end perspectives." Thesis, Connect to this title online; UW restricted, 2005. http://hdl.handle.net/1773/5946.
Full text
Nóbrega, Rui Pedro da Silva. "Interactive acquisition of spatial information from images for multimedia applications." Doctoral thesis, Faculdade de Ciências e Tecnologia, 2013. http://hdl.handle.net/10362/11079.
Full textThis dissertation addresses the problem of creating interactive mixed reality applications where virtual objects interact in a real world scenario. These scenarios are intended to be captured by the users with cameras. In other words, the goal is to produce applications where virtual objects are introduced in photographs taken by the users. This is relevant to create games and architectural and space planning applications that interact with visual elements in the images such as walls, floors and empty spaces. Introducing virtual objects in photographs or video sequences presents several challenges, such as the pose estimation and the visually correct interaction with the boundaries of such objects. Furthermore, the introduced virtual objects should be interactive and respond to the real physical environments. The proposed detection system is semi-automatic and thus depends partially on the user to obtain the elements it needs. This operation should be significantly simple to accommodate the needs of a non-expert user. The system analyzes a photo captured by the user and detects high-level features such as vanishing points, floor and scene orientation. Using these features it will be possible to create virtual mixed and augmented reality applications where the user takes one or more photos of a certain place and interactively introduces virtual objects or elements that blend with the picture in real time. This document discusses computer vision, computer graphics and human-computer interaction techniques required to acquire images and information about the scenario involving the user. To demonstrate the framework and the proposed solutions, several proof-of-concept projects are presented and studied. Additionally, to validate the solution several system tests are described and each case-study interface was subject of different user-studies.
Fundação para a Ciência e Tecnologia - research grant SFRH/BD/47511/2008
Muñiz, Pablo E. (Muñiz Aponte). "Detection of launch frame in long jump videos using computer vision and discreet computation." Thesis, Massachusetts Institute of Technology, 2019. https://hdl.handle.net/1721.1/123277.
Full text
Cataloged from PDF version of thesis.
Includes bibliographical references (page 44).
Pose estimation, a computer vision technique, can be used to develop a quantitative feedback training tool for long jumping. Key performance indicators (KPIs) such as launch velocity would allow a long jump athlete to optimize their technique while training. However, these KPIs require prior knowledge of when the athlete jumped, referred to as the launch frame in the context of videos and computer vision. Thus, an algorithm for estimating the launch frame was developed using the OpenPose Demo and Matlab. The algorithm estimates the launch frame to within 0.8±0.91 frames. Implementing the algorithm in a training tool would give an athlete real-time, quantitative feedback from a video. This process of developing an algorithm to flag an event can be applied in other sports as well, especially with the rise of KPIs in the sports industry (e.g. launch angle and velocity in baseball).
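The launch-frame idea lends itself to a small sketch. The thesis itself uses the OpenPose Demo and Matlab; the Python fragment below is only illustrative, with a hypothetical per-frame ankle trace and an illustrative velocity threshold rather than the thesis' calibrated values:

```python
def estimate_launch_frame(ankle_y, vel_thresh=3.0):
    """Estimate the launch frame from a per-frame ankle height series.

    ankle_y: one ankle y-coordinate per video frame, in image space
    (y grows downward).  The launch is flagged at the first frame whose
    upward velocity exceeds vel_thresh (pixels per frame).  Both the
    trace and the threshold here are hypothetical, for illustration.
    """
    for t in range(1, len(ankle_y)):
        # upward motion means y decreases in image coordinates
        upward_velocity = ankle_y[t - 1] - ankle_y[t]
        if upward_velocity > vel_thresh:
            return t
    return None  # no launch detected

# Hypothetical trace: ankle stays near the ground, then rises quickly.
trace = [100, 100, 101, 100, 99, 95, 88, 80]
print(estimate_launch_frame(trace))  # frame index 5
```

In a real tool the trace would come from the pose estimator's ankle keypoints, and the threshold would be tuned against ground-truth launch frames.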
by Pablo E. Muniz.
S.B.
S.B. Massachusetts Institute of Technology, Department of Mechanical Engineering
Kaloskampis, Ioannis. "Recognition of complex human activities in multimedia streams using machine learning and computer vision." Thesis, Cardiff University, 2013. http://orca.cf.ac.uk/59377/.
Full textBaró, i. Solé Xavier. "Probabilistic Darwin Machines: A new approach to develop Evolutionary Object Detection Systems." Doctoral thesis, Universitat Autònoma de Barcelona, 2009. http://hdl.handle.net/10803/5793.
Full text
Ever since computers were invented, we have wondered whether they might perform some of the tasks humans carry out every day. One of the most studied, and still one of the least understood, is our capacity to learn from experience and to generalize the knowledge we acquire.
One such task, performed unconsciously by people yet attracting growing interest across scientific fields, is what is known as pattern recognition. Creating models that represent the world around us helps us recognize objects in our environment, predict situations, identify behaviors, and so on. All this information allows us to adapt to and interact with our environment. Indeed, an individual's capacity to adapt to its environment has been related to the number of patterns it is capable of identifying.
When we speak about pattern recognition in the field of Computer Vision, we refer to the ability to identify objects using the information contained in one or more images. Despite the great progress of recent years, and although we can now obtain "useful" results in real environments, we are still very far from having a system with the same capacity for abstraction and robustness as the human visual system.
In this thesis, the face detector of Viola and Jones is studied as the paradigmatic and most widespread approach to the object detection problem. First, we analyze how objects are described using comparisons of illumination values in adjacent zones of the images, and how this information is later organized to create more complex structures. As a result of this study, two weak points are identified in this family of methods: the first concerns the description of the objects, and the second is a limitation of the learning algorithm, which hampers the use of better descriptors.
Describing objects using Haar-like features limits the extracted information to connected regions of the object. When we want to compare distant zones, large contiguous regions must be used, which makes the obtained values depend more on the object's average illumination than on the regions we actually want to compare. In order to use this type of non-local information, we introduce Dissociated Dipoles into the object detection framework.
The problem with this type of descriptor is that the great cardinality of the feature set makes the use of Adaboost as the learning algorithm unfeasible. The reason is that during the learning process an exhaustive search is made over the space of hypotheses, and since it is enormous, the time necessary for learning becomes prohibitive. Although we studied this phenomenon on the Viola and Jones approach, it is a general problem for most approaches, where the learning method limits the descriptors that can be used, and therefore the quality of the object description. In order to remove this limitation, we introduce evolutionary methods into the Adaboost algorithm and study the effect of this modification on its learning ability. Our experiments conclude that not only does it remain able to learn, but its convergence speed is not significantly altered.
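The core idea can be sketched in a few lines: replace the exhaustive weak-learner scan inside AdaBoost with an evolutionary search. This is an illustrative Python toy on numeric decision stumps, not the thesis' implementation (which evolves Haar-like and Dissociated Dipole features over images); to keep the demo deterministic, the initial population is seeded from the observed data values rather than drawn purely at random.

```python
import math
import random

def stump_predict(x, feat, thresh, pol):
    """Decision stump: the weak learner boosted by AdaBoost."""
    return pol if x[feat] >= thresh else -pol

def weighted_error(stump, X, y, w):
    f, t, p = stump
    return sum(wi for xi, yi, wi in zip(X, y, w)
               if stump_predict(xi, f, t, p) != yi)

def evolve_stump(X, y, w, pop_size=16, generations=8, rng=random.Random(0)):
    """Pick a weak learner by evolutionary search instead of AdaBoost's
    usual exhaustive scan over every (feature, threshold) pair."""
    n_feat = len(X[0])
    # seed population from observed values (a toy simplification)
    pop = [(f, xi[f], p) for f in range(n_feat) for xi in X for p in (-1, 1)]
    for _ in range(generations):
        pop.sort(key=lambda s: weighted_error(s, X, y, w))
        survivors = pop[:pop_size // 2]
        # mutate survivors: jitter thresholds, occasionally flip polarity
        children = [(f, t + rng.uniform(-0.5, 0.5),
                     p if rng.random() > 0.2 else -p)
                    for f, t, p in survivors]
        pop = survivors + children
    return min(pop, key=lambda s: weighted_error(s, X, y, w))

def adaboost(X, y, rounds=5):
    w = [1.0 / len(X)] * len(X)
    ensemble = []
    for _ in range(rounds):
        stump = evolve_stump(X, y, w)
        err = max(weighted_error(stump, X, y, w), 1e-12)
        alpha = 0.5 * math.log((1.0 - err) / err)
        f, t, p = stump
        # reweight: increase the weight of samples this stump got wrong
        w = [wi * math.exp(-alpha * yi * stump_predict(xi, f, t, p))
             for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
        ensemble.append((alpha, stump))
    return ensemble

def predict(ensemble, x):
    score = sum(a * stump_predict(x, f, t, p) for a, (f, t, p) in ensemble)
    return 1 if score >= 0 else -1

X = [[0.0], [1.0], [2.0], [3.0]]   # one feature, linearly separable
y = [-1, -1, 1, 1]
model = adaboost(X, y)
print([predict(model, xi) for xi in X])  # recovers the labels
```

The point of the evolutionary step is that only a population of candidate stumps is ever evaluated per round, so the feature set's cardinality no longer dictates the training cost.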
This new Adaboost with evolutionary strategies opens the door to feature sets of arbitrary cardinality, which allows us to investigate new ways of describing our objects, such as the Dissociated Dipoles. We first compare the learning ability of this evolutionary Adaboost using Haar-like features and Dissociated Dipoles; the results show that both types of descriptors have similar representation power, and depending on the problem, one adapts slightly better than the other. With the aim of obtaining a descriptor that shares the strong points of both, we propose a new type of feature, the Weighted Dissociated Dipoles, which combines the robust structure detectors of the Haar-like features with the Dissociated Dipoles' ability to use non-local information. In the experiments we carried out, this new feature set obtained better results than both Haar-like features and Dissociated Dipoles in every problem on which they were compared.
In order to test the performance of each method and compare the different methods, we use a set of public databases covering face detection, text detection, pedestrian detection, and car detection. In addition, our methods are tested on a traffic sign detection problem, over large databases containing both road and urban scenes.
Liu, Yixian. "Reasoning scene geometry from single images." Thesis, Queen Mary, University of London, 2014. http://qmro.qmul.ac.uk/xmlui/handle/123456789/9131.
Full textCai, Bill Yang. "Applications of deep learning and computer vision in large scale quantification of tree canopy cover and real-time estimation of street parking." Thesis, Massachusetts Institute of Technology, 2018. https://hdl.handle.net/1721.1/122317.
Full text
Cataloged from PDF version of thesis.
Includes bibliographical references (pages 73-77).
A modern city generates a large volume of digital information, especially in the form of unstructured image and video data. Recent advances in deep learning techniques have enabled effective learning and estimation of high-level attributes and meaningful features from large digital datasets of images and videos. In my thesis, I explore the potential of applying deep learning to image and video data to quantify urban tree cover and street parking utilization. Large-scale and accurate quantification of urban tree cover is important for informing government agencies in their public greenery efforts, and useful for modelling and analyzing city ecology and urban heat island effects. We apply state-of-the-art deep learning models and compare their performance to a previously established benchmark based on an unsupervised method.
Our training procedure for deep learning models is novel; we utilize the abundance of openly available and similarly labelled street-level image datasets to pre-train our model. We then perform additional training on a small training dataset consisting of GSV images. We also employ a recently developed method called gradient-weighted class activation map (Grad-CAM) to interpret the features learned by the end-to-end model. The results demonstrate that deep learning models are highly accurate, can be interpretable, and can also be efficient in terms of data-labelling effort and computational resources. Accurate parking quantification would inform developers and municipalities in space allocation and design, while real-time measurements would provide drivers and parking enforcement with information that saves time and resources. We propose an accurate and real-time video system for future Internet of Things (IoT) and smart cities applications.
Using recent developments in deep convolutional neural networks (DCNNs) and a novel intelligent vehicle tracking filter, the proposed system combines information across multiple image frames in a video sequence to remove noise introduced by occlusions and detection failures. We demonstrate that the proposed system achieves higher accuracy than pure image-based instance segmentation, and is comparable in performance to industry benchmark systems that utilize more expensive sensors such as radar. Furthermore, the proposed system can be easily configured for deployment in different parking scenarios, and can provide spatial information beyond traditional binary occupancy statistics.
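As a rough illustration of combining information across frames to suppress detection noise, a sliding majority vote over per-frame occupancy flags can stand in for the (far more sophisticated) vehicle tracking filter the thesis proposes; everything below is a hypothetical simplification:

```python
from collections import deque

def smooth_occupancy(frames, window=5):
    """Temporally filter noisy per-frame occupancy detections.

    frames: list of booleans, one per video frame, True when the detector
    reports a parking spot occupied.  A sliding majority vote suppresses
    single-frame dropouts caused by occlusions or detection failures.
    A hypothetical stand-in for a real tracking filter, not the thesis' one.
    """
    out = []
    buf = deque(maxlen=window)
    for f in frames:
        buf.append(f)
        out.append(sum(buf) > len(buf) / 2)
    return out

# One spurious dropout (index 2), then the car genuinely leaves.
raw = [True, True, False, True, True, True, False, False, False]
print(smooth_occupancy(raw))  # the dropout at index 2 is smoothed away
```

The trade-off is latency: a genuine state change is only reported once a majority of the window agrees, which is exactly the noise/delay balance a real tracking filter tunes more carefully.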
by Bill Yang Cai.
S.M.
S.M. Massachusetts Institute of Technology, Computation for Design and Optimization Program
Orriols, Majoral Xavier. "Generative Models for Video Analysis and 3D Range Data Applications." Doctoral thesis, Universitat Autònoma de Barcelona, 2004. http://hdl.handle.net/10803/3037.
Full text
The majority of problems in Computer Vision do not exhibit a direct relation between the stimuli provided by a general-purpose sensor and the corresponding perceptual category; a complex learning task must be involved to provide such a connection. In fact, the basic forms of energy and their possible combinations are few compared to the infinite perceptual categories corresponding to objects, actions, relations among objects, and so on. Two main factors determine the difficulty of a specific problem: i) the different levels of information that are employed, and ii) the complexity of the model that is intended to explain the observations.
The choice of an appropriate representation for the data takes on particular relevance when dealing with invariances, since these usually imply that the number of intrinsic degrees of freedom in the data distribution is lower than the number of coordinates used to represent it. Therefore, the decomposition into basic units (model parameters) and the change of representation allow a complex problem to be transformed into a manageable one. This simplification of the estimation problem has to rely on a proper mechanism for combining those primitives in order to give an optimal description of the global complex model. This thesis shows how Latent Variable Models reduce dimensionality while taking into account the internal symmetries of a problem, provide a way of dealing with missing data, and make it possible to predict new observations.
The lines of research of this thesis are directed to the management of multiple data sources. More specifically, this thesis presents a set of new algorithms applied to two different areas in Computer Vision: i) video analysis and summarization, and ii) 3D range data. Both areas have been approached through the Generative Models framework, where similar protocols for representing data have been employed.
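The dimensionality-reduction role of latent variable models can be illustrated with PCA, the simplest linear Gaussian latent variable model. This is a generic sketch on toy data, not the thesis' actual formulation:

```python
import numpy as np

def pca_fit(X, k):
    """Fit a k-dimensional linear latent space to data X (n_samples x d).
    Observations are explained as a mean plus k latent coordinates
    along orthogonal principal directions."""
    mean = X.mean(axis=0)
    # SVD of the centered data; rows of Vt are unit-norm components
    U, S, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:k]

def pca_encode(X, mean, components):
    return (X - mean) @ components.T      # latent coordinates

def pca_decode(Z, mean, components):
    return Z @ components + mean          # predicted observations

# Toy data lying exactly on a 1-D line embedded in 3-D:
X = np.array([[0.0, 0.0, 0.0],
              [1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],
              [3.0, 6.0, 9.0]])
mean, comps = pca_fit(X, 1)
Z = pca_encode(X, mean, comps)            # 4 samples, 1 latent dimension
X_hat = pca_decode(Z, mean, comps)        # reconstruction from the latent code
```

Because the toy data has one intrinsic degree of freedom, a single latent coordinate reconstructs it exactly; real data would reconstruct only approximately, which is the sense in which the latent model summarizes it.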
Rehn, Martin. "Aspects of memory and representation in cortical computation." Doctoral thesis, KTH, Numerisk Analys och Datalogi, NADA, 2006. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-4161.
Full text
In this thesis I take a modular approach to cortical function. I investigate how the cerebral cortex may realise a number of basic computational tasks, within the framework of its generic architecture. I present novel mechanisms for certain assumed computational capabilities of the cerebral cortex, building on the established notions of attractor memory and sparse coding. A sparse binary coding network for generating efficient representations of sensory input is presented. It is demonstrated that this network model well reproduces the simple cell receptive field shapes seen in the primary visual cortex and that its representations are efficient with respect to storage in associative memory. I show how an autoassociative memory, augmented with dynamical synapses, can function as a general sequence learning network. I demonstrate how an abstract attractor memory system may be realised on the microcircuit level -- and how it may be analysed using tools similar to those used experimentally. I outline some predictions from the hypothesis that the macroscopic connectivity of the cortex is optimised for attractor memory function. I also discuss methodological aspects of modelling in computational neuroscience.
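The attractor memory notion referred to above can be sketched with a classical Hopfield-style network: Hebbian storage of binary patterns and recall by iterated sign updates. This is a textbook toy, not the author's cortical microcircuit model:

```python
import numpy as np

def train_hopfield(patterns):
    """Hebbian storage: sum of outer products of the stored +1/-1
    patterns, with the diagonal zeroed (no self-connections)."""
    n = patterns.shape[1]
    W = patterns.T @ patterns / n
    np.fill_diagonal(W, 0.0)
    return W

def recall(W, probe, steps=10):
    """Iterate synchronous sign updates until a fixed point: the state
    falls into the nearest stored attractor."""
    s = probe.copy()
    for _ in range(steps):
        new = np.where(W @ s >= 0, 1, -1)
        if np.array_equal(new, s):
            break
        s = new
    return s

patterns = np.array([[1, -1, 1, -1, 1, -1, 1, -1],
                     [1, 1, 1, 1, -1, -1, -1, -1]])
W = train_hopfield(patterns)
noisy = patterns[0].copy()
noisy[0] = -noisy[0]          # corrupt one bit of the first pattern
restored = recall(W, noisy)   # the attractor dynamics repair it
```

Pattern completion from a corrupted probe is exactly the "attractor memory" behavior the thesis builds its cortical mechanisms on, here in its most stripped-down form.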
Silva, João Miguel Ferreira da. "People and object tracking for video annotation." Master's thesis, Faculdade de Ciências e Tecnologia, 2012. http://hdl.handle.net/10362/8953.
Full text
Object tracking is a thoroughly researched problem, with a body of associated literature dating at least as far back as the late 1970s. However, and despite the development of some satisfactory real-time trackers, it has not yet seen widespread use. This is not due to a lack of applications for the technology, since several interesting ones exist. In this document, it is postulated that this status quo is due, at least in part, to a lack of easy to use software libraries supporting object tracking. An overview of the problems associated with object tracking is presented and the process of developing one such library is documented. This discussion includes how to overcome problems like heterogeneities in object representations and requirements for training or initial object position hints. Video annotation is the process of associating data with a video’s content. Associating data with a video has numerous applications, ranging from making large video archives or long videos searchable, to enabling discussion about and augmentation of the video’s content. Object tracking is presented as a valid approach to both automatic and manual video annotation, and the integration of the developed object tracking library into an existing video annotator, running on a tablet computer, is described. The challenges involved in designing an interface to support the association of video annotations with tracked objects in real-time are also discussed. In particular, we discuss our interaction approaches to handle moving object selection on live video, which we have called “Hold and Overlay” and “Hold and Speed Up”. In addition, the results of a set of preliminary tests are reported.
project “TKB – A Transmedia Knowledge Base for contemporary dance” (PTDC/EA /AVP/098220/2008 funded by FCT/MCTES), the UTAustin – Portugal, Digital Media Program (SFRH/BD/42662/2007 FCT/MCTES) and by CITI/DI/FCT/UNL (Pest-OE/EEI/UI0527/2011)
Anistratov, Pavel. "Computation of Autonomous Safety Maneuvers Using Segmentation and Optimization." Licentiate thesis, Linköpings universitet, Fordonssystem, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-162164.
Full textEbadat, Ali-Reza. "Toward Robust Information Extraction Models for Multimedia Documents." Phd thesis, INSA de Rennes, 2012. http://tel.archives-ouvertes.fr/tel-00760383.
Full text
Vukotic, Vedran. "Deep Neural Architectures for Automatic Representation Learning from Multimedia Multimodal Data." Thesis, Rennes, INSA, 2017. http://www.theses.fr/2017ISAR0015/document.
Full text
In this dissertation, the thesis that deep neural networks are suited for the analysis of visual, textual, and fused visual and textual content is discussed. This work evaluates the ability of deep neural networks to learn multimodal representations in either an unsupervised or a supervised manner, and brings the following main contributions: 1) Recurrent neural networks for spoken language understanding (slot filling): different architectures are compared for this task with the aim of modeling both the input context and output label dependencies. 2) Action prediction from single images: we propose an architecture that allows us to predict human actions from a single image. The architecture is evaluated on videos, using solely one frame as input. 3) Bidirectional multimodal encoders: the main contribution of this thesis is a neural architecture that translates from one modality to the other and back, offering an improved multimodal representation space in which the initially disjoint representations can be translated and fused. This enables improved fusion of multiple modalities. The architecture was extensively studied and evaluated in international benchmarks within the task of video hyperlinking, where it defined the state of the art. 4) Generative adversarial networks for multimodal fusion: continuing on the topic of multimodal fusion, we evaluate the possibility of using conditional generative adversarial networks to learn multimodal representations; in addition to providing multimodal representations, generative adversarial networks make it possible to visualize the learned model directly in the image domain.
Ringaby, Erik. "Optical Flow Computation on Compute Unified Device Architecture." Thesis, Linköping University, Department of Electrical Engineering, 2008. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-15426.
Full text
There has been rapid progress in graphics processors in recent years, largely because of the demands that computer games place on speed and image quality. Because of its special architecture, the graphics processor is much faster at solving parallel problems than an ordinary processor. Due to its increasing programmability it is possible to use it for tasks other than those it was originally designed for.
Even though graphics processors have been programmable for some time, it has been quite difficult to learn how to use them. CUDA enables the programmer to use C code, with a few extensions, to program NVIDIA’s graphics processor and completely bypass the traditional programming models. This thesis investigates whether the graphics processor can be used for calculations without knowledge of how the hardware mechanisms work. An image processing algorithm calculating the optical flow has been implemented. The result shows that it is rather easy to implement programs using CUDA, but some knowledge of how the graphics processor works is required to achieve high performance.
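The thesis implements optical flow in CUDA; the underlying computation can be sketched in plain Python with NumPy as block matching, where each pixel's search is independent of all the others, which is precisely the data parallelism that maps well onto a GPU. The images and parameters below are illustrative:

```python
import numpy as np

def block_matching_flow(prev, curr, block=3, search=2):
    """Estimate optical flow by block matching: for each block-sized patch
    in `prev`, find the offset within +/-`search` pixels minimizing the
    sum of absolute differences against `curr`.  Every patch is matched
    independently, which is why the problem is so GPU-friendly.
    Returns (dy, dx) integer flow fields."""
    h, w = prev.shape
    r = block // 2
    dy = np.zeros((h, w), dtype=int)
    dx = np.zeros((h, w), dtype=int)
    for y in range(r, h - r):
        for x in range(r, w - r):
            patch = prev[y - r:y + r + 1, x - r:x + r + 1]
            best, best_off = None, (0, 0)
            for oy in range(-search, search + 1):
                for ox in range(-search, search + 1):
                    yy, xx = y + oy, x + ox
                    if yy - r < 0 or yy + r >= h or xx - r < 0 or xx + r >= w:
                        continue
                    cand = curr[yy - r:yy + r + 1, xx - r:xx + r + 1]
                    cost = np.abs(patch.astype(int) - cand.astype(int)).sum()
                    if best is None or cost < best:
                        best, best_off = cost, (oy, ox)
            dy[y, x], dx[y, x] = best_off
    return dy, dx

# Toy frames: a bright square moves one pixel down and one pixel right.
prev = np.zeros((8, 8)); prev[2:4, 2:4] = 255
curr = np.zeros((8, 8)); curr[3:5, 3:5] = 255
dy, dx = block_matching_flow(prev, curr)
```

A CUDA version would launch one thread per pixel to run the inner search loop, which is what makes the GPU implementation dramatically faster than this sequential sketch.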
Gonzalez Preciado, Matilde. "Méthodes de vision par ordinateur pour la reconnaissance de gestes naturelles dans le contexte de lʼannotation en langue des signes." Phd thesis, Université Paul Sabatier - Toulouse III, 2012. http://tel.archives-ouvertes.fr/tel-00768440.
Full text
Bondyfalat, Didier. "Interaction entre symbolique et numérique : application à la vision artificielle." Phd thesis, Université de Nice Sophia-Antipolis, 2000. http://tel.archives-ouvertes.fr/tel-00685629.
Full text
Piemontese, Cristiano. "Progettazione e implementazione di una applicazione didattica interattiva per il riconoscimento di oggetti basata sull'algoritmo SIFT." Bachelor's thesis, Alma Mater Studiorum - Università di Bologna, 2016. http://amslaurea.unibo.it/10883/.
Full text
Fröml, Vojtěch. "API datového úložiště pro práci s videem a obrázky." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2013. http://www.nusl.cz/ntk/nusl-236419.
Full text
Tjondronegoro, Dian W. "Content-based Video Indexing for Sports Applications using Multi-modal approach." PhD thesis, Deakin University, 2005. https://eprints.qut.edu.au/2199/1/PhDThesis_Tjondronegoro.pdf.
Full text
Buratti, Luca. "Valutazione sperimentale di metodologie di rettificazione e impatto su algoritmi di visione stereo." Bachelor's thesis, Alma Mater Studiorum - Università di Bologna, 2016. http://amslaurea.unibo.it/11648/.
Full text
Huisman, Maximiliaan. "Vision Beyond Optics: Standardization, Evaluation and Innovation for Fluorescence Microscopy in Life Sciences." eScholarship@UMMS, 2019. https://escholarship.umassmed.edu/gsbs_diss/1017.
Full text
Karaman, Svebor. "Indexation de la Vidéo Portée : Application à l'Étude Épidémiologique des Maladies Liées à l'Âge." Phd thesis, Université Sciences et Technologies - Bordeaux I, 2011. http://tel.archives-ouvertes.fr/tel-00689855.
Full text
Kumar, Tushar. "Characterizing and controlling program behavior using execution-time variance." Diss., Georgia Institute of Technology, 2016. http://hdl.handle.net/1853/55000.
Full text
Arcila, Romain. "Séquences de maillages : classification et méthodes de segmentation." Phd thesis, Université Claude Bernard - Lyon I, 2011. http://tel.archives-ouvertes.fr/tel-00653542.
Full textKhan, Asim. "Automated Detection and Monitoring of Vegetation Through Deep Learning." Thesis, 2022. https://vuir.vu.edu.au/43941/.
Full text"A Cooperative algorithm for stereo disparity computation." Chinese University of Hong Kong, 1991. http://library.cuhk.edu.hk/record=b5886947.
Full textThesis (M.Phil.)--Chinese University of Hong Kong, 1991.
Bibliography: leaves [102]-[105].
Acknowledgements, p. v
Chapter 1: Introduction
1.1 The problem, p. 1
1.1.1 The correspondence problem, p. 5
1.1.2 The problem of surface reconstruction, p. 6
1.2 Our goal, p. 8
1.3 Previous works, p. 8
1.3.1 Constraints on matching, p. 10
1.3.2 Interpolation of disparity surfaces, p. 12
Chapter 2: Preprocessing of images
2.1 Which operator to use, p. 14
2.2 Directional zero-crossing, p. 14
2.3 Laplacian of Gaussian, p. 16
2.3.1 Theoretical background of the Laplacian of Gaussian, p. 18
2.3.2 Implementation of the operator, p. 21
Chapter 3: Disparity Layers Generation
3.1 Geometrical constraint, p. 23
3.2 Basic idea of disparity layer, p. 26
3.3 Consideration in matching, p. 28
3.4 Effect of vertical misalignment of sensor, p. 37
3.5 Final approach, p. 39
Chapter 4: Disparity combination
4.1 Ambiguous match from different layers, p. 52
4.2 Our approach, p. 54
Chapter 5: Generation of dense disparity map
5.1 Introduction, p. 58
5.2 Cooperative computation, p. 58
5.2.1 Formulation of oscillation algorithm, p. 59
5.3 Interpolation by gradient descent method, p. 69
5.3.1 Formulation of constraints, p. 70
5.3.2 Gradient projection interpolation algorithm, p. 72
5.3.3 Implementation of the algorithm, p. 78
Chapter 6: Conclusion, p. 89
References
Appendix: Dynamical behavior of the cooperative algorithm
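The table of contents above lists Laplacian of Gaussian filtering followed by zero-crossing detection (Chapter 2) as the edge-extraction step that precedes stereo matching. As a rough illustration of that classical operator, not code taken from the thesis, a NumPy sketch might look like:

```python
import numpy as np

def log_kernel(size=9, sigma=1.4):
    """Sampled Laplacian of Gaussian: ((x^2 + y^2 - 2*sigma^2) / sigma^4) * G."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    r2 = xx**2 + yy**2
    g = np.exp(-r2 / (2 * sigma**2))
    k = (r2 - 2 * sigma**2) / sigma**4 * g
    return k - k.mean()  # force zero sum so flat regions give zero response

def convolve2d(img, k):
    """Naive 'valid' 2-D convolution (the LoG kernel is symmetric, so no flip)."""
    kh, kw = k.shape
    h, w = img.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * k)
    return out

def zero_crossings(resp):
    """Mark pixels where the filter response changes sign horizontally or vertically."""
    s = np.sign(resp)
    zc = np.zeros(resp.shape, dtype=bool)
    zc[:, :-1] |= (s[:, :-1] * s[:, 1:] < 0)
    zc[:-1, :] |= (s[:-1, :] * s[1:, :] < 0)
    return zc

# A vertical step edge should produce zero-crossings near the step.
img = np.zeros((32, 32)); img[:, 16:] = 1.0
edges = zero_crossings(convolve2d(img, log_kernel()))
print(edges.any())  # True
```

The zero-sum normalization of the sampled kernel matters: it guarantees an exactly zero response over uniform regions, so zero-crossings appear only where intensity actually changes.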
Sun, Jun. "Efficient computation of MRF for low-level vision problems." Master's thesis, 2012. http://hdl.handle.net/1885/156022.
Full text
Zareian, Alireza. "Learning Structured Representations for Understanding Visual and Multimedia Data." Thesis, 2021. https://doi.org/10.7916/d8-94j1-yb14.
Full text
Battiti, Roberto. "Multiscale methods, parallel computation, and neural networks for real-time computer vision." Thesis, 1990. https://thesis.library.caltech.edu/2496/1/Battiti_r_1990.pdf.
Full text
Jou, Brendan Wesley. "Large-scale Affective Computing for Visual Multimedia." Thesis, 2016. https://doi.org/10.7916/D8474B0B.
Full text
Sidhu, Reetinder P. S. "Novel Energy Transfer Computation Techniques For Radiosity Based Realistic Image Synthesis." Thesis, 1995. http://etd.iisc.ernet.in/handle/2005/1737.
Full text
(9089423), Daniel Mas Montserrat. "Machine Learning-Based Multimedia Analytics." Thesis, 2020.
Find full text
Machine learning is widely used to extract meaningful information from video, images, audio, text, and other multimedia data. Through a hierarchical structure, modern neural networks coupled with backpropagation learn to extract information from large amounts of data and to perform specific tasks such as classification or regression. In this thesis, we explore various approaches to multimedia analytics with neural networks. We present several image synthesis and rendering techniques to generate new images for training neural networks. Furthermore, we present multiple neural network architectures and systems for commercial logo detection, 3D pose estimation and tracking, deepfake detection, and manipulation detection in satellite images.
Avinash, Ramakanth S. "Approximate Nearest Neighbour Field Computation and Applications." Thesis, 2014. http://etd.iisc.ernet.in/2005/3503.
Full text
Gupta, Sonal. "Activity retrieval in closed captioned videos." Thesis, 2009. http://hdl.handle.net/2152/ETD-UT-2009-08-305.
Full text
Shih, Kun-Chih (施昆志). "An creative and interactive multimedia system for playing comfortable music in general spaces based on computer vision and image processing technique, and combined analyses of color, psychology, and music information." Thesis, 2006. http://ndltd.ncl.edu.tw/handle/r4366v.
Full text
Southern Taiwan University of Science and Technology
Graduate Institute of Multimedia and Computer Entertainment Science
94 (ROC calendar year)
Systems based on computer vision and image processing are widely used in scientific and medical applications. At the same time, integrated analyses of color, psychology, music, and multimedia presentation are useful in everyday entertainment. Combining the two fields has become increasingly popular in recent years and is likely to remain a trend. This motivates us to design a creative and interactive multimedia system that recognizes and captures the color information of a person's clothing when they enter a space. After color recognition and extraction, we relate the color information to psychology theory to analyze the characteristics and feelings of the people in the space. We then relate the psychology theory to music theory to play appropriate music that comforts the minds of the people in the space. The application can easily be extended to exhibition centers, conference halls, coffee bars, or any space needing special music. Successful experimental results confirm the effectiveness of the proposed approach.
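As a purely hypothetical illustration of the first step this abstract describes (reducing a captured clothing region to a coarse color category that a rule table could then map to music), one might sketch something like the following; the function and category names are illustrative, not taken from the thesis:

```python
import numpy as np

# Hypothetical sketch: reduce a clothing region to a dominant color tone
# that a downstream rule table could associate with a musical mood.
WARM, COOL, NEUTRAL = "warm", "cool", "neutral"

def dominant_tone(rgb_region):
    """Classify the average color of an RGB region (H x W x 3, values 0-255)."""
    r, g, b = rgb_region.reshape(-1, 3).mean(axis=0)
    if max(r, g, b) - min(r, g, b) < 20:  # low color spread: treat as neutral
        return NEUTRAL
    return WARM if r > b else COOL

red_shirt = np.full((8, 8, 3), (200, 40, 40), dtype=np.uint8)
print(dominant_tone(red_shirt))  # warm
```

A real system would of course first segment the person and their clothing from the camera image; this sketch only shows the color-summarization step.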
Vieira, Leonardo Machado Alves. "Development of an Automatic Image Enhancement Framework." Master's thesis, 2020. http://hdl.handle.net/10316/92489.
Full text
Image enhancement is an image processing procedure in which an image is made better suited to a task, and so it is relevant across many fields, such as medical imagery, space imagery, biometrics, etc. Image enhancement can alter an image in several different ways, for instance by highlighting a specific feature to ease post-processing analysis by a human or a machine, or by increasing its perceived aesthetic quality. The main objective of this work is the study and development of a possible automatic image enhancement system, with digital real-estate marketing as a case study, in the context of the project "Indest - Indicador de composicion estética". We explored existing research in image enhancement and propose an end-to-end image enhancement pipeline architecture that takes advantage of classical, evolutionary, and machine learning approaches from the literature. The framework is highly modular, allowing changes to its components and parameters. We tested it on a provided dataset of real-estate pictures of varying quality. The output enhanced images were evaluated using four image quality assessment tools and through a user survey assessing their perceived quality. We confirmed the initial presupposition that manipulating multiple image attributes at the same time is a complex problem. Moreover, the survey results led us to conclude that, in our scenario, similarity between the enhanced version and the original image is, to some extent, more important than improving its aesthetic value: the improvement can sometimes be exaggerated, causing the loss of useful contextual information or highlighting image defects. As such, a balance between similarity and aesthetics is desirable. Nevertheless, the attained results suggest that a modular, hybrid architecture like the one proposed has potential in the area of image enhancement.
Automatic image enhancement is closely tied to the capability of automated image quality assessment systems, so progress in the two areas is intrinsically connected.
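As an illustrative example of the kind of classical enhancement module such a pipeline could chain (not code from the thesis), global histogram equalization stretches a low-contrast image across the full intensity range:

```python
import numpy as np

def equalize(img):
    """Global histogram equalization for an 8-bit grayscale image."""
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]  # first occupied gray level
    # Map each gray level through the normalized cumulative distribution.
    lut = np.clip(np.round((cdf - cdf_min) / (cdf[-1] - cdf_min) * 255),
                  0, 255).astype(np.uint8)
    return lut[img]

# A low-contrast ramp (values 100..131) is stretched to the full 0..255 range.
img = np.tile(np.arange(100, 132, dtype=np.uint8), (32, 1))
out = equalize(img)
print(out.min(), out.max())  # 0 255
```

An automatic pipeline like the one the abstract describes would gate such a module behind an image quality assessment score rather than apply it unconditionally.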
Héon-Morissette, Barah. "L’espace du geste-son, vers une nouvelle pratique performative." Thèse, 2016. http://hdl.handle.net/1866/19567.
Full text
This research-creation thesis is a reflection on the gesture-sound space. The author's artistic research, based on six elements (body, sound, gesture, video, physical space, and technological space), was integrated into the conception of a motion capture system based on computer vision, the SICMAP (Système Interactif de Captation du Mouvement en Art Performatif, or Interactive Motion Capture System for Performative Arts). This approach proposes a new hybrid performative practice. In the first part, the author situates her artistic practice within the three pillars of transdisciplinary research methodology: the levels of Reality and perception (the body and space as matter), the logic of the included middle (gesture-sound space), and complexity (elements of the creative process). These transdisciplinary concepts are juxtaposed through the analysis of works sharing a common element with the author's artistic practice: the body at the center of a sensorial universe. The author then puts forth elements of scenic practice arising from this innovative artistic practice through the expressive body. The path taken by the performer-creator, leading to the conception of the SICMAP, is then explained through a reflection on the "dream instrument" and the realization of two preparatory gestural interfaces. Implying a new kind of gesture in the context of a non-haptic interface, that of the free-body gesture, the topology of the instrumental gesture is revisited in response to a new paradigm of the gesture-sound space. In reply to this research, the details of the SICMAP are then presented from the angle of the technological space and then applied to the gesture-sound space. The compositions realized during the development of the SICMAP are then presented. These works are discussed from an artistic and poietic point of view through the founding elements of the author's creative process.
The conclusion summarises the objectives of this research-creation as well as the contributions of this new performative hybrid practice.
(8786558), Mehul Nanda. "You Only Gesture Once (YouGo): American Sign Language Translation using YOLOv3." Thesis, 2020.
Find full text
(8771429), Ashley S. Dale. "3D OBJECT DETECTION USING VIRTUAL ENVIRONMENT ASSISTED DEEP NETWORK TRAINING." Thesis, 2021.
Find full text
An RGBZ synthetic dataset consisting of five object classes in a variety of virtual environments and orientations was combined with a small sample of real-world image data and used to train the Mask R-CNN (MR-CNN) architecture in a variety of configurations. When the MR-CNN architecture was initialized with MS COCO weights and the heads were trained with a mix of synthetic and real-world data, F1 scores improved in four of the five classes: the average maximum F1-score over all classes and epochs for the networks trained with synthetic data is F1* = 0.91, compared to F1 = 0.89 for the networks trained exclusively with real data, and the standard deviation of the maximum mean F1-score for synthetically trained networks is σ*_F1 = 0.015, compared to σ_F1 = 0.020 for the networks trained exclusively with real data. Varying the backgrounds in the synthetic data was shown to have negligible impact on F1 scores, opening the door to abstract backgrounds and minimizing the need for intensive synthetic data fabrication. When the MR-CNN architecture was initialized with MS COCO weights and depth data was included in the training data, the network was shown to rely heavily on the initial convolutional input to feed features into the network, the image depth channel was shown to influence mask generation, and the image color channels were shown to influence object classification. A set of latent variables for a subset of the synthetic dataset was generated with a Variational Autoencoder and then analyzed using Principal Component Analysis and Uniform Manifold Approximation and Projection (UMAP). The UMAP analysis showed no meaningful distinction between real-world and synthetic data, and a small bias towards clustering based on image background.
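For reference, the F1 scores compared in the abstract above follow the standard definition: the harmonic mean of precision and recall, computed here from raw detection counts (this is the textbook metric, not code from the thesis):

```python
def f1_score(tp, fp, fn):
    """F1 from detection counts: harmonic mean of precision and recall."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Example: 90 true positives, 10 false positives, 12 false negatives.
print(round(f1_score(90, 10, 12), 3))  # 0.891
```

Equivalently, F1 = 2*TP / (2*TP + FP + FN), which makes clear that true negatives play no role in the score.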