
Dissertations / Theses on the topic 'Computer vision technology'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the top 50 dissertations / theses for your research on the topic 'Computer vision technology.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online, whenever these are available in the metadata.

Browse dissertations / theses on a wide variety of disciplines and organise your bibliography correctly.

1

Johansson, Björn. "Multiscale Curvature Detection in Computer Vision." Licentiate thesis, Linköping University, Linköping University, Computer Vision, 2001. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-54966.

Full text
Abstract:

This thesis presents a new method for detection of complex curvatures such as corners, circles, and star patterns. The method is based on a second degree local polynomial model applied to a local orientation description in double angle representation. The theory of rotational symmetries is used to compute curvature responses from the parameters of the polynomial model. The responses are made more selective using a scheme of inhibition between different symmetry models. These symmetries can serve as feature points at a high abstraction level for use in hierarchical matching structures for 3D estimation, object recognition, image database search, etc.
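For readers unfamiliar with the term, the double angle representation maps a local orientation θ, which is only defined modulo π, to the angle 2θ, so that opposite gradient directions describe the same orientation. A minimal sketch of the standard formulation (an assumption; the thesis may use a variant):

```latex
% Double angle representation of local orientation \theta,
% weighted by a local certainty/magnitude m(\mathbf{x}):
z(\mathbf{x}) = m(\mathbf{x})\, e^{\,i 2\theta(\mathbf{x})}
             = m(\mathbf{x})
               \begin{pmatrix} \cos 2\theta(\mathbf{x}) \\ \sin 2\theta(\mathbf{x}) \end{pmatrix},
\qquad \theta \equiv \theta + \pi .
```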

A very efficient approximative algorithm for single and multiscale polynomial expansion is developed, which is used for detection of the complex curvatures in one or several scales. The algorithm is based on the simple observation that polynomial functions multiplied with a Gaussian function can be described in terms of partial derivatives of the Gaussian. The approximative polynomial expansion algorithm is evaluated in an experiment to estimate local orientation on 3D data, and the performance is comparable to previously tested algorithms which are more computationally expensive.
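The "simple observation" can be written out for a 1D Gaussian; these are standard identities rather than quotations from the thesis:

```latex
% For g(x) = e^{-x^2/(2\sigma^2)}:
x\,g(x) = -\sigma^2\, g'(x),
\qquad
x^2 g(x) = \sigma^4\, g''(x) + \sigma^2\, g(x),
```

so correlating an image with polynomial-times-Gaussian basis functions reduces to filtering with Gaussian derivatives, which are separable and therefore cheap.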

The curvature algorithm is demonstrated on natural images and in an object recognition experiment. Phase histograms based on the curvature features are developed and shown to be useful as an alternative compact image representation.

The importance of curvature is furthermore motivated by reviewing examples from biological and perceptual studies. The usefulness of local orientation information to detect curvature is also motivated by an experiment about learning a corner detector.

APA, Harvard, Vancouver, ISO, and other styles
2

Bårman, Håkan. "Hierarchical curvature estimation in computer vision." Doctoral thesis, Linköpings universitet, Bildbehandling, 1991. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-54887.

Full text
Abstract:
This thesis concerns the estimation and description of curvature for computer vision applications. Different types of multi-dimensional data are considered: images (2D); volumes (3D); time sequences of images (3D); and time sequences of volumes (4D). The methods are based on local Fourier domain models and use local operations such as filtering. A hierarchical approach is used. Firstly, the local orientation is estimated and represented with a vector field equivalent description. Secondly, the local curvature is estimated from the orientation description. The curvature algorithms are closely related to the orientation estimation algorithms, and the methods as a whole give a unified approach to the estimation and description of orientation and curvature. In addition, the methodology avoids thresholding and premature decision making. Results on both synthetic and real world data are presented to illustrate the algorithms' performance with respect to accuracy and noise insensitivity. Examples illustrating the use of the curvature estimates for tasks such as image enhancement are also included.
APA, Harvard, Vancouver, ISO, and other styles
3

Moe, Anders. "Passive Aircraft Altitude Estimation using Computer Vision." Licentiate thesis, Linköping University, Computer Vision, 2000. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-53415.

Full text
Abstract:

This thesis presents a number of methods to estimate 3D structures with a single translating camera. The camera is assumed to be calibrated and to have a known translation and rotation.

Applications for aircraft altitude estimation and ground structure estimation ahead of the aircraft are discussed. The idea is to mount a camera on the aircraft and use the motion estimates obtained in the inertial navigation system. One reason for this arrangement is to make the aircraft more passive, in comparison to conventional radar-based altitude estimation.

Two groups of methods are considered, optical flow based and region tracking based. Both groups have advantages and drawbacks.

Two methods to estimate the optical flow are presented. The accuracy of the estimated ground structure is increased by varying the temporal distance between the frames used in the optical flow estimation algorithms.
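To illustrate the geometry behind such methods (not the thesis's specific algorithms), here is a minimal sketch of depth from optical flow for a purely lateral, known camera translation; the frame pair, focal length and translation are hypothetical inputs:

```python
import cv2
import numpy as np

def depth_from_lateral_flow(prev_gray, next_gray, focal_px, translation_m):
    """Estimate per-pixel depth for a purely lateral camera translation.

    For sideways translation T, flow u = f * T / Z  =>  Z = f * T / u.
    A sketch only: a real system must also handle camera rotation and noise.
    """
    flow = cv2.calcOpticalFlowFarneback(prev_gray, next_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    u = flow[..., 0]  # horizontal flow component (pixels)
    with np.errstate(divide="ignore", invalid="ignore"):
        depth = focal_px * translation_m / u
    depth[~np.isfinite(depth)] = np.nan  # mask zero-flow pixels
    return depth
```

Increasing the temporal distance between the frames scales the translation, and hence the flow, which is one way to improve the depth resolution for distant ground.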

Four region tracking algorithms are presented. Two of them use canonical correlation, and the other two are based on sum of squared differences and complex correlation, respectively.

The depth estimates are then temporally filtered using weighted least squares or a Kalman filter.

A simple estimation of the computational complexity and memory requirements for the algorithms is presented to aid estimation of the hardware requirements.

Tests on real flight sequences are performed, showing that the aircraft altitude can be estimated with good accuracy.

APA, Harvard, Vancouver, ISO, and other styles
4

Söderkvist, Oskar. "Computer Vision Classification of Leaves from Swedish Trees." Thesis, Linköping University, Computer Vision, 2001. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-54366.

Full text
Abstract:

The aim of this master's thesis is to classify the tree class from an image of a leaf with a computer vision classification system. We compare different descriptors that describe the leaves' different features. We also look at different classification models and combine them with the descriptors to build a system that can classify the different tree classes.
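The thesis's actual descriptors and classifiers are not listed in the abstract; purely as an illustration of such a pipeline, a contour shape descriptor combined with a k-nearest-neighbour classifier might look like this:

```python
import cv2
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def leaf_descriptor(image_bgr):
    """Hu-moment shape descriptor of the largest contour (assumed to be the leaf)."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    _, mask = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    leaf = max(contours, key=cv2.contourArea)
    hu = cv2.HuMoments(cv2.moments(leaf)).flatten()
    return -np.sign(hu) * np.log10(np.abs(hu) + 1e-30)  # log scale for comparability

# train_images / train_labels are hypothetical dataset variables:
# X = np.array([leaf_descriptor(img) for img in train_images])
# clf = KNeighborsClassifier(n_neighbors=3).fit(X, train_labels)
```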

APA, Harvard, Vancouver, ISO, and other styles
5

Johansson, Björn. "Low Level Operations and Learning in Computer Vision." Doctoral thesis, Linköpings universitet, Bildbehandling, 2004. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-24005.

Full text
Abstract:
This thesis presents some concepts and methods for low level computer vision and learning, with object recognition as the primary application. An efficient method for detection of local rotational symmetries in images is presented. Rotational symmetries include circle patterns, star patterns, and certain high curvature patterns. The method for detection of these patterns is based on local moments computed on a local orientation description in double angle representation, which makes the detection invariant to the sign of the local direction vectors. Some methods are also suggested to increase the selectivity of the detection method. The symmetries can serve as feature descriptors and interest points for use in hierarchical matching structures for object recognition and related problems. A view-based method for 3D object recognition and estimation of object pose from a single image is also presented. The method is based on simple feature vector matching and clustering. Local orientation regions computed at interest points are used as features for matching. The regions are computed such that they are invariant to translation, rotation, and locally invariant to scale. Each match casts a vote on a certain object pose, rotation, scale, and position, and a joint estimate is found by a clustering procedure. The method is demonstrated on a number of real images and the region features are compared with the SIFT descriptor, which is another standard region feature for the same application. Finally, a new associative network is presented which applies the channel representation for both input and output data. This representation is sparse and monopolar, and is a simple yet powerful representation of scalars and vectors. It is especially suited for representation of several values simultaneously, a property that is inherited by the network and something which is useful in many computer vision problems. The chosen representation enables us to use a simple linear model for non-linear mappings. The linear model parameters are found by solving a least squares problem with a non-negative constraint, which gives a sparse regularized solution.
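The channel representation mentioned here is commonly built from overlapping, non-negative basis functions; one standard choice is the cos² kernel (an assumption, the thesis may use a variant):

```latex
% cos^2 channel encoding of a scalar x into channel k:
B_k(x) =
\begin{cases}
\cos^2\!\bigl(\tfrac{\pi}{3}\,(x - k)\bigr), & |x - k| < \tfrac{3}{2},\\[4pt]
0, & \text{otherwise},
\end{cases}
```

which encodes each value by a few active channels and yields exactly the kind of sparse, monopolar vectors the abstract describes.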
APA, Harvard, Vancouver, ISO, and other styles
6

Klomark, Marcus. "Occupant Detection using Computer Vision." Thesis, Linköping University, Computer Vision, 2000. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-54363.

Full text
Abstract:

The purpose of this master’s thesis was to study the possibility to use computer vision methods to detect and classify objects in the front passenger seat in a car. This work presents different approaches to solve this problem and evaluates the usefulness of each technique. The classification information should later be used to modulate the speed and the force of the airbag, to be able to provide each occupant with optimal protection and safety.

This work shows that computer vision has great potential to provide data that may be used to perform reliable occupant classification. The future choice of method depends on many factors, for example costs and the requirements placed on the system by laws and car manufacturers. Further evaluation and tests of the methods in this thesis, of other methods, of the ABE approach, and of post-processing of the results should also be made before a reliable classification algorithm can be written.

APA, Harvard, Vancouver, ISO, and other styles
7

Fang, Jian. "Optical Imaging and Computer Vision Technology for Corn Quality Measurement." OpenSIUC, 2011. https://opensiuc.lib.siu.edu/theses/733.

Full text
Abstract:
The official U.S. standards for corn have been available for almost one hundred years, and the corn grading system has been gradually updated over the years. In this thesis, we investigated a fast corn grading system, which includes a mechanical part and a computer recognition part. The mechanical system delivers the corn kernels onto a display plate. For the computer recognition algorithms, we extracted common features from each corn kernel and classified them to measure the grain quality.
APA, Harvard, Vancouver, ISO, and other styles
8

Andersson, Mats T. "Controllable Multi-dimensional Filters and Models in Low-Level Computer Vision." Doctoral thesis, Linköpings universitet, Bildbehandling, 1992. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-54340.

Full text
Abstract:
This thesis concerns robust estimation of low-level features for use in computer vision systems. The presentation consists of two parts. The first part deals with controllable filters and models. A basis filter set is introduced which supports a computationally efficient synthesis of filters in arbitrary orientations. In contrast to many earlier methods, this approach allows the use of more complex models at an early stage of the processing. A new algorithm for robust estimation of orientation is presented. The algorithm is based on synthesized quadrature responses and supports the simultaneous representation and individual averaging of multiple events. These models are then extended to include estimation and representation of more complex image primitives such as line ends, T-junctions, crossing lines and curvature. The proposed models are based on symmetry properties in the Fourier domain as well as in the spatial plane, and the feature extraction is performed by applying the original basis filters directly on the grey-level image. The basis filters and interpolation scheme are finally generalized to allow synthesis of 3-D filters. The performance of the proposed models and algorithms is demonstrated using test images of both synthetic and real world data. The second part of the thesis concerns an image feature representation adapted for a robust analogue implementation. A possible use for this approach is in analogue VLSI or corresponding analogue hardware adapted for neural networks. The methods are based on projections of quadrature filter responses and mutual inhibition of magnitude signals.
APA, Harvard, Vancouver, ISO, and other styles
9

Möller, Sebastian. "Image Segmentation and Target Tracking using Computer Vision." Thesis, Linköpings universitet, Datorseende, 2011. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-68061.

Full text
Abstract:
In this master thesis the possibility of detecting and tracking objects in multispectral infrared video sequences is investigated. The current method, which uses fixed-size rectangles, has significant disadvantages. These disadvantages are addressed by using image segmentation to estimate the shape of the object. The result of the image segmentation is used to determine the infrared contrast of the object. Our results show that some objects give very good segmentation, tracking and shape detection. The objects that perform best are the flares and countermeasures. Helicopters seen from the side, with significant movement, are also detected better with our method. The motion of the object is very important, since movement is the main component in successful shape detection; this is because helicopters are much colder than flares and engines. Detecting the presence and position of moving objects is easier and can be done quite successfully even with helicopters, but using structure tensors we can also detect the presence and estimate the position of stationary objects.
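As a sketch of the structure tensor mentioned at the end (the thesis's exact formulation is not given in the abstract), the tensor is built from smoothed products of image gradients, and its eigenvalue gap indicates oriented structure:

```python
import cv2
import numpy as np

def structure_tensor(gray, sigma=2.0):
    """Smoothed structure tensor; the eigenvalue gap indicates oriented structure."""
    gray = gray.astype(np.float32)
    ix = cv2.Sobel(gray, cv2.CV_32F, 1, 0, ksize=3)
    iy = cv2.Sobel(gray, cv2.CV_32F, 0, 1, ksize=3)
    jxx = cv2.GaussianBlur(ix * ix, (0, 0), sigma)
    jxy = cv2.GaussianBlur(ix * iy, (0, 0), sigma)
    jyy = cv2.GaussianBlur(iy * iy, (0, 0), sigma)
    trace = jxx + jyy
    det = jxx * jyy - jxy * jxy
    # closed-form eigenvalues of the 2x2 tensor at every pixel
    tmp = np.sqrt(np.maximum((trace / 2) ** 2 - det, 0))
    lam1, lam2 = trace / 2 + tmp, trace / 2 - tmp
    coherence = np.where(trace > 1e-6, (lam1 - lam2) / (lam1 + lam2 + 1e-6), 0)
    return lam1, lam2, coherence
```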
APA, Harvard, Vancouver, ISO, and other styles
10

Lindvall, Victor. "A Computer Vision-Based Approach for Automated Inspection of Cable Connections." Thesis, Uppsala universitet, Institutionen för informationsteknologi, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-448446.

Full text
Abstract:
The goal of the project was to develop an algorithm based on a Convolutional Neural Network (CNN) for automatically detecting exposed metal components on coaxial cable connections, a.k.a. the detector. We show that the performance of such a CNN, trained to identify bad weatherproofing, can be improved by applying an image post-processing technique. This post-processing technique uses specular features to advantage when predicting exposed metal components; such specular features are notorious for posing problems in computer vision algorithms and are therefore typically removed. The results achieved by applying the stand-alone detector, without post-processing, are compared with the image post-processing approach to highlight the benefits of implementing such an algorithm.
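The abstract does not spell out the post-processing, but one plausible reading is a specular-highlight mask fused with the CNN score; the thresholds and the fusion rule below are hypothetical:

```python
import cv2
import numpy as np

def specular_mask(image_bgr, v_min=230, s_max=40):
    """Bright, unsaturated pixels are candidate specular reflections on metal."""
    hsv = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2HSV)
    s, v = hsv[..., 1], hsv[..., 2]
    return ((v >= v_min) & (s <= s_max)).astype(np.uint8)

def fused_score(cnn_score, image_bgr, weight=0.3):
    """Blend the detector's score with the fraction of specular pixels."""
    frac = specular_mask(image_bgr).mean()
    return (1 - weight) * cnn_score + weight * frac
```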
APA, Harvard, Vancouver, ISO, and other styles
11

Ali, Faiza, and Maksims Svjatoha. "Integration of Computer Vision Methods and Sensor Fusion Technologies for Precision Driving." Thesis, KTH, Skolan för industriell teknik och management (ITM), 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-299793.

Full text
Abstract:
Increasing interest in artificial intelligence has given rise to new technologies. This has enabled advanced sensors within fields such as computer vision, which boast increased precision and consistency and do not accumulate small errors over time. However, they require increased computing power and are prone to processing delays. It is therefore interesting to combine them with faster, more traditional sensors in order to compensate for their weaknesses. While such combinations exist today, it is interesting to see whether computer vision techniques can bring the performance of cheaper sensors up to the standard of more expensive, industrial ones in terms of accuracy and precision. In this thesis, a standard Raspberry Pi camera was installed on a Jetracer vehicle to estimate the distance to a target object, and its output was fused with that of a rotary encoder. A Kalman filter is used for this sensor fusion setup, and it is designed to reduce the measurement uncertainties present both in the depth estimation algorithm for the camera and in the encoder position outputs. There exists a relationship between uncertainty mitigation and effective resolution. Sensor fusion was partially implemented in an online setting, but the focus was on fusing recorded sensor data to avoid issues with compensating for the inherent vision system latency. Fusing encoder measurements with those of a vision system significantly reduced position estimation uncertainty compared to only using the vision system, but it is unclear whether it is better than using the encoder alone. Further investigation confirms that increased latencies and reduced sampling frequencies have a negative impact on position uncertainty; however, the impact of latencies in realistic ranges is negligible. There also exists a trade-off in sampling frequencies between precision and accuracy - higher frequencies are not necessarily better.
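A minimal sketch of the kind of Kalman-filter fusion described, with a 1D constant-velocity state updated by either encoder or vision position measurements; all noise parameters are hypothetical:

```python
import numpy as np

class PositionKalman:
    """1D constant-velocity Kalman filter fusing encoder and vision positions."""

    def __init__(self, q=0.05, r_encoder=0.01, r_vision=0.10):
        self.x = np.zeros(2)            # state: [position, velocity]
        self.P = np.eye(2)              # state covariance
        self.q = q                      # process noise intensity
        self.r = {"encoder": r_encoder, "vision": r_vision}
        self.H = np.array([[1.0, 0.0]]) # both sensors measure position

    def predict(self, dt):
        F = np.array([[1.0, dt], [0.0, 1.0]])
        Q = self.q * np.array([[dt**3 / 3, dt**2 / 2], [dt**2 / 2, dt]])
        self.x = F @ self.x
        self.P = F @ self.P @ F.T + Q

    def update(self, z, sensor):
        # sensor is "encoder" or "vision"; each has its own noise level
        S = self.H @ self.P @ self.H.T + self.r[sensor]
        K = self.P @ self.H.T / S
        self.x = self.x + (K * (z - self.H @ self.x)).ravel()
        self.P = (np.eye(2) - K @ self.H) @ self.P
```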
APA, Harvard, Vancouver, ISO, and other styles
12

Martinsson, Jonas. "Examine vision technology for small object recognition in an industrial robotics application." Thesis, Mälardalens högskola, Akademin för innovation, design och teknik, 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:mdh:diva-28218.

Full text
Abstract:
This thesis explains the development of a computer vision system able to find and orient relatively small objects. The motivation is to replace monotonous manual work with an automated system built around an ABB IRB 140 industrial robot. The vision system runs on a standard PC and is developed using the OpenCV environment, originally developed by Intel in Russia. The algorithms of the system are written in C++ and the user interface in C++/CLI. With a derived test case, multiple vision algorithms are tested and evaluated for this kind of application. The results show that SIFT/SURF works poorly with multiple instances of the search object and that HAAR classifiers produce many false positives. Template matching with image moment calculation gave satisfying results with multiple objects in the scene and produced no false positives. Drawbacks of the selected algorithm were sensitivity to lighting variation and poor performance in skewed scenes. The report also contains suggestions on how to proceed with further improvements or research.
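As a rough sketch of the winning combination, template matching followed by image-moment orientation estimation on each matched region; the threshold is illustrative, and overlapping hits would need non-maximum suppression in practice:

```python
import cv2
import numpy as np

def find_objects(scene_gray, template_gray, threshold=0.8):
    """Locate template instances, then estimate each one's orientation."""
    result = cv2.matchTemplate(scene_gray, template_gray, cv2.TM_CCOEFF_NORMED)
    h, w = template_gray.shape
    hits = []
    for y, x in zip(*np.where(result >= threshold)):
        patch = scene_gray[y:y + h, x:x + w]
        m = cv2.moments(patch)
        # orientation from second-order central moments
        angle = 0.5 * np.arctan2(2 * m["mu11"], m["mu20"] - m["mu02"])
        hits.append((x, y, angle))
    return hits
```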
APA, Harvard, Vancouver, ISO, and other styles
13

Johnson, Abioseh Saeley. "Automatic number-plate recognition : an application of computer vision technology to automatic vehicle identification." Thesis, University of Bristol, 1990. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.300053.

Full text
APA, Harvard, Vancouver, ISO, and other styles
14

Muchaneta, Irikidzai Zorodzai. "Enhancing colour-coded poll sheets using computer vision as a viable Audience Response System (ARS) in Africa." Master's thesis, University of Cape Town, 2018. http://hdl.handle.net/11427/27854.

Full text
Abstract:
Audience Response Systems (ARS) give a facilitator accurate feedback on a question posed to the listeners. The most common form of ARS is the clicker: clickers are handheld response gadgets that act as a medium of communication between the students and the facilitator. Clickers are prohibitively expensive, creating a need for low-cost alternatives with high accuracy. This study builds on earlier research by Gain (2013), which aims to show that computer vision and coloured poll sheets can be an alternative to clicker-based ARS. This thesis examines a proposal to create an alternative to clickers applicable to the African context, where the main deterrent is cost. It studies the computer vision structures of feature detection, extraction and recognition. In this research project, an experimental study was conducted in various lecture theatres with student numbers ranging from 50 to 150. Python and OpenCV tools were used to analyze the photographs and document the performance, as well as to observe the different conditions under which to acquire results. The research had an average detection rate of 75%, which points to a promising alternative audience response system as measured by time, cost and error rate. Further work on the capture of the poll sheet would significantly improve this result. With regard to cost, the computer vision coloured poll sheet alternative is significantly cheaper than clickers.
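The core detection step can be pictured as HSV colour thresholding followed by blob counting; the colour ranges and minimum blob area below are hypothetical, not taken from the study:

```python
import cv2
import numpy as np

# Hypothetical HSV ranges for the poll-sheet answer colours
COLOUR_RANGES = {
    "A": ((0, 120, 80), (10, 255, 255)),     # red
    "B": ((45, 80, 80), (75, 255, 255)),     # green
    "C": ((100, 120, 80), (130, 255, 255)),  # blue
}

def count_votes(image_bgr, min_area=400):
    """Count connected colour blobs per answer in a lecture-hall photo."""
    hsv = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2HSV)
    votes = {}
    for answer, (lo, hi) in COLOUR_RANGES.items():
        mask = cv2.inRange(hsv, np.array(lo, np.uint8), np.array(hi, np.uint8))
        mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, np.ones((5, 5), np.uint8))
        n, _, stats, _ = cv2.connectedComponentsWithStats(mask)
        votes[answer] = int(np.sum(stats[1:, cv2.CC_STAT_AREA] >= min_area))
    return votes
```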
APA, Harvard, Vancouver, ISO, and other styles
15

Aragon, Camarasa Gerardo. "A hierarchical active binocular robot vision architecture for scene exploration and object appearance learning." Thesis, University of Glasgow, 2012. http://theses.gla.ac.uk/3640/.

Full text
Abstract:
This thesis presents an investigation of a computational model of hierarchical visual behaviours within an active binocular robot vision architecture. The robot vision system is able to localise multiple instances of the same object class, while simultaneously maintaining vergence and directing its gaze to attend and recognise objects within cluttered, complex scenes. This is achieved by implementing all image analysis in an egocentric symbolic space without creating explicit pixel-space maps and without the need for calibration or other knowledge of the camera geometry. One of the important aspects of the active binocular vision paradigm requires that visual features in both camera eyes must be bound together in order to drive visual search to saccade, locate and recognise putative objects or salient locations in the robot's field of view. The system structure is based on the "attentional spotlight" metaphor of biological systems and a collection of abstract and reactive visual behaviours arranged in a hierarchical structure. Several studies have shown that the human brain represents and learns objects for recognition by snapshots of 2-dimensional views of the imaged scene that happen to contain the object of interest during active interaction (exploration) of the environment. Likewise, psychophysical findings specify that the primate's visual cortex represents common everyday objects by a hierarchical structure of their parts or sub-features and, consequently, recognises them by simple but imperfect 2D approximations of object parts. This thesis incorporates the above observations into an active visual learning behaviour in the hierarchical active binocular robot vision architecture. By actively exploring the object viewing sphere (as higher mammals do), the robot vision system automatically synthesises and creates its own part-based object representation from multiple observations while a human teacher indicates the object and supplies a classification name. It is proposed to adopt the computational concepts of a visual learning exploration mechanism that controls the accumulation of visual evidence and directs attention towards the spatially salient object parts. The behavioural structure of the binocular robot vision architecture is loosely modelled on the WHAT and WHERE visual streams. The WHERE stream maintains and binds spatial attention on the object part coordinates that egocentrically characterise the location of the object of interest and extracts spatio-temporal properties of feature coordinates and descriptors. The WHAT stream either determines the identity of an object or triggers a learning behaviour that stores view-invariant feature descriptions of the object part. Therefore, the robot vision system is capable of performing a collection of specific visual tasks such as vergence, detection, discrimination, recognition, localisation and multiple same-instance identification. This classification of tasks enables the robot vision system to execute and fulfil specified high-level tasks, e.g. autonomous scene exploration and active object appearance learning.
APA, Harvard, Vancouver, ISO, and other styles
16

Nicander, Torun. "Indoor triangulation system using vision sensors." Thesis, Uppsala universitet, Signaler och system, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-429676.

Full text
Abstract:
This thesis aims to investigate a triangulation system for indoor positioning in two dimensions (2D). The system was implemented using three Pixy2 vision sensors placed on a straight baseline. A Pixy2 consists of a camera lens, an image sensor (Aptina MT9M114), a microcontroller (NXP LPC4330), and other components. It can track one or multiple colours, or a combination of colours. To position an object using triangulation, one needs to determine the angles (α) to the object from a pair of known observing points (i.e., any pair of the three Pixy2s placed in fixed positions on the baseline in this project). This is done from the Pixy2s' images. Using the pinhole camera model, the tangent of the angle, tan(α), is found to have a linear relation with the displacement Δx in the image plane (in pixels), namely tan(α) = k Δx, where k is a constant depending on the specific Pixy2. A wooden test board was made specially to determine k for all the Pixy2s. It had distance marks in two dimensions and a Pixy2 affixed at the origin. By placing a coloured object at three different sets of spatial sampling points (marks), the constant k for each Pixy2 was determined with an error variance of < 5%. Position estimations of the triangulation system were conducted using all three pairs formed from the three Pixy2s, placing the positioned object at different positions in the 2D plane on the board. A combination using estimation values from all three pairs to make a more accurate estimate was also evaluated. The estimation results show positioning accuracy ranging from 0.03678 cm to 2.064 cm for the z-coordinate, and from 0.02133 cm to 0.9785 cm for the x-coordinate, which are very satisfactory results. The vision sensors were quite sensitive to the lighting environment when finely tuned to track one object, which has a significant effect on the performance of vision sensor-based triangulation. An extension of the system to use more than three Pixy2s has been looked into and shown to be feasible. A method for auto-calibrating the Pixy2s' positions on the baseline was suggested and implemented. After auto-calibration, the system still produced satisfactory position estimates.
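Using the relation tan(α) = k Δx quoted above, the 2D triangulation from one pair of sensors reduces to intersecting two rays from known baseline positions; a minimal sketch (variable names are illustrative):

```python
def triangulate_pair(delta_x1, k1, x1, delta_x2, k2, x2):
    """2D position from two Pixy2s at baseline positions x1, x2.

    Each sensor reports a pixel displacement delta_x; tan(alpha) = k * delta_x
    per the calibrated pinhole model. Both rays meet at the object, with
    X = xi + Z * tan(alpha_i) for each sensor i.
    """
    t1, t2 = k1 * delta_x1, k2 * delta_x2
    z = (x2 - x1) / (t1 - t2)   # depth; requires t1 != t2 (non-parallel rays)
    x = x1 + z * t1             # lateral position along the baseline axis
    return x, z
```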
APA, Harvard, Vancouver, ISO, and other styles
17

Ringaby, Erik. "Geometric Computer Vision for Rolling-shutter and Push-broom Sensors." Licentiate thesis, Linköpings universitet, Datorseende, 2012. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-77391.

Full text
Abstract:
Almost all cell-phones and camcorders sold today are equipped with a CMOS (Complementary Metal Oxide Semiconductor) image sensor, and there is also a general trend to incorporate CMOS sensors in other types of cameras. The sensor has many advantages over the more conventional CCD (Charge-Coupled Device) sensor, such as lower power consumption, cheaper manufacturing and the potential for on-chip processing. Almost all CMOS sensors make use of what is called a rolling shutter. Compared to a global shutter, which images all the pixels at the same time, a rolling-shutter camera exposes the image row by row. This leads to geometric distortions in the image when either the camera or the objects in the scene are moving. The recorded videos and images will look wobbly (jello effect), skewed or otherwise strange, and this is often not desirable. In addition, many computer vision algorithms assume that the camera used has a global shutter and will break down if the distortions are too severe. In airborne remote sensing it is common to use push-broom sensors. These sensors exhibit a similar kind of distortion as a rolling-shutter camera, due to the motion of the aircraft. If the acquired images are to be matched with maps or other images, then the distortions need to be suppressed. The main contributions in this thesis are three-dimensional models for rolling-shutter distortion correction. Previous attempts modelled the distortions as taking place in the image plane, and we have shown that our techniques give better results for hand-held camera motions. The basic idea is to estimate the camera motion, not only between frames, but also during frame capture. The motion can be estimated using inter-frame image correspondences, and with these a non-linear optimisation problem can be formulated and solved. All rows in the rolling-shutter image are imaged at different times, and when the motion is known, each row can be transformed to its rectified position. In addition to rolling-shutter distortions, hand-held footage often has shaky camera motion. It has been shown how to do efficient video stabilisation, in combination with the rectification, using rotation smoothing. The thesis also explores how to use similar techniques to correct push-broom images, and how to rectify 3D point clouds from, e.g., the Kinect depth sensor.
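Schematically, once the intra-frame motion is known, each row can be rectified on its own; a minimal sketch for the pure-rotation case (the function supplying per-time rotations, e.g. a SLERP of estimated poses, is a hypothetical input):

```python
import numpy as np

def rectify_rolling_shutter(points, rows_total, K, rot_for_time):
    """Map rolling-shutter image points to a global-shutter reference pose.

    points       : (N, 3) homogeneous pixel coordinates (x, y, 1)
    K            : 3x3 camera intrinsic matrix
    rot_for_time : callable t -> 3x3 camera rotation, t in [0, 1] over the
                   frame readout (e.g. interpolated from estimated poses)
    Each image row y is exposed at its own time t = y / rows_total.
    """
    K_inv = np.linalg.inv(K)
    rectified = []
    for p in points:
        t = p[1] / rows_total      # row-dependent capture time
        R = rot_for_time(t)
        q = K @ R.T @ K_inv @ p    # rotate the ray back to the reference pose
        rectified.append(q / q[2])
    return np.array(rectified)
```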
APA, Harvard, Vancouver, ISO, and other styles
18

Villaroman, Norman. "Face Tracking User Interfaces Using Vision-Based Consumer Devices." BYU ScholarsArchive, 2013. https://scholarsarchive.byu.edu/etd/3941.

Full text
Abstract:
Some individuals have difficulty using standard hand-manipulated input devices such as a mouse and a keyboard effectively. For such users who at the same time have sufficient control over face and head movement, a robust perceptual or vision-based user interface that can track face movement can help significantly. Using vision-based consumer devices makes such a user interface readily available and allows its use to be non-intrusive. Designing this type of user interface presents some significant challenges, particularly with accuracy and usability. This research investigates such problems and proposes solutions to create a usable and robust face tracking user interface using currently available state-of-the-art technology. In particular, the input control in such an interface is divided into its logical components and studied one by one, namely, user input, capture technology, feature retrieval, feature processing, and pointer behavior. Different options for these components are studied and evaluated to see if they contribute to more efficient use of the interface. The evaluation is done using standard tests created for this purpose. The tests were done by a single user. The results can serve as a precursor to a full-scale usability study, various improvements, and eventual deployment for actual use. The primary contributions of this research include a logical organization and evaluation of the input process and its different components in face tracking user interfaces, a common library for computer control that can be used by various face tracking engines, an adaptive pointing input style that makes pointing using natural movement easier, and a test suite that can be used to measure performance of various user interfaces for desktop systems.
APA, Harvard, Vancouver, ISO, and other styles
19

Mi, Yongcui. "Novel beam shaping and computer vision methods for laser beam welding." Licentiate thesis, Högskolan Väst, Avdelningen för produktionssystem (PS), 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:hv:diva-16970.

Full text
Abstract:
Laser beam welding has been widely applied in different industrial sectors due to its unique advantages. However, there are still challenges, such as beam positioning in T-joint welding, and gap bridging in butt joint welding, especially in the case of varying gap width along a joint. It is expected that enabling more advanced control of a welding system and obtaining more in-depth process knowledge could help solve these issues. The aim of this work is to address such welding issues by a laser beam shaping technology using a novel deformable mirror together with computer vision methods, and also to increase knowledge about the benefits and limitations of this approach. Beam shaping in this work was realized by a novel deformable mirror system integrated into an industrial processing optics. Together with a wavefront sensor, a controlled adaptive beam shaping system was formed with a response time of 10 ms. The processes were monitored by a coaxial camera with selected filters and passive or active illumination. Conduction mode autogenous bead-on-plate welding and butt joint welding experiments have been used to understand the effect of beam shaping on the melt pool geometry. Circular Gaussian and elliptical Gaussian shapes elongated transverse to and along the welding direction were studied. In-process melt pool images and cross-section micrographs of the weld seams/beads were analyzed. The results showed that the melt pool geometry can be significantly modified by beam shaping using the deformable mirror. T-joint welding with different beam offset deviations relative to the center of the joint line was conducted to study the potential of using machine learning to track the process state. The results showed that machine learning can reach sufficient detection and estimation performance, which could also be used for on-line control. In addition, in-process and multidimensional data were accurately acquired using computer vision methods. These data reveal weaknesses of the current thermo-fluid simulation model, which in turn can help to better understand and control laser beam welding. The results obtained in this work show great potential for using the proposed methods to solve relevant challenges in laser beam welding.

The licentiate thesis also includes two submitted articles, which are not shown here.

APA, Harvard, Vancouver, ISO, and other styles
20

Caban, Jesus. "Information Technology for Next-Generation of Surgical Environments." UKnowledge, 2006. http://uknowledge.uky.edu/gradschool_theses/229.

Full text
Abstract:
Minimally invasive surgeries (MIS) are fundamentally constrained by image quality, access to the operative field, and the visualization environment on which the surgeon relies for real-time information. Although minimally invasive access benefits the patient, it also leads to more challenging procedures, which require better skills and training. Endoscopic surgeries rely heavily on 2D interfaces, introducing additional challenges due to the loss of depth perception, the lack of 3-dimensional imaging, and the reduction of degrees of freedom. By using state-of-the-art technology within a distributed computational architecture, it is possible to incorporate multiple sensors, hybrid display devices, and 3D visualization algorithms within a flexible surgical environment. Such environments can assist the surgeon with valuable information that goes far beyond what is currently available. In this thesis, we will discuss how 3D visualization and reconstruction, stereo displays, high-resolution display devices, and tracking techniques are key elements in the next generation of surgical environments.
APA, Harvard, Vancouver, ISO, and other styles
21

Kutiyanawala, Aliasgar. "Eyes-Free Vision-Based Scanning of Aligned Barcodes and Information Extraction from Aligned Nutrition Tables." DigitalCommons@USU, 2013. http://digitalcommons.usu.edu/etd/1522.

Full text
Abstract:
Visually impaired (VI) individuals struggle with grocery shopping and have to rely on friends, family or grocery store associates for shopping. ShopMobile 2 is a proof-of-concept system that allows VI shoppers to shop independently in a grocery store using only their smartphone. Unlike other assistive shopping systems that use dedicated hardware, this system is a software-only solution that relies on fast computer vision algorithms. It consists of three modules - an eyes-free barcode scanner, an optical character recognition (OCR) module, and a tele-assistance module. The eyes-free barcode scanner allows VI shoppers to locate and retrieve products by scanning barcodes on shelves and on products. The OCR module allows shoppers to read nutrition facts on products, and the tele-assistance module allows them to obtain help from sighted individuals at remote locations. This dissertation discusses, provides implementations of, and presents laboratory and real-world experiments related to all three modules.
APA, Harvard, Vancouver, ISO, and other styles
22

Mollberg, Alexander. "A Resource-Efficient and High-Performance Implementation of Object Tracking on a Programmable System-on-Chip." Thesis, Linköpings universitet, Datorteknik, 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-124044.

Full text
Abstract:
The computer vision problem of object tracking is introduced and explained. An approach to interest point based feature detection and tracking using FAST and BRIEF is presented and the selection of algorithms suitable for implementation on a Xilinx Zynq7000 with an XC7Z020 field-programmable gate array (FPGA) is detailed. A modification to the smoothing strategy of BRIEF which significantly reduces memory utilization on the FPGA is presented and benchmarked against a reference strategy. Measures of performance and resource efficiency are presented and utilized in an iterative development process. A system for interest point based object tracking that uses FAST for feature detection and BRIEF for feature description with the proposed smoothing modification is implemented on the FPGA. The design is described and important design choices are discussed.
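For orientation, the same FAST-plus-BRIEF pairing is available in software; a minimal OpenCV sketch of the detect-describe-match loop that the FPGA design implements in hardware (BRIEF requires the opencv-contrib xfeatures2d module):

```python
import cv2

def match_fast_brief(img_a, img_b):
    """Detect FAST corners, describe with BRIEF, match with Hamming distance."""
    fast = cv2.FastFeatureDetector_create(threshold=25)
    brief = cv2.xfeatures2d.BriefDescriptorExtractor_create()  # needs opencv-contrib
    kp_a, desc_a = brief.compute(img_a, fast.detect(img_a, None))
    kp_b, desc_b = brief.compute(img_b, fast.detect(img_b, None))
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(desc_a, desc_b), key=lambda m: m.distance)
    return kp_a, kp_b, matches
```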
APA, Harvard, Vancouver, ISO, and other styles
23

Luwes, Nicolaas Johannes. "Artificial intelligence machine vision grading system." Thesis, Bloemfontein : Central University of Technology, Free State, 2014. http://hdl.handle.net/11462/35.

Full text
APA, Harvard, Vancouver, ISO, and other styles
24

Eidehall, Andreas. "Tensor representation of 3D structures." Thesis, Linköping University, Department of Electrical Engineering, 2002. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-1241.

Full text
Abstract:

This is a thesis written for a master's degree at the Computer Vision Laboratory, University of Linköping. An abstract outer product is defined and used as a bridge to reach 2nd- and 4th-order tensors. Some applications of these in geometric analysis of range data are discussed and illustrated. In idealized setups, simple geometric objects, like spheres or polygons, are successfully detected. Finally, the generalization to n-th order tensors for storing and analysing geometric information is discussed.

APA, Harvard, Vancouver, ISO, and other styles
25

Viljoen, Vernon. "Integration of a vision-guided robot into a reconfigurable component- handling platform." Thesis, [Bloemfontein?] : Central University of Technology, Free State, 2014. http://hdl.handle.net/11462/120.

Full text
Abstract:
Thesis (M. Tech.) -- Central University of Technology, Free State, 2010
The latest technological trend in manufacturing worldwide is automation. Reducing human labour by using robots to do the work is purely a business decision. The reasons for automating a plant include improving productivity, reducing labour and equipment costs, reducing product damage, monitoring system reliability, and improving plant safety. The use of robots in the automation sector adds value to the production line because of their versatility. They can be programmed to follow specific paths when moving material from one point to another, and their biggest advantage is that they can operate twenty-four hours a day while delivering consistent quality and accuracy. Vision-Guided Robots (VGRs) are developed for many different applications, and therefore many different combinations of VGR systems are available. All VGRs are equipped with vision sensors which are used to locate and inspect various objects. In this study a robot and a vision system were combined for a pick-and-place application. Research was done on the design of a robot for locating, inspecting and picking selected components from a moving conveyor system.
APA, Harvard, Vancouver, ISO, and other styles
26

Johansson, Alexander. "Automated panorama sequence detection using the Narrative platform." Thesis, Linköpings universitet, Medie- och Informationsteknik, 2014. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-108402.

Full text
APA, Harvard, Vancouver, ISO, and other styles
27

Hüdig, Daniel H. "The vision of a future information and communication society computer-mediated communication and technology policy in the United States, the European Union and Japan /." [S.l. : s.n.], 2000. http://deposit.ddb.de/cgi-bin/dokserv?idn=961031476.

Full text
APA, Harvard, Vancouver, ISO, and other styles
28

Andreasson, Henrik. "Local visual feature based localisation and mapping by mobile robots." Doctoral thesis, Örebro : Örebro University, 2008. http://urn.kb.se/resolve?urn=urn:nbn:se:oru:diva-2444.

Full text
APA, Harvard, Vancouver, ISO, and other styles
29

Axelsson, Viktor. "Automatisk segmentering och maskering av implantat i mammografibilder." Thesis, Linköpings universitet, Medie- och Informationsteknik, 2014. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-113459.

Full text
APA, Harvard, Vancouver, ISO, and other styles
30

Håkansson, Staffan. "Detektering av sprickor i vägytor med hjälp av Datorseende." Thesis, Linköping University, Department of Electrical Engineering, 2005. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-2818.

Full text
Abstract:

This thesis describes new methods for automatic crack detection in pavements. Cracks in pavements can be used as an early indication of the need for repair.

Automatic crack detection is preferable to manual inventory: the repeatability can be better, the inventory can be done at higher speed, and it can be done without interrupting traffic.

The automatic and semi-automatic crack detection systems that exist today use Image Analysis methods. There are today powerful methods available in the area of Computer Vision. These methods work in higher dimensions with greater complexity and generate measures of local signal properties, while the Image Analysis methods for crack detection use morphological operations on binary images.
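As a concrete picture of the morphological baseline described here (not the Computer Vision methods the thesis develops), dark, thin cracks can be enhanced with a black-hat filter and thresholded; the kernel size is illustrative:

```python
import cv2
import numpy as np

def crack_mask(gray, kernel_size=15):
    """Enhance dark, thin cracks against the pavement with a black-hat filter."""
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (kernel_size, kernel_size))
    blackhat = cv2.morphologyEx(gray, cv2.MORPH_BLACKHAT, kernel)
    _, mask = cv2.threshold(blackhat, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    # remove speckle noise; elongated crack components survive the opening
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, np.ones((3, 3), np.uint8))
    return mask
```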

Methods for digitising video data on VHS cassettes and stitching images from nearby frames have been developed.

Four methods for crack detection have been evaluated, and two of them have been used to form a crack detection and classification program implemented in Matlab.

One image set was used during the implementation and another image set was used for validation. The crack detection system performed correct detection on 99.2 percent of the images used during implementation. The results of the crack detection on the validation data were not very good. When the program is used on data from pavements other than the one used during implementation, information about the surface texture is required to calibrate the crack detection.

APA, Harvard, Vancouver, ISO, and other styles
31

Chen, Yiqiang. "Person re-identification in images with deep learning." Thesis, Lyon, 2018. http://www.theses.fr/2018LYSEI074/document.

Full text
Abstract:
Video surveillance systems are of great value for public safety. As one of the most important surveillance applications, person re-identification is defined as the problem of identifying people across images that have been captured by different surveillance cameras without overlapping fields of view. With the increasing need for automated video analysis, this task is receiving increasing attention. However, the problem is challenging due to the large variations of lighting, pose, viewpoint and background. To tackle these difficulties, in this thesis, we propose several deep learning based approaches to obtain better person re-identification performance in different ways. In the first proposed approach, we use pedestrian attributes to enhance person re-identification. The attributes are defined as semantic mid-level descriptions of persons, such as gender, accessories, clothing, etc. They can help extract characteristics that are invariant to pose and viewpoint variations, thanks to the descriptors being on a higher semantic level. In order to make use of the attributes, we propose a CNN-based person re-identification framework composed of an identity classification branch and an attribute recognition branch. At a later stage, these two cues are combined to perform person re-identification. Secondly, among the challenges, one of the most difficult is the variation under different viewpoints: the same person shows very different appearances from different points of view. To deal with this issue, we consider that images under various orientations are from different domains, and propose an orientation-specific CNN. This framework performs body orientation regression in a gating branch, and in another branch learns separate orientation-specific layers as local experts. The combined orientation-specific CNN feature representations are used for the person re-identification task. Thirdly, learning a similarity metric for person images is a crucial aspect of person re-identification. As the third contribution, we propose a novel listwise loss function taking into account the order in the ranking of gallery images with respect to different probe images. Further, an evaluation gain-based weighting is introduced in the loss function to directly optimize the evaluation measures of person re-identification. Finally, in a large gallery set, many people may have similar clothing; in this case, using only the appearance of a single person leads to strong ambiguities. In realistic settings, people often walk in groups rather than alone. As the last contribution, we propose to learn a deep feature representation with displacement invariance for group context and introduce a method to combine the group context and single-person appearance. For all four contributions of this thesis, we carry out extensive experiments on popular benchmarks and datasets to demonstrate the effectiveness of the proposed systems.
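A minimal sketch of the first contribution's two-branch idea, an identity head and a multi-label attribute head on a shared backbone; the backbone choice and output sizes are hypothetical, not taken from the thesis:

```python
import torch
import torch.nn as nn
from torchvision import models

class IdentityAttributeNet(nn.Module):
    """Two-branch CNN: identity classification plus attribute recognition."""

    def __init__(self, num_ids=751, num_attrs=27):
        super().__init__()
        backbone = models.resnet50(weights=None)  # hypothetical backbone choice
        self.features = nn.Sequential(*list(backbone.children())[:-1])  # pooled features
        self.id_head = nn.Linear(2048, num_ids)      # identity branch (softmax logits)
        self.attr_head = nn.Linear(2048, num_attrs)  # attribute branch (multi-label)

    def forward(self, x):
        f = self.features(x).flatten(1)
        return self.id_head(f), torch.sigmoid(self.attr_head(f))
```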
APA, Harvard, Vancouver, ISO, and other styles
32

Hemery, Edgar. "Modélisation, reconnaissance du geste des doigts et du haut du corps dans le design d’interaction musicale." Thesis, Paris Sciences et Lettres (ComUE), 2017. http://www.theses.fr/2017PSLEM075/document.

Full text
Abstract:
This thesis presents a novel musical instrument, named the Embodied Musical Instrument (EMI), which has been designed to answer two problems : how can we capture and model musical gestures and how can we use this model to control sound synthesis parameters expressively. The EMI is articulated around an explicit mapping strategy, which draws inspiration from the piano-playing techniques and other objects’ affordances. The system we propose makes use of 3D cameras and computer vision algorithms in order to free the gesture from intrusive devices and ease the process of capture and performance, while enabling precise and reactive tracking of the fingertips and upper-body. Having recourse to different 3D cameras tracking solutions, we fully exploit their potential by adding a transparent sheet, which serves as a detection threshold for fingerings as well as bringing a simple but essential haptic feedback. We examined finger movements while tapping on the surface of the EMI and decomposed their trajectories into essential phases, which enabled us to model and analyse piano-like gestures. A preliminary study of generic musical gestures directed our interest not only on the effective gestures operated by the fingers - in the case of keyboard instruments - but also on the accompanying and figurative gestures, which are mostly characterised by the arms and head movements. Consequently, we distinguish two level of interactions, delimited by two bounding volumes. The micro bounding volume includes the micro-gestures operated with the fingers, while the macro bounding volume includes larger movements with the upper-body. Building from this, we extend our piano-like model to a 3D interaction paradigm, where higher-level musical parameters, such as sound effects, can be controlled continuously by upper-body free movements. We explored a set of real-world scenarios for this instrument, namely practice, composition and performance. The EMI introduces a framework for capture and analysis, of specific musical gestures. An off-line analysis of gesture features can reveal trends, faults and musical specificities of an interpret. Several musical works have been created and performed live; either solo or accompanied by a string quartet, revealing the body gesture specificities through the sounds it synthesises. User experience feedback shows that the instrument can be easily taught - if not self-taught - thanks to the intuitive gesture paradigms drawn from piano-like gestures and other metaphorical gestures
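The two-level interaction can be pictured as a pair of very small decision rules; the sketch below is a hypothetical reconstruction, with the plate plane, tap depth and macro mapping all assumed values rather than the EMI's real calibration.

# Illustrative sketch of the EMI's two-level interaction: a fingertip crossing
# the plate plane triggers a note (micro), while an upper-body joint inside a
# larger volume drives a continuous effect parameter (macro). All thresholds
# and mappings below are assumptions for illustration.
PLATE_Z = 0.0          # plate surface in camera coordinates (m), assumed
NOTE_ON_DEPTH = 0.003  # how far past the plane counts as a tap, assumed

def micro_gesture(fingertip_z, was_down):
    is_down = fingertip_z < PLATE_Z - NOTE_ON_DEPTH
    if is_down and not was_down:
        return "note_on", is_down
    if not is_down and was_down:
        return "note_off", is_down
    return None, is_down

def macro_gesture(head_y, y_min=1.2, y_max=2.0):
    # Map head height inside the macro volume to a 0..1 effect amount.
    t = (head_y - y_min) / (y_max - y_min)
    return max(0.0, min(1.0, t))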
APA, Harvard, Vancouver, ISO, and other styles
33

Bihi, Thabo George. "Assembly-setup verification and quality control using machine vision within a reconfigurable assembly system." Thesis, [Bloemfontein?] : Central University of Technology, Free State, 2014. http://hdl.handle.net/11462/188.

Full text
Abstract:
Thesis (M. Tech. (Engineering: Electrical)) -- Central University of Technology, Free State, [2014]
The project is aimed at exploring the application of machine vision in a Reconfigurable Manufacturing System (RMS) environment. The machine vision system interfaces with the RMS to verify the reconfiguration and positioning of devices within the assembly system, and inspects the product for defects that infringe on its quality. The vision system interfaces with the multi-agent system (MAS), which is in charge of scheduling and allocating the resources of the RMS, in order to communicate and exchange data regarding the quality of the product. The vision system comprises a Compact Vision System (CVS) device with FireWire cameras to aid the image acquisition, inspection and verification process. Various hardware and software manufacturers offer platforms to implement this, with a wide array of vision equipment and software packages; the most appropriate devices and software platform were identified for the implementation of the project. An investigation into illumination was also undertaken to determine whether external lighting sources would be required at the point of inspection. Integration into the assembly system involved establishing communication between the vision system and the assembly-system controller.
APA, Harvard, Vancouver, ISO, and other styles
34

Johansson, Filip, and Ali Karabiber. "Övervakningssystem för inomhusmiljöer." Thesis, Malmö högskola, Fakulteten för teknik och samhälle (TS), 2013. http://urn.kb.se/resolve?urn=urn:nbn:se:mau:diva-20874.

Full text
Abstract:
Security systems have long had an important role in society, and their purpose has always remained primarily the security aspect. In most cases the efficiency of current security systems depends on human factors: the recorded frames have to be processed by a person to reach a definitive conclusion. In recent years, security systems have become increasingly intelligent as technology continues to develop. Methods that implement automatic analysis of any given frame in a video have been introduced, and some of these methods are already standardized. The aim of this study is to develop a prototype that uses this technique and stress-test it to identify possible flaws. It is primarily a feasibility study showing what can be achieved with limited resources. With a fully functional prototype available, a series of stress tests was specified to measure the system's limits. The results were positive overall, but a number of critical flaws were identified.
APA, Harvard, Vancouver, ISO, and other styles
35

Park, Chung Hyuk. "Robot-based haptic perception and telepresence for the visually impaired." Diss., Georgia Institute of Technology, 2012. http://hdl.handle.net/1853/44848.

Full text
Abstract:
With the advancements in medicine and welfare systems, the average life span of modern human beings is increasing, creating a new market for elderly care and assistive technology. Along with the development of assistive devices based on traditional aids such as voice readers, electronic wheelchairs, and prosthetic limbs, a robotic platform is one of the most suitable platforms for providing multi-purpose assistance in human life. This research focuses on the transference of environmental perception to a human user through the use of interactive multi-modal feedback and an assistive robotic platform. A novel framework for haptic telepresence is presented to solve the problem, and state-of-the-art methodologies from computer vision, haptics, and robotics are utilized. The objective of this research is to design a framework that achieves the following: 1) it integrates visual perception from heterogeneous vision sensors, 2) it enables real-time interactive haptic representation of the real world through a mobile manipulation robotic platform and a haptic interface, and 3) it achieves haptic fusion of multiple sensory modalities from the robotic platform and provides interactive feedback to the human user. Specifically, a set of multi-disciplinary algorithms such as stereo-vision processes, three-dimensional (3D) map-building algorithms, and virtual-proxy-based haptic volume representation processes is integrated into a unified framework to accomplish this goal. The application area of this work is focused on, but not limited to, assisting people with visual impairment with a robotic platform by providing multi-modal feedback of the environment.
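The virtual-proxy idea mentioned above can be illustrated with a toy example: the proxy is kept on the surface of the touched object, and the rendered force is a spring pulling the haptic device point back toward the proxy. The sphere geometry and stiffness below are assumptions for illustration, not the framework's actual volume representation.

# Toy virtual-proxy haptic rendering against a sphere (illustrative only).
import numpy as np

K = 300.0  # spring stiffness (N/m), an assumed value

def proxy_force(device_pos, sphere_center, sphere_radius):
    d = device_pos - sphere_center
    dist = max(np.linalg.norm(d), 1e-9)
    if dist >= sphere_radius:                 # outside the object: no contact
        return device_pos, np.zeros(3)
    proxy = sphere_center + d / dist * sphere_radius  # nearest surface point
    return proxy, K * (proxy - device_pos)            # spring force toward the surface

proxy, f = proxy_force(np.array([0.0, 0.0, 0.09]), np.zeros(3), 0.1)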
APA, Harvard, Vancouver, ISO, and other styles
36

Samanci, Ozge. "Embodying comics reinventing comics and animation for a digital performance /." Diss., Atlanta, Ga. : Georgia Institute of Technology, 2009. http://hdl.handle.net/1853/29630.

Full text
Abstract:
Thesis (Ph.D)--Literature, Communication, and Culture, Georgia Institute of Technology, 2010.
Committee Chair: Mazalek, Alexandra; Committee Member: Bolter, Jay; Committee Member: Knospel, Kenneth; Committee Member: Murray, Janet; Committee Member: Winegarden, Claudia Rebola. Part of the SMARTech Electronic Thesis and Dissertation Collection.
APA, Harvard, Vancouver, ISO, and other styles
37

Filip, Mori. "A 2D video player for Virtual Reality and Mixed Reality." Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-217359.

Full text
Abstract:
While 360-degree video has recently been an object of research, flat 2D videos in virtual environments (VE) have seemingly not received the same amount of attention. Specifically, 2D video playback in Virtual Reality (VR) and Mixed Reality (MR) appears to lack exploration of both features and qualities of resolution, audio and interaction, which ultimately contribute to presence. This paper reflects on the definitions of Virtual Reality and Mixed Reality, while extending known concepts of immersion and presence to 2D videos in VEs. Relevant attributes of presence that can be applied to 2D videos were then investigated in the literature. The main problem was to find out the components and processes of the playback software in VR and MR, with company-requested features and delimitations in consideration, and possibly how to adjust those components to induce a greater presence, primarily within the 2D video and secondarily within the VE, although these mediums of visual information are related and thus influence each other. The thesis work took place at Advrty, a company developing a brand advertising platform for VR and MR. The exploration and testing of the components was done in increments: first a basic standalone 2D video player was created, then in a second increment a video player was implemented in VR and MR. Comparisons were made between the proof-of-concept video players in VR and MR and the standalone video player. The results of the study show a feasible way of making a video player for VR and MR. In the discussion of the work, the use of open-source libraries in commercial software, the technical limitations of current VR and MR head-mounted displays (HMD), relevant presence-inducing attributes, and the choice of method are reflected upon.
APA, Harvard, Vancouver, ISO, and other styles
38

BALDHAGEN, FREDRIK, and ANTON HEDSTRÖM. "Chess Playing Robot : Robotic arm capable of playing chess." Thesis, KTH, Skolan för industriell teknik och management (ITM), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-279829.

Full text
Abstract:
The purpose of this thesis was to create a robot that could play chess through the use of visual recognition and robotics. The idea for this project came from the increasing demand for smart robots that can make their own decisions in a changing environment, and from the fact that chess has recently seen a surge of new players. The arm making the moves was designed as a SCARA type, a common robotic arm that excels in pick-and-place operations. The movement of the arm was driven by two stepper motors connected to a Raspberry Pi and an external power supply. Movement in the Z-direction was achieved with a servo motor driving a gear rack vertically. A camera was placed above the chessboard, and through a series of programs and functions, images were converted to chess notation which was then sent to a chess engine running on the Raspberry Pi. The visual recognition worked optimally when the chessboard was well and evenly lit. When lighting was poor, the values that define the colors could be adjusted to allow proper evaluation; however, when the illuminance dropped below 15 lux the blue pieces became indistinguishable from the black squares, and the visual recognition stopped working.
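One plausible way to realise the image-to-notation step is to diff two detected board states into a move that a chess engine can then validate; the sketch below is an illustrative reconstruction, and the per-square board encoding is an assumption, not the project's actual data format.

# Turn two detected board states into a move string (illustrative sketch).
# board: dict mapping square names 'a1'..'h8' to 'w', 'b' or None.
def detect_move(before, after, mover='w'):
    src = [s for s in before if before[s] == mover and after[s] is None]
    dst = [s for s in after if after[s] == mover and before[s] != mover]
    if len(src) == 1 and len(dst) == 1:
        return src[0] + dst[0]   # e.g. 'e2e4' in UCI notation
    return None                  # castling/promotion would need extra handling

before = {'e2': 'w', 'e4': None}
after = {'e2': None, 'e4': 'w'}
assert detect_move(before, after) == 'e2e4'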
APA, Harvard, Vancouver, ISO, and other styles
39

Blåwiik, Per. "Fusing Stereo Measurements into a Global 3D Representation." Thesis, Linköpings universitet, Medie- och Informationsteknik, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-177586.

Full text
Abstract:
The report describes the thesis project whose aim was to fuse an arbitrary sequence of stereo measurements into a global 3D representation in real time. The proposed method uses an octree-based signed distance function to represent the 3D environment; the geometric data is fused together using a cumulative weighted update function and finally rendered by incremental mesh extraction using the marching cubes algorithm. The result of the project was a prototype system, integrated into a real-time stereo reconstruction system, which was evaluated by benchmark tests as well as qualitative comparisons with an older method of overlapping meshes.

The thesis work was carried out at the Department of Science and Technology (ITN), Faculty of Science and Engineering, Linköping University.
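The cumulative weighted update mentioned in the abstract has a standard form: each voxel keeps a running weighted average of its signed-distance measurements. The sketch below uses a dense array in place of the thesis's octree, purely for illustration.

# Weighted fusion of signed-distance measurements into a global volume.
import numpy as np

tsdf = np.zeros((64, 64, 64), dtype=np.float32)   # fused signed distances
weight = np.zeros_like(tsdf)                      # accumulated confidence

def fuse(new_d, new_w):
    """Fuse one measurement volume into the global representation."""
    global weight
    total = weight + new_w
    mask = total > 0
    tsdf[mask] = (weight[mask] * tsdf[mask] + new_w[mask] * new_d[mask]) / total[mask]
    weight = np.minimum(total, 100.0)  # cap weights so old data can still be revised

fuse(np.random.uniform(-1, 1, tsdf.shape).astype(np.float32), np.ones_like(tsdf))

A triangle mesh can then be extracted from the zero level set of the fused volume, for instance with scikit-image's marching_cubes, corresponding to the mesh-extraction step described above.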

APA, Harvard, Vancouver, ISO, and other styles
40

Moujtahid, Salma. "Exploiting scene context for on-line object tracking in unconstrained environments." Thesis, Lyon, 2016. http://www.theses.fr/2016LYSEI110/document.

Full text
Abstract:
With the increasing need for automated video analysis, visual object tracking has become an important task in computer vision. Object tracking is used in a wide range of applications such as surveillance, human-computer interaction, medical imaging or vehicle navigation. A tracking algorithm in unconstrained environments faces multiple challenges: potential changes in object shape and background, lighting, camera motion, and other adverse acquisition conditions. In this setting, classic methods of background subtraction are inadequate, and more discriminative methods of object detection are needed. Moreover, in generic tracking algorithms the nature of the object is not known a priori, so appearance models learned off-line for specific types of objects, such as faces or pedestrians, cannot be used. The recent evolution of powerful machine learning techniques has enabled the development of tracking methods that learn the object appearance in an online manner and adapt to varying constraints in real time, leading to very robust tracking algorithms that can operate in non-stationary environments to some extent. In this thesis, we start from the observation that different tracking algorithms have different strengths and weaknesses depending on the context. To overcome the varying challenges, we show that combining multiple modalities and tracking algorithms can considerably improve the overall tracking performance in unconstrained environments. More concretely, we first introduce a new tracker-selection framework using a spatial and temporal coherence criterion. In this algorithm, multiple independent trackers are combined in parallel, each of them using low-level features based on different complementary visual aspects like colour, texture and shape. By recurrently selecting the most suitable tracker, the overall system can switch rapidly between different tracking algorithms with specific appearance models depending on the changes in the video. In the second contribution, the scene context is introduced into the tracker selection. We designed effective visual features, extracted from the scene context, to characterise the different image conditions and variations. A classifier (a neural network) is trained on these scene features to predict, at each point in time, the tracker that will perform best under the given scene conditions. We further improved this context-based framework and proposed an extended version in which the individual trackers are changed and the classifier training is optimised. Finally, we explored an interesting perspective: using a Convolutional Neural Network to automatically learn to extract these scene features directly from the input image and predict the most suitable tracker. The proposed methods were evaluated on several public benchmarks, demonstrating that the use of scene context improves the overall tracking performance.
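The gist of the context-driven selection in the second contribution can be sketched as follows: hand-crafted scene features feed a classifier that predicts which tracker to run on the current frame. The three features and the classifier configuration here are illustrative assumptions, not the features designed in the thesis.

# Scene-context tracker selection (illustrative sketch).
import numpy as np
from sklearn.neural_network import MLPClassifier

def scene_features(gray):
    # brightness, contrast, and a crude texture/sharpness measure
    gy, gx = np.gradient(gray.astype(float))
    return [gray.mean(), gray.std(), np.hypot(gx, gy).mean()]

# Training data stands in for offline measurements of which tracker
# (0=colour, 1=texture, 2=shape) performed best on annotated sequences.
X = np.random.rand(200, 3)
y = np.random.randint(0, 3, 200)
selector = MLPClassifier(hidden_layer_sizes=(16,), max_iter=500).fit(X, y)

frame = (np.random.rand(240, 320) * 255).astype(np.uint8)
best = selector.predict([scene_features(frame)])[0]  # run only the predicted tracker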
APA, Harvard, Vancouver, ISO, and other styles
41

Ringaby, Erik. "Optical Flow Computation on Compute Unified Device Architecture." Thesis, Linköping University, Department of Electrical Engineering, 2008. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-15426.

Full text
Abstract:

There has been rapid progress in graphics processors in recent years, largely driven by the demands of computer games on speed and image quality. Because of the graphics processor's special architecture, it is much faster at solving parallel problems than a conventional processor, and due to its increasing programmability it is possible to use it for tasks other than those it was originally designed for.

Even though graphics processors have been programmable for some time, it has been quite difficult to learn how to use them. CUDA enables the programmer to use C code, with a few extensions, to program NVIDIA's graphics processors and skip the traditional graphics programming models entirely. This thesis investigates whether the graphics processor can be used for calculations without knowledge of how the hardware mechanisms work. An image processing algorithm calculating the optical flow has been implemented. The result shows that it is rather easy to implement programs using CUDA, but some knowledge of how the graphics processor works is required to achieve high performance.
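The abstract does not name the optical-flow algorithm, so as a stand-in the sketch below shows a classic Horn-Schunck iteration in NumPy; every pixel is updated independently, which is exactly the kind of data-parallel computation that maps well onto a GPU through CUDA.

# Horn-Schunck optical flow on the CPU (illustrative stand-in, not the thesis code).
import numpy as np
from scipy.ndimage import uniform_filter

def horn_schunck(im1, im2, alpha=1.0, n_iter=100):
    Iy, Ix = np.gradient(im1.astype(float))        # spatial gradients (rows, cols)
    It = im2.astype(float) - im1.astype(float)     # temporal gradient
    u = np.zeros_like(Ix)
    v = np.zeros_like(Ix)
    for _ in range(n_iter):
        u_avg = uniform_filter(u, 3)               # neighbourhood flow averages
        v_avg = uniform_filter(v, 3)
        common = (Ix * u_avg + Iy * v_avg + It) / (alpha**2 + Ix**2 + Iy**2)
        u = u_avg - Ix * common                    # per-pixel update: data-parallel
        v = v_avg - Iy * common
    return u, v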

APA, Harvard, Vancouver, ISO, and other styles
42

Haskins, Bertram Peter. "A feasibility study on the use of agent-based image recognition on a desktop computer for the purpose of quality control in a production environment." Thesis, [Bloemfontein?] : Central University of Technology, Free State, 2006. http://hdl.handle.net/11462/66.

Full text
Abstract:
Thesis (M. Tech.) - Central University of Technology, Free State, 2006
A multi-threaded, multi-agent image recognition software application called RecMaster has been developed specifically for the purpose of quality control in a production environment. This entails using the system as a monitor to identify invalid objects moving on a conveyor belt and to pass the relevant information on to an attached device, such as a robotic arm, which will remove the invalid object. The main purpose of developing this system was to prove that a desktop computer could run an image recognition system efficiently, without the need for high-end, high-cost, specialised computer hardware. The programme operates by assigning each agent a task in the recognition process and then waiting for resources to become available. Tasks related to edge detection, colour inversion, image binarisation and perimeter determination were assigned to individual agents. Each agent is loaded onto its own processing thread, with some of the agents delegating their subtasks to other processing threads. This enables the application to utilise the available system resources more efficiently. The application is very limited in its scope, as it requires a uniform image background as well as little to no variance in camera zoom levels and object-to-lens distance. This study focused solely on the development of the application software, and not on the setting up of the actual imaging hardware. The imaging device on which the system was tested was a webcam capable of 640 x 480 resolution. As such, all image capture and processing was done on images with a horizontal resolution of 640 pixels and a vertical resolution of 480 pixels, so as not to distort image quality. The application locates objects in an image feed - which can be a still image, a video file or a camera feed - and compares these objects to a model of the object created previously. The coordinates of the object are calculated, translated into coordinates on the conveyor system, and then passed on to an external recipient, such as a robotic arm, via a serial link. The system has been applied to the model of a DVD and tested against a variety of similar and dissimilar objects to determine its accuracy. The tests were run on both an AMD- and an Intel-based desktop computer system, with the results indicating that both systems are capable of efficiently running the application. On average, the AMD-based system tended to be 81% faster at matching objects in still images, and 100% faster at matching objects in moving images. The system made matches within an average time frame of 250 ms, making the process fast enough to be used on an actual conveyor system. On still images, the results showed an 87% success rate for the AMD-based system, and 73% for Intel. For moving images, however, both systems showed a 100% success rate.
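The agent-per-thread organisation can be sketched with Python's standard threading and queue modules; the stage functions below are stubs standing in for the real agents (edge detection, binarisation, and so on), so this is a structural illustration rather than RecMaster's actual code.

# Agents as threads passing images along queues (illustrative pipeline).
import threading
import queue

def agent(name, task, inbox, outbox):
    while True:
        item = inbox.get()
        if item is None:          # poison pill shuts the agent down
            outbox.put(None)
            break
        outbox.put(task(item))    # each agent does one recognition subtask

q_in, q_mid, q_out = queue.Queue(), queue.Queue(), queue.Queue()
threading.Thread(target=agent, args=("edges", lambda im: im, q_in, q_mid),
                 daemon=True).start()
threading.Thread(target=agent, args=("binarise", lambda im: im, q_mid, q_out),
                 daemon=True).start()

q_in.put("frame-1")
q_in.put(None)
print(q_out.get())   # processed frame emerges from the last stage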
APA, Harvard, Vancouver, ISO, and other styles
43

Tan, Jason, and Jake O'Donnell. "Hand gestures as a trigger for system initialisation." Thesis, Malmö universitet, Fakulteten för teknik och samhälle (TS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:mau:diva-20876.

Full text
Abstract:
Biometric solutions for access control are a thriving concept, and Precise Biometrics is a company that focuses on exactly that. YOUNiQ is a product that uses facial identification for access control; with it comes the issue that every person close enough to the camera is scanned and identified, including people who have not registered for the facial identification module. This thesis, done in collaboration with engineers at Precise Biometrics, focuses on implementing an intent-aware system: a system which uses a trigger to begin a process. Instead of identifying faces without permission, the intent-aware system uses a trigger based on different hand gestures to begin the process. The thesis does not focus on face identification itself, but on the trigger that precedes a specific process. The development phase consisted of an iterative process of creating the prototype system. In order to evaluate the system, test cases were run to verify the accuracy of each hand gesture; thereafter, a scenario was created to simulate a complete activation of the prototype system. The evaluation was used to assess the convenience of intent-aware systems and to guide their implementation. The resulting trigger can be seen as a signal to underlying functions to extract biometric data for, for example, face identification.
APA, Harvard, Vancouver, ISO, and other styles
44

Husseini, Orabi Ahmed. "Multi-Modal Technology for User Interface Analysis including Mental State Detection and Eye Tracking Analysis." Thesis, Université d'Ottawa / University of Ottawa, 2017. http://hdl.handle.net/10393/36451.

Full text
Abstract:
We present a set of easy-to-use methods and tools to analyze human attention, behaviour, and physiological responses. A potential application of our work is evaluating user interfaces being used in a natural manner. Our approach is designed to be scalable and to work remotely on regular personal computers using inexpensive and noninvasive equipment. The data sources our tool processes are nonintrusive and captured from video, i.e. eye tracking and facial expressions; for video data retrieval we use a basic webcam. We investigate combinations of observation modalities to detect and extract affective and mental states. Our tool provides a pipeline-based approach that 1) collects observational data, 2) incorporates and synchronizes the signal modalities mentioned above, 3) detects users' affective and mental states, 4) records user interaction with applications and pinpoints the parts of the screen users are looking at, and 5) analyzes and visualizes the results. We describe the design, implementation, and validation of a novel multimodal signal fusion engine, the Deep Temporal Credence Network (DTCN). The engine uses deep neural networks to provide 1) a generative and probabilistic inference model, and 2) handling of multimodal data such that performance does not degrade due to the absence of some modalities. We report on the recognition accuracy of basic emotions for each modality, then evaluate our engine in terms of its effectiveness at recognizing the six basic emotions and six mental states: agreeing, concentrating, disagreeing, interested, thinking, and unsure. Our principal contributions include the implementation of 1) a multimodal signal fusion engine, 2) real-time recognition of affective and primary mental states from nonintrusive and inexpensive modalities, and 3) novel mental-state-based visualization techniques (3D heatmaps, 3D scanpaths, and widget heatmaps) that find parts of the user interface where users are perhaps unsure, annoyed, frustrated, or satisfied.
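The requirement that performance should not degrade when modalities are absent can be illustrated with a masked late-fusion toy: average the class posteriors over whichever modalities are present. The real DTCN is a deep generative model; this sketch only illustrates the robustness requirement, not the engine itself.

# Late fusion that tolerates missing modalities (illustrative only).
import numpy as np

def fuse_posteriors(posteriors):
    """posteriors: dict modality -> probability vector, or None if absent."""
    present = [p for p in posteriors.values() if p is not None]
    fused = np.mean(present, axis=0)   # absent modalities simply drop out
    return fused / fused.sum()

p = fuse_posteriors({"face": np.array([0.7, 0.2, 0.1]),
                     "gaze": None})    # gaze missing: prediction still defined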
APA, Harvard, Vancouver, ISO, and other styles
45

Forssén, Per-Erik. "Detection of Man-made Objects in Satellite Images." Thesis, Linköping University, Linköping University, Computer Vision, 1997. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-54356.

Full text
Abstract:

In this report, the principles of man-made object detection in satellite images are investigated. An overview of terminology and of how the detection problem is usually solved today is given. A three-level system to solve the detection problem is proposed, whose main branches handle road and city detection respectively. To achieve data-source flexibility, the Logical Sensor notion is used to model the low-level system components. Three Logical Sensors have been implemented and tested on Landsat TM and SPOT XS scenes: BDT (Background Discriminant Transformation) to construct a man-made object property field; local orientation for texture estimation and road tracking; and texture estimation using local variance and variance of local orientation. A gradient magnitude measure for road seed generation has also been tested.
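Local orientation of the kind used for texture estimation and road tracking is commonly computed from the gradient structure tensor in a double-angle representation; the sketch below shows that common construction, which may differ from the report's own operators.

# Local orientation via the gradient structure tensor (common construction).
import numpy as np
from scipy.ndimage import gaussian_filter, sobel

def local_orientation(img, sigma=2.0):
    gx = sobel(img.astype(float), axis=1)     # horizontal gradient
    gy = sobel(img.astype(float), axis=0)     # vertical gradient
    # double-angle components: invariant to 180-degree gradient flips
    c = gaussian_filter(gx**2 - gy**2, sigma)
    s = gaussian_filter(2 * gx * gy, sigma)
    theta = 0.5 * np.arctan2(s, c)            # dominant orientation per pixel
    certainty = np.hypot(c, s)                # low where the texture is isotropic
    return theta, certainty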

APA, Harvard, Vancouver, ISO, and other styles
46

Casserfelt, Karl. "A Deep Learning Approach to Video Processing for Scene Recognition in Smart Office Environments." Thesis, Malmö universitet, Fakulteten för teknik och samhälle (TS), 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:mau:diva-20429.

Full text
Abstract:
The field of computer vision, where the goal is to allow computer systems to interpret and understand image data, has in recent years seen great advances with the emergence of deep learning. Deep learning, a technique that emulates the information processing of the human brain, has been shown to almost solve the problem of object recognition in image data. One of the next big challenges in computer vision is to allow computers to recognize not only objects but also activities. This study is an exploration of the capabilities of deep learning for the specific problem area of activity recognition in office environments. The study used a re-labeled subset of the AMI Meeting Corpus video data set to comparatively evaluate different neural network models' performance in the given problem area, and then evaluated the best-performing model on a new data set of office activities captured in a research lab at Malmö University. The results showed that the best-performing model was a 3D convolutional neural network (3DCNN) with temporal information in the third dimension; however, a recurrent convolutional network (RCNN), using a pre-trained VGG16 model to extract features that are fed into a recurrent neural network with a unidirectional Long Short-Term Memory (LSTM) layer, performed almost as well with the right configuration. An analysis of the results suggests that a 3DCNN's performance is dependent on the camera angle, specifically how well movement is spatially distributed between people in frame.
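A minimal version of the winning model family looks like the sketch below: a stack of 3D convolutions in which time is the third dimension, so motion patterns are learned directly from video clips. Layer sizes and clip shape are illustrative assumptions, not the study's configuration.

# Tiny 3D-convolutional activity classifier (illustrative).
import torch
import torch.nn as nn

class Tiny3DCNN(nn.Module):
    def __init__(self, n_classes=5):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d(2),                       # pools space and time together
            nn.Conv3d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),
        )
        self.fc = nn.Linear(32, n_classes)

    def forward(self, clip):                        # clip: (B, C, T, H, W)
        return self.fc(self.net(clip).flatten(1))

logits = Tiny3DCNN()(torch.randn(2, 3, 16, 112, 112))  # 16-frame clips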
APA, Harvard, Vancouver, ISO, and other styles
47

Andersson, Anna, and Klara Eklund. "A Study of Oriented Mottle in Halftone Print." Thesis, Linköping University, Department of Science and Technology, 2007. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-9233.

Full text
Abstract:

Coated solid bleached board belongs to the top segment of paperboards. One important property of paperboard is its printability. In this diploma work a specific print defect, oriented mottle, has been studied in association with Iggesund Paperboard. The objectives of the work were to develop a method for analysing the dark and light areas of oriented mottle, to analyse these areas, and to clarify the effect of print-, coating- and paperboard-surface-related factors. This would clarify the origin of oriented mottle and allow it to be predicted on unprinted paperboard. The objectives were fulfilled by analysing the areas between the dark halftone dots, the amount of coating and the ink penetration, the micro-roughness, and the topography. The analysis of the areas between the dark halftone dots was performed on several samples and the results were compared with regard to different properties; the other methods were applied only to a limited selection of samples. The results showed that the intensity differences between the dark halftone dots were enhanced in the dark areas, that the coating amount was lower in the dark areas, and that the ink did not penetrate into the paperboard. The results also showed that areas with high transmission corresponded to dark areas, smoother micro-roughness, lower coating amount and high topography. A combination of the information from these properties might be used to predict oriented mottle, which is probably an optical phenomenon in halftone prints originating from variations in the coating and other paperboard properties.

APA, Harvard, Vancouver, ISO, and other styles
48

Pistori, Hemerson. "Tecnologia adaptativa em engenharia de computação: estado da arte e aplicações." Universidade de São Paulo, 2003. http://www.teses.usp.br/teses/disponiveis/3/3141/tde-02032004-145107/.

Full text
Abstract:
This work presents a set of theoretical and practical contributions that consolidate some concepts of the theory of rule-driven adaptive devices, emphasizing their high applicability. A supporting tool for the development of adaptive automata, including graphical animation resources, has been implemented in accordance with a new formalization proposal that complements and simplifies the original one. The main complement concerns the interpretation and implementation of adaptive functions in their most general form, with elementary query actions able to return multiple results. Our formalization, which includes an algorithm for executing adaptive functions, is an important tool for determining the impact of the adaptive layer on the complexity analysis of a general adaptive automaton. The thesis also presents a technique for integrating adaptive devices, which are essentially discrete, with mechanisms capable of manipulating non-discrete information. Finally, it is shown how these theoretical results and the tools developed can be applied to the solution of problems in machine learning, compiler construction, human-machine interfaces, computer vision and medical diagnosis.
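The core idea of a rule-driven adaptive device can be conveyed by a toy automaton whose rules carry adaptive actions that edit the rule set at run time; the thesis's formalism (with query actions and multiple results) is considerably richer than this sketch.

# Toy adaptive automaton: a rule's action can rewrite the rule set (sketch).
class AdaptiveAutomaton:
    def __init__(self, rules, start):
        self.rules = dict(rules)   # (state, symbol) -> (next_state, action)
        self.state = start

    def step(self, symbol):
        next_state, action = self.rules[(self.state, symbol)]
        if action:
            action(self)           # adaptive layer: edit the rule set
        self.state = next_state

# After reading the first 'a', the machine learns to accept 'b'.
def learn_b(m):
    m.rules[('q1', 'b')] = ('q2', None)

m = AdaptiveAutomaton({('q0', 'a'): ('q1', learn_b)}, 'q0')
for sym in "ab":
    m.step(sym)
assert m.state == 'q2'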
APA, Harvard, Vancouver, ISO, and other styles
49

Asif, Muhammad Salman. "Dynamic compressive sensing: sparse recovery algorithms for streaming signals and video." Diss., Georgia Institute of Technology, 2013. http://hdl.handle.net/1853/49106.

Full text
Abstract:
This thesis presents compressive sensing algorithms that utilize system dynamics in the sparse signal recovery process. These dynamics may arise due to a time-varying signal, streaming measurements, or an adaptive signal transform. Compressive sensing theory has shown that, under certain conditions, a sparse signal can be recovered from a small number of linear, incoherent measurements. The recovery algorithms, however, are for the most part static: they focus on finding the solution for a fixed set of measurements, assuming a fixed (sparse) structure of the signal. In this thesis, we present a suite of sparse recovery algorithms that cater to various dynamical settings. The main contributions of this research can be classified into two categories: 1) efficient algorithms for fast updating of L1-norm minimization problems in dynamical settings, and 2) efficient modeling of the signal dynamics to improve the reconstruction quality; in particular, we use inter-frame motion in videos to improve their reconstruction from compressed measurements. Dynamic L1 updating: we present homotopy-based algorithms for quickly updating the solution of various L1 problems whenever the system changes slightly. Our objective is to avoid solving an L1-norm minimization program from scratch; instead, we use information from an already solved L1 problem to quickly update the solution for a modified system. Our proposed updating schemes can incorporate time-varying signals, streaming measurements, iterative reweighting, and data-adaptive transforms. Classical signal processing methods, such as recursive least squares and the Kalman filter, provide solutions for similar problems in the least squares framework, where each solution update requires a simple low-rank update. We use homotopy continuation for updating L1 problems, which requires a series of rank-one updates along the so-called homotopy path. Dynamic models in video: we present a compressive-sensing-based framework for the recovery of a video sequence from incomplete, non-adaptive measurements. We use a linear dynamical system to describe the measurements and the temporal variations of the video sequence, where adjacent images are related to each other via inter-frame motion. Our goal is to recover a quality video sequence from the available set of compressed measurements, for which we exploit the spatial structure, using sparse representations of individual images in a spatial transform, and the temporal structure, exhibited by dependencies among neighboring images, using inter-frame motion. We discuss two problems in this work: low-complexity video compression and accelerated dynamic MRI. Even though the processes for recording compressed measurements are quite different in these two problems, the procedure for reconstructing the videos is very similar.
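The homotopy updates themselves are involved, but the underlying motivation, reusing the previous solution instead of solving from scratch, can be illustrated with warm-started ISTA; this is a stand-in for, not an implementation of, the thesis's homotopy algorithms.

# Warm-started L1 recovery: re-solving after a small change is cheap (sketch).
import numpy as np

def ista(A, y, lam, x0=None, n_iter=200):
    x = np.zeros(A.shape[1]) if x0 is None else x0.copy()
    L = np.linalg.norm(A, 2) ** 2               # Lipschitz constant of the gradient
    for _ in range(n_iter):
        g = A.T @ (A @ x - y)                   # gradient of the data term
        z = x - g / L
        x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft threshold
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((80, 200))
x_true = np.zeros(200)
x_true[:5] = 1.0                                # sparse ground truth
x_old = ista(A, A @ x_true, lam=0.1)
y_new = A @ x_true + 0.01 * rng.standard_normal(80)   # slightly changed data
x_new = ista(A, y_new, lam=0.1, x0=x_old, n_iter=20)  # warm start: few iterations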
APA, Harvard, Vancouver, ISO, and other styles
50

MELLO, SIMON. "VATS : Voice-Activated Targeting System." Thesis, KTH, Skolan för industriell teknik och management (ITM), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-279837.

Full text
Abstract:
Machine learning implementations in computer vision and speech recognition are widespread and growing, with both low- and high-level applications required. This paper looks at the low-level end and asks whether basic implementations are good enough for real-world applications. To demonstrate this, a simple artificial neural network coded in Python, together with existing Python libraries, is used to control a laser pointer via a servomotor and an Arduino, creating a voice-activated targeting system. The neural network trained on MNIST data consistently achieves an accuracy of 0.95 ± 0.01 when classifying MNIST test data, and it also classifies captured images correctly when noise levels are low. The same applies to the speech recognition, which rarely gives wrong readings. The final prototype succeeds in all domains except turning the correctly classified images into targets that the Arduino can read and aim at, failing to merge the computer vision and speech recognition.
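A network of roughly the scale described, a single hidden layer trained with softmax cross-entropy, is what typically reaches about 0.95 on MNIST test data. The sketch below shows one training step in plain NumPy; hidden size, learning rate and initialisation are assumptions, not the thesis's settings.

# One gradient step of a small MNIST classifier (illustrative).
import numpy as np

def train_step(W1, b1, W2, b2, x, y_onehot, lr=0.1):
    h = np.maximum(0, x @ W1 + b1)                    # ReLU hidden layer
    logits = h @ W2 + b2
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)                 # softmax probabilities
    d_logits = (p - y_onehot) / len(x)                # cross-entropy gradient
    dW2, db2 = h.T @ d_logits, d_logits.sum(0)
    dh = d_logits @ W2.T * (h > 0)                    # backprop through ReLU
    dW1, db1 = x.T @ dh, dh.sum(0)
    return W1 - lr * dW1, b1 - lr * db1, W2 - lr * dW2, b2 - lr * db2

rng = np.random.default_rng(0)
W1 = rng.standard_normal((784, 64)) * 0.01; b1 = np.zeros(64)
W2 = rng.standard_normal((64, 10)) * 0.01; b2 = np.zeros(10)
x = rng.random((32, 784)); y = np.eye(10)[rng.integers(0, 10, 32)]
W1, b1, W2, b2 = train_step(W1, b1, W2, b2, x, y)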
APA, Harvard, Vancouver, ISO, and other styles