
Dissertations / Theses on the topic 'OCR,Computer Vision'


1

Poli, Flavio. "Robust string text detection for industrial OCR." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2017. http://amslaurea.unibo.it/12885/.

Abstract:
This thesis proposes an algorithm for locating text lines for industrial OCR. Using a tree-based approach and exploiting knowledge of the string being sought, multiple candidate solutions are explored until the most promising one is found. The algorithm also outputs an estimate of how confident it is in the result.
2

Belgiovine, Mauro. "Advanced industrial OCR using Autoencoders." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2017. http://amslaurea.unibo.it/13807/.

Abstract:
This thesis describes work carried out during a six-month internship at Datalogic ADC. The goal was to use a specific type of neural network, the Autoencoder, for character recognition and validation in an industrial OCR system. First, a character-image classifier based on a Denoising Autoencoder was built; then a method was studied for using the Autoencoder as a second-level classifier, to better distinguish false activations from correct ones when a generic classifier is uncertain. Both architectures were evaluated on real datasets from Datalogic customers, and the experimental results obtained are presented in this thesis.
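The second-level validation idea can be sketched as follows; the class-to-autoencoder mapping, the function names, and the MSE threshold are illustrative assumptions, not the thesis' actual implementation:

```python
def validate_activation(sample, predicted_class, reconstructors, max_error):
    """Second-level check in the spirit of the thesis: an autoencoder trained
    only on images of `predicted_class` should reconstruct a genuine sample of
    that class well, so a large reconstruction error flags the activation as
    false. `reconstructors` (hypothetical) maps each class to its autoencoder's
    reconstruction function; samples are flat lists of pixel values."""
    recon = reconstructors[predicted_class](sample)
    mse = sum((s - r) ** 2 for s, r in zip(sample, recon)) / len(sample)
    return mse <= max_error
```

A perfect reconstructor accepts the activation; a reconstructor trained on a different class produces a large error and rejects it.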
3

Serafini, Sara. "Machine Learning applied to OCR tasks." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2019.

Abstract:
The content of this thesis describes the work done during a six-month internship at Datalogic, in its research laboratories in Pasadena (CA). The aim of my research was to implement and evaluate a classifier as part of an industrial OCR system, for learning purposes and to see how well it could perform compared with the current best Datalogic products. Since it may be simpler and faster, it could be a good alternative for implementation on an embedded system, where the current Datalogic products may not be able to run fast enough.
4

Corsi, Giacomo. "Fast Neural Network Technique for Industrial OCR." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2018. http://amslaurea.unibo.it/15258/.

Abstract:
The content of my thesis describes the work done during my internship at Datalogic in Pasadena. This project improves the performance of an Optical Character Recognition (OCR) solution using Deep Learning (DL) techniques. It enhances a previously developed character detection process that relies on template matching over Histogram of Oriented Gradients (HOG) features. This approach had already been validated with good performance, but it only detects characters that do not vary across the dataset. First, this document gives an introduction to OCR and DL, then describes the pipeline of the Datalogic OCR product. After that, it explains the technique used to raise the accuracy of the previous solution: applying DL to improve robustness and keep a good detection rate even when character variations (scale and rotation) are considerable. The first phase focused on speeding up the process, so the function used for gauging the match with the templates, the Zero-mean Normalized Cross-Correlation, was replaced with a modified version called Squared Normalization. Secondly, the original system was cast as a Convolutional Neural Network (CNN) by turning the HOG templates into convolutional kernels. It was necessary to rethink the training process, as no gain was observed when using standard target values; a novel way of computing the targets, named Graceful Improvement, was developed. Analysis of the results of this new solution showed that, while it detects characters that vary from the original templates, the false positive rate across the image was also higher. To reduce this side effect, a fast ROI (Region Of Interest) filter acting on the detections was implemented. Finally, throughout these development steps, performance in terms of accuracy and time was evaluated on real datasets from Datalogic customers.
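For reference, the Zero-mean Normalized Cross-Correlation mentioned above is the standard template matching score; a minimal sketch over flat lists of pixel intensities might look like this (the thesis' faster "Squared Normalization" variant is not reproduced here):

```python
def zncc(a, b):
    """Zero-mean Normalized Cross-Correlation between two equally sized
    patches, given as flat lists of intensities. Returns a score in [-1, 1];
    1 means a perfect match up to brightness and contrast changes."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    num = sum(x * y for x, y in zip(da, db))
    den = (sum(x * x for x in da) * sum(y * y for y in db)) ** 0.5
    return num / den if den else 0.0
```

Subtracting the means and dividing by the norms is what makes the score invariant to additive and multiplicative illumination changes, which is why it is a common baseline for industrial template matching.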
5

Johansson, Elias. "Separation and Extraction of Valuable Information From Digital Receipts Using Google Cloud Vision OCR." Thesis, Linnéuniversitetet, Institutionen för datavetenskap och medieteknik (DM), 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:lnu:diva-88602.

Abstract:
Automation is a desirable feature in many business areas. Manually extracting information from a physical object such as a receipt is something that can be automated to save resources for a company or a private person. This paper describes the process of combining an existing OCR engine with a Python script to extract valuable information from a digital image of a receipt. Values such as VAT, VAT %, date, and total, gross, and net cost are considered valuable information. This feature has already been implemented in existing applications; however, the company I did this project for is interested in creating its own version, and this project is an experiment to see whether such an application can be implemented with restricted resources. The paper walks through the development of the program, as well as the mindset, findings, and steps taken to overcome the problems encountered along the way. The program achieved a success rate of 86.6% in extracting the most valuable information (total cost, VAT %, and date) from a set of 53 receipts originating from 34 separate establishments.
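A minimal sketch of the extraction step might pair the OCR engine's raw text output with regular expressions; the keywords and number formats below are illustrative assumptions, not the rules actually used in the thesis:

```python
import re

def extract_fields(text):
    """Pull date, total cost, and VAT percentage out of raw OCR text from a
    receipt. The field keywords and formats here are illustrative only."""
    date = re.search(r"\b(\d{4}-\d{2}-\d{2}|\d{2}/\d{2}/\d{4})\b", text)
    total = re.search(r"(?i)\btotal\b\D*(\d+[.,]\d{2})", text)
    vat = re.search(r"(?i)\bvat\b\D*?(\d{1,2})\s*%", text)
    return {
        "date": date.group(1) if date else None,
        "total": total.group(1) if total else None,
        "vat_percent": vat.group(1) if vat else None,
    }
```

In practice OCR noise (misread digits, merged tokens) forces looser patterns and per-field sanity checks, which is where most of the engineering effort in such a pipeline goes.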
6

Grönlund, Jakob, and Angelina Johansson. "Defect Detection and OCR on Steel." Thesis, Linköpings universitet, Datorseende, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-157508.

Abstract:
In large scale productions of metal sheets, it is important to maintain an effective way to continuously inspect the products passing through the production line. The inspection mainly consists of detection of defects and tracking of ID numbers. This thesis investigates the possibilities of creating an automatic inspection system by evaluating different machine learning algorithms for defect detection and optical character recognition (OCR) on metal sheet data. Digit recognition and defect detection are solved separately: the former compares the object detection algorithm Faster R-CNN with the classical machine learning algorithm NCGF, while the latter is based on unsupervised learning using a convolutional autoencoder (CAE). The advantage of the feature extraction method is that it only needs a couple of samples to be able to classify new digits, which is desirable in this case due to the lack of training data. Faster R-CNN, on the other hand, needs much more training data to solve the same problem. NCGF does, however, fail to classify noisy images and images of metal sheets containing an alloy, while Faster R-CNN seems to be a more promising solution, with a final mean average precision of 98.59%. The CAE approach for defect detection showed promising results. The algorithm learned to reconstruct only images without defects, resulting in reconstruction errors whenever a defect appears. The errors are initially classified using a basic thresholding approach, resulting in 98.9% accuracy. However, this classifier requires supervised learning, which is why the clustering algorithm Gaussian mixture model (GMM) is investigated as well. The results show that it should be possible to use a GMM, but that it requires a lot of GPU resources in an end-to-end solution with a CAE.
7

Dürebrandt, Jesper. "Segmentation and Beautification of Handwriting using Mobile Devices." Thesis, Uppsala universitet, Institutionen för informationsteknologi, 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-251948.

Abstract:
Converting handwritten or machine-printed documents into a computer-readable format allows more efficient storage and processing. The recognition of machine-printed text is very reliable with today's technology, but the recognition of offline handwriting remains an open problem for the research community due to the high variance in handwriting styles. Modern mobile devices are capable of performing complex tasks such as scanning invoices, reading traffic signs, and online handwriting recognition, but only a few applications treat offline handwriting. This thesis investigates the segmentation of handwritten documents into text lines and words, how the legibility of handwriting can be increased by beautification, and how to implement this on modern mobile devices. Text line and word segmentation are crucial steps towards a complete handwriting recognition system. The results show that text line and word segmentation along with handwriting beautification can be implemented successfully on modern mobile devices, and a survey concludes that the writing on processed documents is more legible than on their unprocessed counterparts. An application for the iOS operating system is developed for demonstration.
8

Paul, Priya. "Automated test development for vehicle instrument panel cluster using Hardware-in-the-loop (HIL) and Computer Vision." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2021.

Abstract:
Automation of validation tests can significantly save time and improve accuracy. This thesis presents a method to automate the tests performed on the Instrument Panel Cluster (IPC) of a car by using Hardware-in-the-Loop (HIL) and Optical Character Recognition (OCR). The HIL technique allows the system to be tested with real signals, and a camera captures pictures for OCR analysis to extract the messages displayed on the IPC. The developed OCR feature was added to the existing automation tool of the FCA group, and the tests were conducted on Maserati's internal test benches. The OCR technique is widely used in the automotive sector for validation testing of the IPC. In this thesis, a sequence of image processing steps is first applied to the captured image of the IPC, which is then fed to the OCR engine with the required language. The results showed the system to work efficiently, extracting the messages from the captured images with confidence values close to 90 percent. Testing was done in different languages, and low confidence values were found only for some languages with complex letters. After the developed OCR feature was integrated into the internal automation tool, tests were carried out on both the functional test bench and the integration test bench. A test case was defined based on a specific vehicle function, and the final pass/fail report was generated automatically.
9

Zhu, Yuehan. "Automated Supply-Chain Quality Inspection Using Image Analysis and Machine Learning." Thesis, Högskolan Kristianstad, Fakulteten för naturvetenskap, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:hkr:diva-20069.

Abstract:
An image processing method for automatic quality assurance of Ericsson products is developed. The method consists of taking an image of the product, extracting the product labels from the image, performing OCR on the product numbers, and making a database lookup to match the mounted product with the customer specification. The engineering innovation of the method developed in this report is that the OCR is performed using machine learning techniques. It is shown that machine learning can produce results that are on par with or better than baseline OCR methods. The advantage of a machine learning based approach is that the associated neural network can be trained on the specific input images from the Ericsson factory. Imperfections in image quality, varying typefaces, etc. can be handled by properly training the net, a task that would have been very difficult with legacy OCR algorithms, where poor OCR results typically need to be mitigated by improving the input image quality rather than changing the algorithm.
10

Lamberti, Lorenzo. "A deep learning solution for industrial OCR applications." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2019. http://amslaurea.unibo.it/19777/.

Abstract:
This thesis describes a project developed during a six-month internship in the Machine Vision Laboratory of Datalogic in Pasadena, California. The project aims to develop a deep learning system as a possible solution for industrial optical character recognition applications. In particular, the focus falls on a specific algorithm called You Only Look Once (YOLO), a general-purpose object detector based on convolutional neural networks that currently offers state-of-the-art performance in terms of the trade-off between speed and accuracy. This algorithm is well known for reaching impressive processing speeds, but its intrinsic structure makes it struggle to detect small objects clustered together, which unfortunately matches our scenario: we are trying to read alphanumerical codes by detecting each single character and then reconstructing the final string. The final goal of this thesis is to overcome this drawback and push the accuracy of a general object detection convolutional neural network to its limits, in order to meet the demanding requirements of industrial OCR applications. To accomplish this, YOLO's unique detection approach was first mastered in its original framework, called Darknet, written in C and CUDA; then all the code was translated into Python for better flexibility, which also allowed the deployment of a custom architecture. Four datasets of increasing complexity were used as case studies, and the final performance was surprising: the accuracy varies between 99.75% and 99.97% with a processing time of 15 ms for 1000×1000 images, largely outperforming in speed the current deep learning solution deployed by Datalogic. On the downside, the training phase usually requires a very large amount of data and time, and YOLO also showed some memorization behaviour when not enough variability is present at training time.
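The reconstruction step described above, turning per-character detections back into a code, can be sketched as follows for the single-line case; the detection format (character label, box x-centre) is an assumption, not the thesis' actual data structure:

```python
def reconstruct_code(detections):
    """Given per-character detections as (character, x_center) pairs, e.g.
    from an object detector with one class per character, recover the
    printed string by sorting the boxes left to right (single-line case)."""
    return "".join(ch for ch, _ in sorted(detections, key=lambda d: d[1]))
```

Multi-line codes would additionally need the boxes grouped into rows by their y-centres before sorting each row by x.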
11

Krčmář, Martin. "Zpracování obrazu v zařízení Android - detekce a rozpoznání vizitky." Master's thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2016. http://www.nusl.cz/ntk/nusl-240930.

Abstract:
The aim of this Master's thesis is to design and develop an Android application for automatic recognition of business cards and import of contact information. The first part describes the history, architecture, and development tools of the Android operating system. The second part analyses the computer vision methods used during development; the OpenCV and Tesseract OCR libraries are described here. The main part covers the development of the application, with the conditions and limitations required for it to function properly. The final part evaluates the success rate of recognizing and importing contact information from business cards.
12

Zuffa, Flavio. "Data and ground-truth generation for industrial deep learning applications." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2018.

Abstract:
This thesis describes the work I carried out at Datalogic USA Inc. in Eugene (OR), USA. Industrial OCR systems have relied for many years on classical computer vision methods; the Datalogic laboratories are working on deep-learning-based systems for the next generation of products. Deep learning methods need a large amount of high-quality data for the learning process to be effective. During my stay I worked on building a system for producing ground-truth data to be used in the deep learning process of OCR systems. Several types of hardware were used, in particular an industrial ink-jet printer, a tape reel, an Arduino, and a camera. The hardware was modified to our needs; for example, the tape reel was modified to automate speed changes, and everything was interconnected through a PC and the Arduino. Software was written to produce, manage, and print the messages output by the printer; this allowed us to push the printer to the limit of its capabilities, since we need a very large number of distinct messages while the printer is designed to print large quantities of identical messages. Two independent algorithms were developed for creating the ground truth. The Top Down algorithm starts by locating the message within the captured image, then the characters, and finally the dots that form the characters. The Bottom Up algorithm starts by locating the dots and then reconstructs the characters and finally the whole message. Both algorithms proved valid, and a combination of the two may yield an even better algorithm. Datalogic is continuing the development of this project, improving it and using it to produce ground-truth data.
13

Bilda, Sebastian. "Optische Methoden zur Positionsbestimmung auf Basis von Landmarken." Master's thesis, Universitätsbibliothek Chemnitz, 2017. http://nbn-resolving.de/urn:nbn:de:bsz:ch1-qucosa-226934.

Abstract:
Indoor positioning is receiving more and more attention nowadays. Besides navigation through a building, Location Based Services offer the possibility to get more information about certain objects in the environment. Because GPS signals are too weak to penetrate buildings, other techniques for localization must be found. Besides the commonly used positioning via the evaluation of received radio signals, optical methods for localization with the help of landmarks can be used. These camera-based procedures have the advantage that positioning is often accurate to the centimetre. In this master thesis, the position in a building is determined through the detection of ArUco markers and door signs in images gathered by a camera. The evaluation is done with the Microsoft Kinect v2 and the Lenovo Phab 2 Pro smartphone, which provide depth data gained by a time-of-flight sensor in addition to the colour images. The range to a detected landmark is calculated by comparing the object's corners in the image with the real geometric dimensions extracted from a database. Additionally, the distance is determined by evaluating the depth data. Finally, both procedures are compared with each other and a statement is made about the accuracy and reliability of the algorithm developed in this work.
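The corner-based ranging described above typically rests on the pinhole camera relation; as a sketch (the symbols are ours, not the thesis'), with focal length $f$ in pixels, real landmark height $H$ from the database, and image height $h$ in pixels, the distance is

```latex
Z = f \cdot \frac{H}{h}
```

so the further the landmark, the fewer pixels it spans, and accuracy degrades as $h$ shrinks toward the pixel quantization limit.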
14

Albertazzi, Riccardo. "A study on the application of generative adversarial networks to industrial OCR." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2018.

Abstract:
High performance and nearly perfect accuracy are the standards required of OCR algorithms for industrial applications. In recent years, research on Deep Learning has proven that Convolutional Neural Networks (CNNs) are a very powerful and robust tool for image analysis and classification; when applied to OCR tasks, CNNs perform much better than previously adopted techniques and easily reach 99% accuracy. However, the effectiveness of Deep Learning models relies on the quality of the data used to train them. This can become a problem, since OCR tools can run for months without interruption, and during this period unpredictable variations (printer errors, background modifications, light conditions) could affect the accuracy of the trained system. We cannot expect the final user who trains the tool to take thousands of training pictures under different conditions until all imaginable variations have been captured; we therefore have to be able to generate these variations programmatically. Generative Adversarial Networks (GANs) are a recent breakthrough in machine learning; these networks are able to learn the distribution of the input data and therefore generate realistic samples belonging to that distribution. This thesis' objective is to study how GANs work in detail and to perform experiments on generative models that can create unseen variations of OCR training characters, thus making the whole OCR system more robust to future character variations.
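For context, the adversarial game the abstract alludes to trains a generator $G$ against a discriminator $D$ on the standard minimax objective:

```latex
\min_G \max_D \; \mathbb{E}_{x \sim p_{\mathrm{data}}}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_z}\big[\log\big(1 - D(G(z))\big)\big]
```

At the equilibrium of this game, $G$'s samples are distributed like the training data, which is exactly what makes GANs usable for generating realistic character variations.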
15

Zemčík, Tomáš. "Detekce a čtení UIC kódů." Master's thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2019. http://www.nusl.cz/ntk/nusl-400637.

Abstract:
Machine detection and reading of UIC identification codes on railway rolling stock allows some railway processes to be automated and makes railway operation safer and more efficient. This thesis provides insight into the problem of machine text detection and reading. It further proposes and implements a solution to the problem of reading UIC codes in images scanned by a line camera.
16

Wertheim, Michal. "Zpracování obrazu v systému Android - odečet hodnoty plynoměru." Master's thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2016. http://www.nusl.cz/ntk/nusl-221062.

Abstract:
This thesis describes the design of an image processing application for the Android system, covering the choice of development environment and the implementation. The solution involves developing the Android application and its graphical user interface. The text describes the application's functionality, communication with the camera, and storing and retrieving data. It also describes the algorithms and image processing methods used for reading values from the counter of a gas meter.
17

Dolci, Beatrice. "Development of a Deep Learning system for Optical Character Recognition." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2019.

Abstract:
This thesis documents the work done during a six-month internship at Datalogic USA Inc. in Pasadena, California. The project's purpose was to design a system for detecting text lines within high-resolution images in an industrial setting, using Deep Learning and convolutional neural networks, and focusing on applying a general object detection system to optical character recognition. The chosen general-purpose object detector was YOLO, which currently provides state-of-the-art performance in terms of the trade-off between speed and accuracy. The goal of the thesis was to configure and specialize a general object detection convolutional neural network so as to optimize its performance for optical character recognition. After laying down the theoretical bases, the specific object detection system (YOLO) was mastered, from the architecture of the network to the structure of its output and loss function. The same neural network framework as the original implementation of YOLO was used, called Darknet: a system for building, training, and testing neural networks, written in C and CUDA and featuring the OpenCV libraries. Part of the thesis work consisted in gaining deep knowledge of the code and enhancing it with additional features. New solutions were proposed to maximize accuracy on the given datasets and to solve technology-related problems that impaired performance in some instances. YOLO turned out to be impressively fast, providing a very large speedup with respect to the current OCR solution used by Datalogic, and very accurate as long as its training set features enough variability; on the other hand, it struggles to generalize to unknown patterns.
18

Di, Luzio Andrea. "Reti Neurali Convoluzionali per il riconoscimento di caratteri." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2016.

Abstract:
The problem of optical character recognition in images has been studied for decades, and over the years many algorithms have been proposed that address it at various levels of generality (i.e., with or without constraints on the characteristics of the image to be analysed). To date, however, much of the character recognition software developed for Aldebaran Robotics' Nao robot relies on ready-made Optical Character Recognition (OCR) libraries such as Tesseract. This thesis illustrates an alternative approach, showing how a system was built from scratch that uses no pre-existing OCR libraries. By combining several Convolutional Neural Networks (trained ad hoc) with some basic image manipulation functions from a widely used computer vision library (OpenCV), it still achieves good accuracy in locating text within the image, segmenting it into its elementary components (text lines, words, and characters), recognizing the individual characters, and finally reassembling the recognized letters into words.
19

Johansson, Björn. "Multiscale Curvature Detection in Computer Vision." Licentiate thesis, Linköping University, Linköping University, Computer Vision, 2001. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-54966.

Abstract:

This thesis presents a new method for detection of complex curvatures such as corners, circles, and star patterns. The method is based on a second degree local polynomial model applied to a local orientation description in double angle representation. The theory of rotational symmetries is used to compute curvature responses from the parameters of the polynomial model. The responses are made more selective using a scheme of inhibition between different symmetry models. These symmetries can serve as feature points at a high abstraction level for use in hierarchical matching structures for 3D estimation, object recognition, image database search, etc.

A very efficient approximative algorithm for single and multiscale polynomial expansion is developed, which is used for detection of the complex curvatures in one or several scales. The algorithm is based on the simple observation that polynomial functions multiplied with a Gaussian function can be described in terms of partial derivatives of the Gaussian. The approximative polynomial expansion algorithm is evaluated in an experiment to estimate local orientation on 3D data, and the performance is comparable to previously tested algorithms which are more computationally expensive.
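The "simple observation" underlying the fast polynomial expansion can be stated for a 1-D Gaussian $g(x) = e^{-x^2/(2\sigma^2)}$ (a sketch in our notation, not the thesis' exact formulation): polynomials multiplied by the Gaussian are linear combinations of its derivatives, e.g.

```latex
x\,g(x) = -\sigma^2\, g'(x), \qquad
x^2\,g(x) = \sigma^4\, g''(x) + \sigma^2\, g(x)
```

so correlating a signal with polynomial-times-Gaussian basis functions reduces to repeated filtering with a Gaussian and its derivatives, which is cheap and separable.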

The curvature algorithm is demonstrated on natural images and in an object recognition experiment. Phase histograms based on the curvature features are developed and shown to be useful as an alternative compact image representation.

The importance of curvature is furthermore motivated by reviewing examples from biological and perceptual studies. The usefulness of local orientation information to detect curvature is also motivated by an experiment about learning a corner detector.

20

Moe, Anders. "Passive Aircraft Altitude Estimation using Computer Vision." Licentiate thesis, Linköping University, Linköping University, Computer Vision, 2000. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-53415.

Abstract:

This thesis presents a number of methods to estimate 3D structures with a single translating camera. The camera is assumed to be calibrated and to have a known translation and rotation.

Applications for aircraft altitude estimation and ground structure estimation ahead of the aircraft are discussed. The idea is to mount a camera on the aircraft and use the motion estimates obtained from the inertial navigation system. One reason for this arrangement is to make the aircraft more passive, in comparison to conventional radar-based altitude estimation.

Two groups of methods are considered, optical flow based and region tracking based. Both groups have advantages and drawbacks.

Two methods to estimate the optical flow are presented. The accuracy of the estimated ground structure is increased by varying the temporal distance between the frames used in the optical flow estimation algorithms.

Four region tracking algorithms are presented. Two of them use canonical correlation and the other two are based on sum of squared difference and complex correlation respectively.

The depth estimates are then temporally filtered using weighted least squares or a Kalman filter.
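As a sketch of the temporal filtering step, a scalar Kalman filter with a (nearly) constant-depth model can smooth a sequence of noisy depth measurements. The noise variances below are illustrative, not taken from the thesis:

```python
def kalman_filter(measurements, process_var=1e-3, meas_var=0.5):
    """Scalar Kalman filter assuming a (nearly) constant state.

    Returns the filtered estimates; noise variances are illustrative.
    """
    x, p = measurements[0], 1.0   # initial state and uncertainty
    estimates = []
    for z in measurements:
        p = p + process_var              # predict: uncertainty grows
        k = p / (p + meas_var)           # Kalman gain
        x = x + k * (z - x)              # update with measurement z
        p = (1.0 - k) * p                # shrink uncertainty
        estimates.append(x)
    return estimates

smoothed = kalman_filter([100.2, 99.8, 100.5, 99.9, 100.1])
```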

A simple estimation of the computational complexity and memory requirements for the algorithms is presented to aid estimation of the hardware requirements.

Tests on real flight sequences are performed, showing that the aircraft altitude can be estimated with a good accuracy.

APA, Harvard, Vancouver, ISO, and other styles
21

Jónsson, Ólafur Fannar. "Motorized testing framework for a computer vision application." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-279966.


Full text
Abstract:
A successful design and implementation of a modern computer vision system, based on two moving cameras, requires a particular testing infrastructure to be in place. Two pairs of linear actuators, mounted on rigid aluminum frames, separated by a 90° angle, were assembled for this purpose. In combination with a Xilinx Zynq-7020 system-on-a-chip, four Trinamic TMC2130 stepper motor drivers and software written explicitly for the project, programmable motion control was made possible. Self-calibrating and positioning functionalities were tested and shown to work with a precision of ±1 mm. Successfully fulfilling its stated functionality, the resulting build can thus serve as a foundation for future projects.
APA, Harvard, Vancouver, ISO, and other styles
22

Persson, Anton, and Niklas Dymne. "Classification of black plastic granulates using computer vision." Thesis, Högskolan i Halmstad, Akademin för informationsteknologi, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:hh:diva-45674.

Full text
Abstract:
Pollution and climate change are among the biggest challenges facing humanity, and recycling is needed for a sustainable future. Plastic is a large part of the material recycled today, but the recycling world faces a problem: modern recycling facilities can handle plastics of all colours except black. For this reason, most recycling companies have resorted to methods unaffected by colour, like the method used at Stena Nordic Recycling Central. Because the individual plastics cannot be told apart, Stena Nordic Recycling Central has to wait until an entire bag of plastic granulates has been run through the production line and sorted before testing its purity with a chemical method. Finding out whether the electrostatic divider settings are correct using this testing method is costly and causes many re-runs. If the divider setting could be validated at an earlier stage, it would save both time and the number of re-runs needed.

This thesis aims to create a system that can classify different types of plastics using image analysis, exploring two computer vision techniques: the RGB method (see 3.3.2) and machine learning (see 3.3.4) using transfer learning with an AlexNet. The aim is an accuracy of at least 95% when classifying the plastic granulates. The convolutional neural network used in this thesis is an AlexNet, and the choice of method to explore further is made in the method part of the thesis. The results of the RGB method were inconclusive (section 4.2): it was not clear whether one plastic was blacker than the other. This uncertainty, and the fact that a convolutional neural network takes more features than just RGB into account (discussed in section 3.3), makes the convolutional neural network the method to explore further in this thesis.

The results gathered from the convolutional neural network's training showed 95% accuracy in classifying the plastic granulates. A separate test is also needed to make sure the accuracy is close to the network accuracy. The result from the stand-alone test was 86.6% accuracy, where the plastic type Polystyrene had a subpar result of 73.3% while Acrylonitrile butadiene styrene was classified with 100% accuracy. The results from the convolutional neural network show that black plastics can be classified using machine learning, which could be an excellent solution for classifying and recycling black plastics if further research in the field is conducted.
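The thesis does not spell out the RGB method here; one plausible minimal variant is a nearest-mean-colour classifier over image patches, sketched below. The class names and reference colours are hypothetical:

```python
import numpy as np

def mean_rgb(image):
    """Average colour of an H x W x 3 image patch."""
    return image.reshape(-1, 3).mean(axis=0)

def classify_by_rgb(image, class_means):
    """Assign the class whose reference mean colour is nearest (Euclidean)."""
    feat = mean_rgb(image)
    dists = {name: np.linalg.norm(feat - np.asarray(m)) for name, m in class_means.items()}
    return min(dists, key=dists.get)

# Illustrative reference colours for two hypothetical black-plastic classes.
refs = {"ABS": [20.0, 20.0, 22.0], "PS": [35.0, 34.0, 33.0]}
dark_patch = np.full((8, 8, 3), 21.0)
```

With nearly black granulates the reference colours end up very close together, which is exactly why such a method struggles here.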
APA, Harvard, Vancouver, ISO, and other styles
23

Javadi, Mohammad Saleh. "Computer Vision Algorithms for Intelligent Transportation Systems Applications." Licentiate thesis, Blekinge Tekniska Högskola, Institutionen för matematik och naturvetenskap, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:bth-17166.

Full text
Abstract:
In recent years, Intelligent Transportation Systems (ITS) have emerged as an efficient way of enhancing traffic flow, safety and management. These goals are realized by combining various technologies and analyzing the acquired data from vehicles and roadways. Among all ITS technologies, computer vision solutions have the advantages of high flexibility, easy maintenance and high price-performance ratio that make them very popular for transportation surveillance systems. However, computer vision solutions are demanding and challenging due to computational complexity, reliability, efficiency and accuracy among other aspects.   In this thesis, three transportation surveillance systems based on computer vision are presented. These systems are able to interpret the image data and extract the information about the presence, speed and class of vehicles, respectively. The image data in these proposed systems are acquired using Unmanned Aerial Vehicle (UAV) as a non-stationary source and roadside camera as a stationary source. The goal of these works is to enhance the general performance of accuracy and robustness of the systems with variant illumination and traffic conditions.   This is a compilation thesis in systems engineering consisting of three parts. The red thread through each part is a transportation surveillance system. The first part presents a change detection system using aerial images of a cargo port. The extracted information shows how the space is utilized at various times aiming for further management and development of the port. The proposed solution can be used at different viewpoints and illumination levels e.g. at sunset. The method is able to transform the images taken from different viewpoints and match them together. Thereafter, it detects discrepancies between the images using a proposed adaptive local threshold. In the second part, a video-based vehicle's speed estimation system is presented. 
The measured speeds are essential information for law enforcement and they also provide an estimation of traffic flow at certain points on the road. The system employs several intrusion lines to extract the movement pattern of each vehicle (non-equidistant sampling) as an input feature to the proposed analytical model. In addition, other parameters such as camera sampling rate and distances between intrusion lines are also taken into account to address the uncertainty in the measurements and to obtain the probability density function of the vehicle's speed. In the third part, a vehicle classification system is provided to categorize vehicles into "private car", "light trailer", "lorry or bus" and "heavy trailer". This information can be used by authorities for surveillance and development of the roads. The proposed system consists of multiple fuzzy c-means clusterings using input features of length, width and speed of each vehicle. The system has been constructed by using prior knowledge of traffic regulations regarding each class of vehicle in order to enhance the classification performance.
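The fuzzy c-means clustering used in the third part can be sketched from scratch; the vehicle feature values (length, width, speed) below are synthetic:

```python
import numpy as np

def fuzzy_c_means(X, c, m=2.0, n_iter=100, seed=0):
    """Minimal fuzzy c-means: returns (centres, membership matrix U).

    X: (n_samples, n_features); U[i, j] = membership of sample i in cluster j.
    """
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)          # memberships sum to 1 per sample
    for _ in range(n_iter):
        W = U ** m
        centres = (W.T @ X) / W.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2) + 1e-12
        inv = d ** (-2.0 / (m - 1.0))          # standard FCM membership update
        U = inv / inv.sum(axis=1, keepdims=True)
    return centres, U

# Two well-separated synthetic "vehicle" groups: (length, width, speed).
X = np.array([[4.5, 1.8, 90.0], [4.4, 1.7, 95.0],
              [16.0, 2.5, 80.0], [15.5, 2.6, 78.0]])
centres, U = fuzzy_c_means(X, c=2)
```

The soft memberships, rather than hard labels, are what allow prior knowledge to be combined with the clustering output.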
APA, Harvard, Vancouver, ISO, and other styles
24

Yang, Chen. "Machine Learning and Computer Vision for PCB Verification." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-290370.

Full text
Abstract:
Digitizing printed circuit boards (PCBs) from images with computer science techniques is an efficient way to analyze PCB circuits. This automatic optical processing could help electronics engineers gain a faster and more in-depth insight into complex multilayer PCBs. In this thesis, multiple machine learning and computer vision methods for extracting PCB circuits are investigated, designed, and tested with real-world PCB data. The PCB image dataset, collected by professional de-layering engineers, consists of every layer of the PCB and X-ray 3D models of the whole PCB. Region of interest (RoI) cropping and image alignment are first applied in the pre-processing stage. Detection and localization of electronic components are implemented with a deep learning network (Faster R-CNN), unsupervised machine learning clustering (XOR-based K-means), and multiple template matching; their accuracy results are 71.2%, 82.3% and 96.5%, respectively. For the multilayer circuit extraction, the metallic printed circuit is segmented in YCbCr color space, and the connection of every circuit net is then obtained.
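Template matching of the kind used for component localization can be sketched as brute-force normalised cross-correlation; a tiny numpy stand-in (real systems would use an optimised library implementation):

```python
import numpy as np

def match_template(image, template):
    """Brute-force normalised cross-correlation; returns (row, col) of the best match.

    A from-scratch stand-in for library template matching, fine for small images only.
    """
    ih, iw = image.shape
    th, tw = template.shape
    t = template - template.mean()
    best, best_pos = -np.inf, (0, 0)
    for r in range(ih - th + 1):
        for c in range(iw - tw + 1):
            patch = image[r:r + th, c:c + tw]
            p = patch - patch.mean()
            denom = np.sqrt((p ** 2).sum() * (t ** 2).sum()) + 1e-12
            score = (p * t).sum() / denom      # NCC score in [-1, 1]
            if score > best:
                best, best_pos = score, (r, c)
    return best_pos

img = np.zeros((12, 12))
img[6, 7] = 1.0                          # a bright "component" pixel
template = np.zeros((3, 3))
template[1, 1] = 1.0
pos = match_template(img, template)
```

Normalising by patch and template energy is what makes the score robust to brightness changes, one reason template matching scored highest in the comparison above.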
APA, Harvard, Vancouver, ISO, and other styles
25

Almin, Fredrik. "Detection of Non-Ferrous Materials with Computer Vision." Thesis, Linköpings universitet, Datorseende, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-175519.

Full text
Abstract:
In one of the facilities at the Stena Recycling plant in Halmstad, Sweden, about 300 tonnes of metallic waste is processed each day with the aim of sorting out all non-ferrous material. At the end of this process, non-ferrous materials are manually sorted out from the ferrous materials. This thesis investigates a computer vision based approach to identify and localize the non-ferrous materials and eventually automate the sorting.

Images were captured of ferrous and non-ferrous materials. The images are processed and segmented to be used as annotation data for a deep convolutional neural segmentation network. Network models have been trained on different kinds and amounts of data. The resulting models are evaluated and tested in accordance with different evaluation metrics. Methods of creating advanced training data by merging imaging information were tested. Experiments with using classifier prediction confidence to identify objects of unknown classes were performed. This thesis shows that it is possible to discern ferrous from non-ferrous material with a purely vision based system. The thesis also shows that it is possible to automatically create annotated training data. It becomes evident that it is possible to create better training data, tailored for the task at hand, by merging image data. A segmentation network trained on more than two classes yields lower prediction confidence for objects unknown to the classifier.

Substituting manual sorting with a purely vision based system seems like a viable approach. Before a substitution is considered, the automatic system needs to be evaluated in comparison to the manual sorting.
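Using classifier prediction confidence to flag unknown classes can be sketched as thresholding the top softmax probability; the class names and threshold below are illustrative, not from the thesis:

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over a 1D logit vector."""
    e = np.exp(logits - np.max(logits))
    return e / e.sum()

def predict_with_rejection(logits, threshold=0.8, classes=("ferrous", "non-ferrous")):
    """Return the predicted class, or 'unknown' when the top softmax
    probability falls below the confidence threshold."""
    probs = softmax(np.asarray(logits, dtype=float))
    i = int(np.argmax(probs))
    return classes[i] if probs[i] >= threshold else "unknown"
```

A confident prediction such as logits [4.0, 0.0] passes the threshold, while a near-tie is rejected as unknown.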
APA, Harvard, Vancouver, ISO, and other styles
26

Thazhurazhikath, Rajendran Prajit. "Operational Data Extraction from Frontal Vehicular Camera using Computer Vision." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-287359.

Full text
Abstract:
A data-driven understanding of how a vehicle is used can help transportation companies improve their products and provide better service to their customers. Sensors have been the usual source of the operational data of vehicles. However, in order to improve and provide new services to customers, it is often necessary to understand the operation of the vehicle in new ways, with collection of new data that the vehicle sensors do not capture today. We hypothesize that a camera alone would be able to capture a large number of operational variables, thereby eliminating the need for multiple, costly sensors. This project investigates the feasibility of collecting valuable operational data through image analysis of photos or videos from the on-board camera(s) and evaluates the best techniques to collect and analyse operational data from the vision of the vehicle. The goal of the project is to extract five variables: road curvature, traffic density, pedestrian density, nature of area, and motion. Several experiments were carried out to determine the most suitable architecture for each of the variables, as well as ensemble techniques, smoothing techniques and data storage techniques. We evaluate the models based on their performance on test frames from three different datasets. Additionally, we also evaluate the smoothing techniques based on analysis of data vectors over a short section of selected videos.
APA, Harvard, Vancouver, ISO, and other styles
27

Lindvall, Victor. "A Computer Vision-Based Approach for Automated Inspection of Cable Connections." Thesis, Uppsala universitet, Institutionen för informationsteknologi, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-448446.

Full text
Abstract:
The goal of the project was to develop an algorithm based on a Convolutional Neural Network (CNN) for automatically detecting exposed metal components on coaxial cable connections, a.k.a. the detector. We show that the performance of such a CNN trained to identify bad weatherproofing can be improved by applying an image post-processing technique. This post-processing technique exploits specular features when predicting exposed metal components. Such specular features are notorious for posing problems in computer vision algorithms and are therefore typically removed. The results achieved by applying the stand-alone detector, without post-processing, are compared with the image post-processing approach to highlight the benefits of implementing such an algorithm.
APA, Harvard, Vancouver, ISO, and other styles
28

Ringaby, Erik. "Geometric Computer Vision for Rolling-shutter and Push-broom Sensors." Licentiate thesis, Linköpings universitet, Datorseende, 2012. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-77391.

Full text
Abstract:
Almost all cell-phones and camcorders sold today are equipped with a CMOS (Complementary Metal Oxide Semiconductor) image sensor and there is also a general trend to incorporate CMOS sensors in other types of cameras. The sensor has many advantages over the more conventional CCD (Charge-Coupled Device) sensor such as lower power consumption, cheaper manufacturing and the potential for on-chip processing. Almost all CMOS sensors make use of what is called a rolling shutter. Compared to a global shutter, which images all the pixels at the same time, a rolling-shutter camera exposes the image row-by-row. This leads to geometric distortions in the image when either the camera or the objects in the scene are moving. The recorded videos and images will look wobbly (jello effect), skewed or otherwise strange and this is often not desirable. In addition, many computer vision algorithms assume that the camera used has a global shutter, and will break down if the distortions are too severe. In airborne remote sensing it is common to use push-broom sensors. These sensors exhibit a similar kind of distortion as a rolling-shutter camera, due to the motion of the aircraft. If the acquired images are to be matched with maps or other images, then the distortions need to be suppressed. The main contributions in this thesis are the development of the three dimensional models for rolling-shutter distortion correction. Previous attempts modelled the distortions as taking place in the image plane, and we have shown that our techniques give better results for hand-held camera motions. The basic idea is to estimate the camera motion, not only between frames, but also the motion during frame capture. The motion can be estimated using inter-frame image correspondences and with these a non-linear optimisation problem can be formulated and solved. 
All rows in the rolling-shutter image are imaged at different times, and when the motion is known, each row can be transformed to the rectified position. In addition to rolling-shutter distortions, hand-held footage often has shaky camera motion. It has been shown how to do efficient video stabilisation, in combination with the rectification, using rotation smoothing. In the thesis it has been explored how to use similar techniques as for the rolling-shutter case in order to correct push-broom images, and also how to rectify 3D point clouds from e.g. the Kinect depth sensor.
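The core rolling-shutter idea, that every row has its own capture time and hence its own camera pose, can be sketched in simplified form with a scalar rotation angle (the thesis estimates full 3D rotations; the readout time and angles below are illustrative):

```python
import numpy as np

def row_timestamps(t_frame, readout_time, n_rows):
    """Capture time of each image row under a rolling shutter."""
    return t_frame + readout_time * np.arange(n_rows) / n_rows

def per_row_angle(angle_start, angle_end, n_rows):
    """Linearly interpolated camera angle for every row of one frame.

    A scalar angle keeps the sketch short; the real method interpolates
    3D rotations estimated from inter-frame correspondences.
    """
    return np.linspace(angle_start, angle_end, n_rows)

times = row_timestamps(t_frame=0.0, readout_time=0.03, n_rows=480)
angles = per_row_angle(0.0, 0.1, 480)
```

With a pose per row, each row can be warped back to a common reference pose, which is the rectification step described above.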
APA, Harvard, Vancouver, ISO, and other styles
29

Sun, Ruiwen. "Detecting Faulty Tape-around Weatherproofing Cables by Computer Vision." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-272108.

Full text
Abstract:
With the arrival of 5G, more cables will be installed as more radio towers are set up. However, a large proportion of radio units are mounted high up in the open, which makes it difficult for human technicians to maintain the systems. Under these circumstances, automatic detection of errors in radio cabinets is crucial. Cables and connectors are usually covered with weatherproofing tapes, and one of the most common problems is that the tape is not wound tightly around the cables and connectors. This makes the tape stick out from the cable and look like a waving flag, which may seriously damage the radio systems. The thesis aims at detecting this flagging-tape and addressing the issue. The thesis experiments with two methods for object detection: a convolutional neural network, and OpenCV-based image processing. The former uses the YOLO (You Only Look Once) network for training and testing, while in the latter method a connected-component method is applied for the detection of big objects like the cables, and a line segment detector is responsible for extracting the flagging-tape boundary. Multiple parameters, structurally and functionally unique, were developed to find the most suitable way to meet the requirements. Furthermore, precision and recall are used to evaluate the quality of the system output, and larger experiments were performed with different parameters to improve the results. The results show that the best way of detecting faulty weatherproofing is the image processing method, with which the recall is 71% and the precision reaches 60%. This method shows better performance than YOLO for flagging-tape detection. The method shows the great potential of this kind of object detection, and a detailed discussion regarding its limitations is also presented in the thesis.
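The precision and recall figures reported above follow the standard definitions from detection counts; a small helper (the counts in the example are illustrative, not the thesis's):

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Precision and recall from detection counts, as used to evaluate
    the flagging-tape detector."""
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return precision, recall

# Illustrative counts: 30 correct detections, 20 false alarms, 12 misses.
precision, recall = precision_recall(30, 20, 12)
```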
APA, Harvard, Vancouver, ISO, and other styles
30

Hallenberg, Johan. "Robot Tool Center Point Calibration using Computer Vision." Thesis, Linköping University, Department of Electrical Engineering, 2007. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-9520.

Full text
Abstract:

Today, tool center point calibration is mostly done by a manual procedure. The method is very time consuming and the result may vary due to how skilled the operators are.

This thesis proposes a new automated iterative method for tool center point calibration of industrial robots, by making use of computer vision and image processing techniques. The new method has several advantages over the manual calibration method. Experimental verifications have shown that the proposed method is much faster, still delivering a comparable or even better accuracy. The setup of the proposed method is very easy, only one USB camera connected to a laptop computer is needed and no contact with the robot tool is necessary during the calibration procedure.

The method can be split into three different parts. Initially, the transformation between the robot wrist and the tool is determined by solving a closed loop of homogeneous transformations. Second, an image segmentation procedure is described for finding point correspondences on a rotation-symmetric robot tool. The image segmentation part is necessary for performing a six-degrees-of-freedom measurement of the camera-to-tool transformation. The last part of the proposed method is an iterative procedure which automates an ordinary four-point tool center point calibration algorithm. The iterative procedure ensures that the accuracy of the tool center point calibration only depends on the accuracy of the camera when registering a movement between two positions.
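Solving a closed loop of homogeneous transformations for the unknown wrist-to-tool transform amounts to a matrix inversion and multiplication; a minimal numpy sketch with made-up poses (a simplified illustration of the idea, not the thesis's full solver):

```python
import numpy as np

def make_transform(R, t):
    """4x4 homogeneous transform from a 3x3 rotation R and translation t."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

def wrist_to_tool(T_base_wrist, T_base_tool):
    """Solve the closed loop T_base_tool = T_base_wrist @ T_wrist_tool
    for the unknown wrist-to-tool transform."""
    return np.linalg.inv(T_base_wrist) @ T_base_tool

# Illustrative poses: wrist at (0, 0, 1) m, tool 0.1 m further along z, no rotation.
T_bw = make_transform(np.eye(3), [0.0, 0.0, 1.0])
T_bt = make_transform(np.eye(3), [0.0, 0.0, 1.1])
T_wt = wrist_to_tool(T_bw, T_bt)
```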

APA, Harvard, Vancouver, ISO, and other styles
31

ALI, FAIZA, and MAKSIMS SVJATOHA. "Integration of Computer Vision Methods and Sensor Fusion Technologies for Precision Driving." Thesis, KTH, Skolan för industriell teknik och management (ITM), 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-299793.

Full text
Abstract:
Increasing interest in artificial intelligence has given rise to new technologies. This has enabled advanced sensors within fields such as computer vision, which boast increased precision and consistency and do not accumulate small errors over time. However, they require increased computing power and are prone to processing delays. It is therefore interesting to combine them with faster, more traditional sensors in order to compensate for their weaknesses. While such combinations exist today, it is interesting to see if there are ways to use computer vision techniques to bring the performance of cheaper sensors up to the standard of more expensive, industrial ones in terms of accuracy and precision. In this thesis, a standard Raspberry Pi camera has been installed on a Jetracer vehicle to estimate the distance to a target object, fusing its output with that of a rotary encoder. A Kalman filter is used for this sensor fusion setup, designed to reduce the measurement uncertainties present both in the depth estimation algorithm for the camera and in the encoder position outputs. There exists a relationship between uncertainty mitigation and effective resolution. Sensor fusion was partially implemented in an online setting, but the focus was on fusing recorded sensor data to avoid issues with compensating for the inherent vision system latency. Fusing encoder measurements with those of a vision system significantly reduced position estimation uncertainty compared to only using the vision system, but it is unclear if it is better than using the encoder alone. Further investigation confirms that increased latencies and reduced sampling frequencies have a negative impact on position uncertainty; however, the impact of latencies in realistic ranges is negligible. There is also a trade-off between precision and accuracy in the choice of sampling frequency: higher frequencies are not necessarily better.
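A single fusion step of two position estimates, one from the vision system and one from the encoder, can be sketched as inverse-variance weighting, the static single-step form of the Kalman update (the variances below are illustrative):

```python
def fuse(x_vision, var_vision, x_encoder, var_encoder):
    """Inverse-variance weighted fusion of two position estimates.

    The fused variance is always smaller than either input variance,
    which is the uncertainty reduction described above.
    """
    w_v = 1.0 / var_vision
    w_e = 1.0 / var_encoder
    x = (w_v * x_vision + w_e * x_encoder) / (w_v + w_e)
    var = 1.0 / (w_v + w_e)
    return x, var

# Illustrative: noisy vision estimate fused with a more precise encoder estimate.
x, var = fuse(1.00, 0.04, 1.10, 0.01)
```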
APA, Harvard, Vancouver, ISO, and other styles
32

Rehnholm, Jonas. "Battery Pack Part Detection and Disassembly Verification Using Computer Vision." Thesis, Mälardalens högskola, Akademin för innovation, design och teknik, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:mdh:diva-54852.

Full text
Abstract:
Developing the greenest battery cell and establishing a European supply of batteries is the main goal for Northvolt. To achieve this, the recycling of batteries is a key enabler towards closing the loop and enabling the future of energy.When it comes to the recycling of electric vehicle battery packs, dismantling is one of of the main process steps.Given the size, weight and high voltage of the battery packs, automatic disassembly using robots is the preferred solution. The work presented in this thesis aims to develop and integrate a vision system able to identify and verify the battery pack dismantling process. To achieve this, two cameras were placed in the robot cell and the object detectors You Only Look Once (YOLO) and template matching were implemented, tested and compared. The results show that YOLO is the best object detector out of the ones implemented. The integration of the vision system with the robot controller was also tested and showed that with the results from the vision system, the robot controller can make informed decisions regarding the disassembly.
APA, Harvard, Vancouver, ISO, and other styles
33

Turesson, Eric. "Multi-camera Computer Vision for Object Tracking: A comparative study." Thesis, Blekinge Tekniska Högskola, Institutionen för datavetenskap, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:bth-21810.

Full text
Abstract:
Background: Video surveillance is a growing area that can help deter crime, support investigations, and gather statistics; these are just some of the ways it can aid society. Its efficiency could be increased further by introducing tracking, more specifically tracking between cameras in a network. Automating this process could reduce the need for human monitoring and review, since the system can track and inform the relevant people on its own. This has a wide array of uses, such as forensic investigation, crime alerting, or tracking down people who have disappeared. Objectives: We first want to investigate the common setup of real-time multi-target multi-camera tracking (MTMCT) systems. Next, we want to investigate how the components in an MTMCT system affect each other and the complete system. Lastly, we want to see how image enhancement can affect the MTMCT. Methods: To achieve our objectives, we conducted a systematic literature review to gather information. Using this information, we implemented an MTMCT system in which we evaluated how the components interact in the complete system. Lastly, we implemented two image enhancement techniques to see how they affect the MTMCT. Results: As we discovered, MTMCT is most often constructed using detection to discover objects, tracking to follow the objects within a single camera, and a re-identification method to ensure that objects across cameras share the same ID. The components have a considerable effect on each other, and can either degrade or improve one another; for example, the quality of the bounding boxes affects the data that re-identification can extract. We found that the image enhancement we used did not introduce any significant improvement. Conclusions: The most common structure for MTMCT is detection, tracking, and re-identification.
From our findings, we can see that all the components affect each other, but re-identification is the one most affected by the other components and by the image enhancement. The two tested image enhancement techniques did not introduce enough improvement, but other image enhancement techniques might make the MTMCT perform better. The MTMCT system we constructed did not manage to reach real-time performance.
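The detection–tracking–re-identification structure described in the abstract above can be sketched in miniature. Everything here (cosine-similarity matching, the 0.7 threshold, the toy embedding vectors) is an illustrative assumption, not the thesis' implementation:

```python
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

class ReIDGallery:
    """Assigns a global ID to each appearance embedding; embeddings from
    different cameras that are similar enough share one ID (re-identification)."""
    def __init__(self, threshold=0.7):
        self.threshold = threshold
        self.gallery = {}      # global_id -> last seen embedding
        self.next_id = 0

    def assign(self, embedding):
        # Match against known identities across all cameras.
        best_id, best_sim = None, self.threshold
        for gid, emb in self.gallery.items():
            sim = cosine(embedding, emb)
            if sim > best_sim:
                best_id, best_sim = gid, sim
        if best_id is None:    # unseen person -> new global ID
            best_id = self.next_id
            self.next_id += 1
        self.gallery[best_id] = embedding
        return best_id

# Toy embeddings standing in for a re-ID network's output on two cameras.
person_a_cam1 = np.array([1.0, 0.1, 0.0])
person_a_cam2 = np.array([0.9, 0.2, 0.0])   # same person, other camera
person_b_cam2 = np.array([0.0, 0.2, 1.0])

g = ReIDGallery()
id1 = g.assign(person_a_cam1)
id2 = g.assign(person_a_cam2)   # similar embedding -> same global ID
id3 = g.assign(person_b_cam2)   # dissimilar embedding -> new global ID
```

In a full MTMCT system the embeddings would come from a re-ID network applied to tracked bounding boxes, which is where the abstract's point about box quality affecting re-identification enters.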
APA, Harvard, Vancouver, ISO, and other styles
34

Silvestri, Gianluigi. "One-Shot Neural Architecture Search for Deep Multi-Task Learning in Computer Vision." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-282831.

Full text
Abstract:
In this work, a neural architecture search algorithm for multi-task learning is proposed. Given any dataset and group of tasks, the method aims to find the optimal way of sharing layers among tasks in convolutional neural networks. A search space suited to multi-task learning is designed, and a novel strategy to rank different Pareto-optimal solutions is developed. The core of the algorithm is an adaptation of a state-of-the-art neural architecture search strategy. Experimental results on the Cityscapes dataset, on the tasks of semantic segmentation and depth estimation, do not provide the expected results. Despite the lack of stable results, this work lays down the fundamentals for further developing novel multi-task neural architecture search methods.
In this work, an architecture search algorithm for multi-task learning is proposed. Given a general dataset and task group, the method aims to find the optimal way of sharing layers between the tasks in a convolutional neural network. A search space adapted to multi-task learning has been designed, and a new strategy for ranking different Pareto-optimal solutions has been developed. The core of the algorithm is an adapted state-of-the-art architecture search strategy. Experimental results on the Cityscapes dataset, on the tasks of semantic segmentation and depth estimation, do not deliver the expected results. Despite the lack of stable results, this work provides a foundation for continued development of architecture search methods for multi-task learning.
APA, Harvard, Vancouver, ISO, and other styles
35

Zhang, Lichang. "Non-invasive detection algorithm of thermal comfort based on computer vision." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-241082.

Full text
Abstract:
The waste of building energy consumption is a major challenge in the world, and real-time detection of human thermal comfort is an effective way to meet this issue. As the name suggests, it means detecting a person's comfort level in real time and non-invasively. However, due to various factors such as individual differences in thermal comfort and climate-related elements (temperature, humidity, illumination, etc.), there is still a long way to go before this strategy can be implemented in real life. From another perspective, current HVAC (heating, ventilating and air-conditioning) systems cannot provide flexible interaction channels to adjust the atmosphere, and so fail to satisfy the requirements of users. All of this indicates the necessity of developing a detection method for human thermal comfort. In this paper, a non-invasive detection method for human thermal comfort is proposed from two perspectives: macro human postures and skin textures. In the posture part, OpenPose is used to analyze the position coordinates of human body key points in images, for example the elbow, knee, and hipbone. The results of the analysis are then interpreted in terms of thermal comfort. For skin textures, a deep neural network is used to predict the temperature of human skin from images. Based on Fanger's theory of thermal comfort, the results of both parts are satisfying: subjects' postures can be captured and interpreted into different thermal comfort levels (hot, cold and comfortable), and the absolute prediction error of the neural network is less than 0.125 degrees centigrade, which is the equipment error of the thermometer used in data acquisition. With the solution proposed in this paper, it is promising to non-invasively detect the thermal comfort level of users from postures and skin textures. Finally, the conclusion and future work are discussed in the final chapter.
The waste of building energy consumption is a major challenge in the world, and real-time detection of human thermal comfort is an effective way to solve the problem. As the name suggests, it means detecting a person's comfort level in real time and non-invasively. However, due to factors such as individual differences in thermal comfort and climate-related factors (temperature, humidity, illumination, etc.), there is still a long way to go before this strategy can be implemented in reality. From another perspective, current systems for heating, ventilation and air conditioning cannot provide flexible interaction channels for adjusting the atmosphere, and naturally fail to satisfy user requirements. All of this indicates the need to develop a detection method for human thermal comfort. This paper proposes a non-invasive detection method for human thermal comfort from two perspectives: macro human postures and skin textures. In the posture part, OpenPose is used to analyze the position coordinates of the body's key points in images, for example the elbow, knee and hip bone, and the results of the analysis are interpreted in terms of thermal comfort. For skin textures, a deep neural network is used to predict the temperature of human skin from images. Based on Fanger's theory of thermal comfort, the results of both parts are satisfactory: subjects' postures can be captured and interpreted into different thermal comfort levels (hot, cold and comfortable), and the absolute prediction error of the neural network is less than 0.125 degrees Celsius, which is the equipment error of the thermometer used in data acquisition. With the solutions in this paper, it is promising to non-invasively detect users' thermal comfort level from postures and skin textures. Finally, the conclusions and future work are discussed in the last chapter.
APA, Harvard, Vancouver, ISO, and other styles
36

Tarassu, Jonas. "GPU-Accelerated Frame Pre-Processing for Use in Low Latency Computer Vision Applications." Thesis, Linköpings universitet, Informationskodning, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-142019.

Full text
Abstract:
Attention to low-latency computer vision and video processing applications is growing every year, not least for VR and AR applications. In this thesis the Contrast Limited Adaptive Histogram Equalization (CLAHE) and Radial Distortion algorithms are implemented using both CUDA and OpenCL to determine whether these types of algorithms are suitable for GPU implementations when low latency is of utmost importance. The result is an implementation of the block version of the CLAHE algorithm, which utilizes the built-in interpolation hardware that resides on the GPU to reduce block effects, and an implementation of the Radial Distortion algorithm that corrects a 1920x1080 frame in 0.3 ms. Further, this thesis concludes that the GPU platform might be a good choice if the data to be processed can be transferred to, and possibly from, the GPU fast enough, and that the choice of compute API is mostly a matter of taste.
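As a rough illustration of the kind of per-pixel mapping such a radial distortion correction performs, here is a CPU sketch using a single-coefficient radial model; the coefficient value and the model order are assumptions for illustration, not taken from the thesis:

```python
import numpy as np

def correct_radial(pts, k1, center):
    """Per-point radial correction x' = c + (x - c) * (1 + k1 * r^2),
    a one-coefficient Brown-Conrady-style model (illustrative only).
    On a GPU the same map would be evaluated for every output pixel."""
    d = pts - center
    r2 = np.sum(d * d, axis=1, keepdims=True)  # squared distance to center
    return center + d * (1.0 + k1 * r2)

center = np.array([960.0, 540.0])                # principal point of a 1920x1080 frame
pts = np.array([[960.0, 540.0], [1200.0, 700.0]])
corrected = correct_radial(pts, k1=1e-7, center=center)
```

Because every pixel is mapped independently, the operation parallelizes trivially, which is why a GPU can correct a full HD frame in fractions of a millisecond.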
APA, Harvard, Vancouver, ISO, and other styles
37

Örn, Fredrik. "Computer Vision for Camera Trap Footage : Comparing classification with object detection." Thesis, Uppsala universitet, Avdelningen för visuell information och interaktion, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-447482.

Full text
Abstract:
Monitoring wildlife is of great interest to ecologists and is arguably even more important in the Arctic, the region in focus for the research network INTERACT, where the effects of climate change are greater than on the rest of the planet. This master thesis studies how artificial intelligence (AI) and computer vision can be used together with camera traps to achieve an effective way to monitor populations. The study uses an image data set containing both humans and animals. The images were taken by camera traps from ECN Cairngorms, a station in the INTERACT network. The goal of the project is to classify these images into one of three categories: "Empty", "Animal" and "Human". Three different methods are compared: a DenseNet201 classifier, a YOLOv3 object detector, and the pre-trained MegaDetector, developed by Microsoft. The classifier did not achieve sufficient results, but YOLOv3 performed well on human detection, with an average precision (AP) of 0.8 on both training and validation data. The animal detections for YOLOv3 did not reach as high an AP, likely because of the smaller number of training examples. The best results were achieved by MegaDetector in combination with an added method to determine whether the detected animals were dogs, reaching an average precision of 0.85 for animals and 0.99 for humans. This is the method recommended for future use, but there is potential to improve all the models and reach even more impressive results.
APA, Harvard, Vancouver, ISO, and other styles
38

Öfjäll, Kristoffer. "Online Learning for Robot Vision." Licentiate thesis, Linköpings universitet, Datorseende, 2014. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-110892.

Full text
Abstract:
In tele-operated robotics applications, the primary information channel from the robot to its human operator is a video stream. For autonomous robotic systems however, a much larger selection of sensors is employed, although the most relevant information for the operation of the robot is still available in a single video stream. The issue lies in autonomously interpreting the visual data and extracting the relevant information, something humans and animals perform strikingly well. On the other hand, humans have great difficulty expressing what they are actually looking for on a low level, suitable for direct implementation on a machine. For instance, objects tend to be already detected when the visual information reaches the conscious mind, with almost no clues remaining regarding how the object was identified in the first place. This became apparent already when Seymour Papert gathered a group of summer workers to solve the computer vision problem 48 years ago [35]. Artificial learning systems can overcome this gap between the level of human visual reasoning and low-level machine vision processing. If a human teacher can provide examples of what is to be extracted, and if the learning system is able to extract the gist of these examples, the gap is bridged. There are however some special demands on a learning system for it to perform successfully in a visual context. First, low-level visual input is often of high dimensionality, such that the learning system needs to handle large inputs. Second, visual information is often ambiguous, such that the learning system needs to be able to handle multimodal outputs, i.e. multiple hypotheses. Typically, the relations to be learned are non-linear, and there is an advantage if data can be processed at video rate, even after presenting many examples to the learning system. In general, there seems to be a lack of such methods. This thesis presents systems for learning perception-action mappings for robotic systems with visual input.
A range of problems are discussed, such as vision-based autonomous driving, inverse kinematics of a robotic manipulator, and controlling a dynamical system. Operational systems demonstrating solutions to these problems are presented. Two different approaches for providing training data are explored: learning from demonstration (supervised learning) and explorative learning (self-supervised learning). A novel learning method fulfilling the stated demands is presented. The method, qHebb, is based on associative Hebbian learning on data in channel representation. Properties of the method are demonstrated on a vision-based autonomously driving vehicle, where the system learns to directly map low-level image features to control signals. After an initial training period, the system seamlessly continues autonomously. In a quantitative evaluation, the proposed online learning method performed comparably with state-of-the-art batch learning methods.
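As a toy illustration of associative Hebbian learning on channel-coded data, here is a heavily simplified stand-in for the kind of mechanism qHebb builds on; the channel placement, basis function, and learning rate are all made-up choices, not the thesis' actual method:

```python
import numpy as np

def channel_encode(x, centers, width=1.0):
    """Channel representation: a scalar becomes a vector of smooth,
    overlapping basis responses (cos^2 channels, a common choice)."""
    d = (x - centers) / width
    r = np.cos(np.pi * d / 3.0) ** 2
    r[np.abs(d) >= 1.5] = 0.0   # compact support: three active channels at most
    return r

def hebb_update(W, a_in, a_out, lr=0.1):
    """Associative Hebbian rule: strengthen links between co-active channels."""
    return W + lr * np.outer(a_out, a_in)

centers = np.linspace(0.0, 10.0, 11)
W = np.zeros((11, 11))
# Learn the identity mapping y = x from examples (a stand-in for a
# perception-action mapping learned from demonstration).
for x in np.linspace(0.0, 10.0, 101):
    a = channel_encode(x, centers)
    W = hebb_update(W, a, a)
# Decode: the response to a new input peaks at the associated output channel.
out = W @ channel_encode(4.0, centers)
prediction = centers[np.argmax(out)]
```

Each update is a cheap outer product, which hints at why such methods can keep learning online at video rate; the channel coding is also what lets the representation carry multiple hypotheses at once.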
APA, Harvard, Vancouver, ISO, and other styles
39

Pettersson, Erik. "Signal- och bildbehandling på moderna grafikprocessorer." Thesis, Linköping University, Department of Electrical Engineering, 2005. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-5404.

Full text
Abstract:

A modern graphics processor is extremely powerful, with performance potentially many times higher than that of a modern microprocessor. As the graphics processor has become increasingly programmable, it has become possible to use it for computation-intensive applications outside its normal domain. This work investigates the possibilities and limitations that arise when graphics processors are used for general-purpose programming. The work is aimed mainly at signal and image processing applications, but many of the principles apply to other areas as well.

A framework for image processing is implemented, and a few image analysis algorithms are realized and evaluated, among them stereo vision and optical flow computation. The results show that some applications can exhibit a considerable performance increase on a graphics processor compared to a microprocessor, while other applications can be inefficient or very hard to implement.


The modern graphical processing unit, GPU, is an extremely powerful unit, potentially many times more powerful than a modern microprocessor. Due to its increasing programmability, it has recently become possible to use it in computation-intensive applications outside its normal usage. This work investigates the possibilities and limitations of general-purpose programming on GPUs. The work mainly concentrates on signal and image processing, although many of the principles are applicable to other areas as well.

A framework for image processing on GPUs is implemented and a few computer vision algorithms are implemented and evaluated, among them stereo vision and optical flow. The results show that some applications can gain a substantial speedup when implemented correctly on the GPU, but others can be inefficient or extremely hard to implement.

APA, Harvard, Vancouver, ISO, and other styles
40

Mi, Yongcui. "Novel beam shaping and computer vision methods for laser beam welding." Licentiate thesis, Högskolan Väst, Avdelningen för produktionssystem (PS), 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:hv:diva-16970.

Full text
Abstract:
Laser beam welding has been widely applied in different industrial sectors due to its unique advantages. However, there are still challenges, such as beam positioning in T-joint welding and gap bridging in butt-joint welding, especially in the case of varying gap width along a joint. It is expected that enabling more advanced control of the welding system, and obtaining more in-depth process knowledge, could help solve these issues. The aim of this work is to address such welding issues by a laser beam shaping technology using a novel deformable mirror together with computer vision methods, and also to increase knowledge about the benefits and limitations of this approach. Beam shaping in this work was realized by a novel deformable mirror system integrated into industrial processing optics. Together with a wavefront sensor, a controlled adaptive beam shaping system was formed with a response time of 10 ms. The processes were monitored by a coaxial camera with selected filters and passive or active illumination. Conduction-mode autogenous bead-on-plate welding and butt-joint welding experiments were used to understand the effect of beam shaping on the melt pool geometry. Circular Gaussian and elliptical Gaussian shapes, elongated transverse to and along the welding direction, were studied. In-process melt pool images and cross-section micrographs of the weld seams/beads were analyzed. The results showed that the melt pool geometry can be significantly modified by beam shaping using the deformable mirror. T-joint welding with different beam offset deviations relative to the center of the joint line was conducted to study the potential of using machine learning to track the process state. The results showed that machine learning can reach sufficient detection and estimation performance, which could also be used for on-line control. In addition, in-process, multidimensional data were accurately acquired using computer vision methods.
These data reveal weaknesses of the current thermo-fluid simulation model, which in turn can help to better understand and control laser beam welding. The results obtained in this work show great potential in using the proposed methods to solve relevant challenges in laser beam welding.
Laser welding is widely used in different industrial sectors because of its unique advantages. However, there are still challenges, such as correct positioning of the laser beam in penetration welding of T-joints and handling of varying gap width along the joint when welding butt joints. Such problems are expected to be solvable with advanced automation methods, methods that are also expected to provide deeper knowledge of the process. The aim of this work is to address these problems with a technique for shaping the distribution of the laser power on the workpiece, so-called beam shaping. This is done using a new type of real-time deformable mirror together with image processing of camera images from the process; the advantages and disadvantages of this approach are investigated. Beam shaping is achieved with a new type of deformable mirror system integrated into industrial processing optics. Together with a wavefront sensor, this forms an adaptive beam shaping system with a response time of 10 ms. The process is monitored by a camera aligned coaxially with the laser beam. To capture images of the weld spot, it is illuminated with light of a suitable wavelength, and the camera is fitted with a corresponding optical filter. Experiments were performed with welding without filler material, directly on plates and without a keyhole, to understand the effect of beam shaping on the melt pool geometry. Gaussian circular and elliptical shapes, elongated both across and along the welding direction, were studied. Images of the melt pool were analyzed, as well as the microstructure in cross-sections of the welded plates. The results show that the melt pool geometry can be modified significantly by beam shaping with the deformable mirror system. Penetration welding of T-joints with deviations relative to the center of the joint line was performed to study the potential of using machine learning to capture the state of the process. The results showed that machine learning can reach sufficient performance for detecting and estimating this deviation, something that can also be used for feedback control. Multidimensional process data were collected in real time and analyzed using image processing methods. These data reveal shortcomings in current simulation models, which in turn helps to better understand and control laser welding. The results of this work show good potential in using the proposed methods to solve relevant challenges in laser welding.

The licentiate thesis also includes two submitted articles, which are not shown here.

APA, Harvard, Vancouver, ISO, and other styles
41

Nicander, Torun. "Indoor triangulation system using vision sensors." Thesis, Uppsala universitet, Signaler och system, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-429676.

Full text
Abstract:
This thesis aims to investigate a triangulation system for indoor positioning in two dimensions (2D). The system was implemented using three Pixy2 vision sensors placed on a straight baseline. A Pixy2 consists of a camera lens, an image sensor (Aptina MT9M114), a microcontroller (NXP LPC4330), and other components. It can track one or multiple colours, or a combination of colours. To position an object using triangulation, one needs to determine the angles (α) to the object from a pair of known observing points (i.e., any pair of the three Pixy2s placed in fixed positions on the baseline in this project). This is done from the Pixy2s' images. Using the Pinhole Camera Model, the tangent of the angle, tan(α), is found to have a linear relation with the displacement Δx in the image plane (in pixels), namely tan(α) = k Δx, where k is a constant depending on the specific Pixy2. A wooden test board was made specially to determine k for all the Pixy2s. It had distance marks in two dimensions and a Pixy2 affixed at the origin. By placing a coloured object at three different sets of spatial sampling points (marks), the constant k for each Pixy2 was determined with an error variance of less than 5%. Position estimations with the triangulation system were conducted using all three pairs formed from the three Pixy2s, placing the positioned object at different positions in the 2D plane on the board. A combination using estimates from all three pairs to make a more accurate estimate was also evaluated. The estimation results show positioning accuracy ranging from 0.03678 cm to 2.064 cm for the z-coordinate, and from 0.02133 cm to 0.9785 cm for the x-coordinate, which are very satisfactory results. The vision sensors were quite sensitive to the light environment when finely tuned to track one object, which has a significant effect on the performance of the vision sensor-based triangulation.
An extension of the system to use more than three Pixy2s has been looked into and shown to be feasible. A method for auto-calibrating the Pixy2s' positions on the baseline was suggested and implemented. After auto-calibration, the system still produced satisfactory position estimates.
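The tan(α) = k Δx relation in the abstract above can be turned into a position estimate by intersecting the viewing rays from one camera pair. The sketch below uses synthetic numbers, with both k constants assumed equal; it is an illustration of the geometry, not the thesis code:

```python
def triangulate_2d(dx1, dx2, k1, k2, baseline):
    """Intersect the viewing rays of two cameras on a common baseline.
    tan(alpha_i) = k_i * dx_i (the pinhole relation from the abstract);
    camera 1 sits at x = 0, camera 2 at x = baseline, both looking along z."""
    t1, t2 = k1 * dx1, k2 * dx2
    z = baseline / (t1 - t2)   # depth follows from the angle difference
    x = z * t1                 # lateral offset from camera 1
    return x, z

# Synthetic object at (x, z) = (20 cm, 100 cm); cameras 40 cm apart.
k = 0.001                      # assumed pixel-to-tangent constant
dx1 = (20.0 / 100.0) / k       # pixel displacement seen by camera 1
dx2 = ((20.0 - 40.0) / 100.0) / k   # seen by camera 2
x, z = triangulate_2d(dx1, dx2, k, k, baseline=40.0)
```

Running all three camera pairs through this computation and averaging the results is one simple way to combine the pairs into a more accurate estimate, as the abstract describes.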
APA, Harvard, Vancouver, ISO, and other styles
42

Zukas, Paulius. "Raising Awareness of Computer Vision : How can a single purpose focused CV solution be improved?" Thesis, Linnéuniversitetet, Institutionen för datavetenskap och medieteknik (DM), 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:lnu:diva-76625.

Full text
Abstract:
The concept of Computer Vision is not new or fresh; on the contrary, ideas have been shared and worked on for almost 60 years. Many use cases have been found throughout the years and various systems developed, but there is always room for improvement. An observation was made that methods used today are generally focused on a single purpose and rely on expensive technology, which could be improved. In this report, we conduct extensive research to find out whether professionally sold, expensive software can be replaced by an off-the-shelf, low-cost solution entirely designed and developed in-house. To do that, we look at the history of Computer Vision, examples of applications and algorithms, and identify general scenarios or computer vision problems which can be solved. We then take a step further and define solid use cases for each of the scenarios found. Finally, a prototype solution is designed and presented. After analysing the gathered results, we aim to convince the reader that such an application can be developed and work efficiently in various areas, saving businesses investment costs.
APA, Harvard, Vancouver, ISO, and other styles
43

Thaung, Ludwig. "Advanced Data Augmentation : With Generative Adversarial Networks and Computer-Aided Design." Thesis, Linköpings universitet, Datorseende, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-170886.

Full text
Abstract:
CNN-based (Convolutional Neural Network) visual object detectors often reach human level of accuracy but need to be trained with large amounts of manually annotated data. Collecting and annotating this data can frequently be time-consuming and financially expensive. Using generative models to augment the data can help minimize the amount of data required and increase detection performance. Many state-of-the-art generative models are Generative Adversarial Networks (GANs). This thesis investigates if and how one can utilize image data to generate new data through GANs to train a YOLO-based (You Only Look Once) object detector, and how CAD (Computer-Aided Design) models can aid in this process. In the experiments, different models of GANs are trained and evaluated by visual inspection or with the Fréchet Inception Distance (FID) metric. The data provided by Ericsson Research consists of images of antenna and baseband equipment along with annotations and segmentations. Ericsson Research supplied the YOLO detector, and no modifications are made to this detector. Finally, the YOLO detector is trained on data generated by the chosen model and evaluated by the Average Precision (AP). The results show that the generative models designed in this work can produce RGB images of high quality. However, the quality reduces if binary segmentation masks are to be generated as well. The experiments with CAD input data did not result in images that could be used for the training of the detector. The GAN designed in this work is able to successfully replace objects in images with the style of other objects. The results show that training the YOLO detector with GAN-modified data compared to training with real data leads to the same detection performance. The results also show that the shapes and backgrounds of the antennas contributed more to detection performance than their style and colour.
APA, Harvard, Vancouver, ISO, and other styles
44

Haraldsson, Truls. "Real-time Vision-based Fall Detection : with Motion History Images and Convolutional Neural Networks." Thesis, Luleå tekniska universitet, Institutionen för system- och rymdteknik, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:ltu:diva-71137.

Full text
Abstract:
Falls among the elderly are a major health concern worldwide due to the serious consequences, such as higher mortality and morbidity. And as the elderly are the fastest growing age group, an important challenge for society is to provide support in their everyday life activities. Given the social and economic advantages of having an automatic fall detection system, these systems have attracted attention from the healthcare industry. With the emerging trend of Smart Homes and the increasing number of cameras in our daily environments, this creates an excellent opportunity for vision-based fall detection systems. In this work, an automatic real-time vision-based fall detection system is presented. It uses motion history images to capture temporal features in a video sequence; spatial features are then extracted efficiently for classification using a depthwise convolutional neural network. The system is evaluated on three public fall detection datasets, and furthermore compared to other state-of-the-art approaches.
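A motion history image of the kind mentioned above can be maintained with a simple per-pixel rule: moving pixels are stamped with a maximum value and static pixels decay, so brightness encodes how recently motion occurred. The decay duration τ below is an arbitrary illustrative choice:

```python
import numpy as np

def update_mhi(mhi, motion_mask, tau=10):
    """Motion History Image update: pixels flagged as moving are set to tau,
    static pixels decay by one per frame (clamped at zero)."""
    return np.where(motion_mask, float(tau), np.maximum(mhi - 1.0, 0.0))

mhi = np.zeros((4, 4))
frame1 = np.zeros((4, 4), dtype=bool); frame1[0, 0] = True   # motion at (0,0)
frame2 = np.zeros((4, 4), dtype=bool); frame2[1, 1] = True   # motion at (1,1)
mhi = update_mhi(mhi, frame1)   # (0,0) stamped to 10
mhi = update_mhi(mhi, frame2)   # (0,0) decays to 9, (1,1) stamped to 10
```

The resulting single-channel image summarizes recent motion over the last τ frames, which is what makes it a compact temporal input for a spatial classifier such as the depthwise CNN in the abstract.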
APA, Harvard, Vancouver, ISO, and other styles
45

Häger, Gustav. "Improving Discriminative Correlation Filters for Visual Tracking." Thesis, Linköpings universitet, Datorseende, 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-125963.

Full text
Abstract:
Generic visual tracking is one of the classical problems in computer vision. In this problem, no prior knowledge of the target is available aside from a bounding box in the initial frame of the sequence. Generic visual tracking is a difficult task due to a number of factors such as momentary occlusions, target rotations, changes in target illumination, and variations in the target size. In recent years, discriminative correlation filter (DCF) based trackers have shown promising results for visual tracking. These DCF based methods use the Fourier transform to efficiently calculate detection and model updates, allowing significantly higher frame rates than competing methods. However, existing DCF based methods only estimate the translation of the object while ignoring changes in size. This thesis investigates the problem of accurately estimating scale variations within a DCF based framework. A novel scale estimation method is proposed by explicitly constructing separate translation and scale filters. The proposed scale estimation technique is robust and significantly improves tracking performance, while operating in real time. In addition, a comprehensive evaluation of feature representations in a DCF framework is performed. Experiments are performed on the benchmark OTB-2015 dataset, as well as the VOT 2014 dataset. The proposed methods are shown to significantly improve the performance of existing DCF based trackers.
Generic visual tracking is a classic problem in computer vision. In the usual formulation, no prior knowledge of the object to be tracked is assumed beyond an initial rectangle in the first frame of a video sequence. This is a very difficult problem to solve in general because of occlusions, rotations, illumination changes, and variations in the object's perceived size. In recent years, tracking methods based on discriminative correlation filters have given promising results in the area. These methods use the Fourier transform to efficiently compute detections and model updates, while offering very good performance, handling many hundreds of frames per second. Current methods, however, only estimate the translation of the tracked object, while scale changes are ignored. This thesis evaluates a number of methods for scale estimation within a correlation filter framework, including a novel method based on constructing separate scale and translation filters. The proposed method is robust and has significantly better tracking performance, while still running in real time. An evaluation of different feature representations is also performed on two large tracking benchmark datasets.
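The Fourier-domain training and detection that makes DCF trackers fast can be sketched as a single-sample, MOSSE-style filter. The patch size, regularization value, and delta-shaped target response below are illustrative choices, not the thesis' configuration:

```python
import numpy as np

def train_filter(patch, target, lam=1e-2):
    """Closed-form DCF training in the Fourier domain (MOSSE-style):
    solves for a filter mapping the training patch to the target response."""
    F = np.fft.fft2(patch)
    G = np.fft.fft2(target)
    return G * np.conj(F) / (F * np.conj(F) + lam)   # lam regularizes division

def detect(H, patch):
    """Correlation response for a new patch; the peak location gives
    the estimated translation of the target."""
    return np.real(np.fft.ifft2(H * np.fft.fft2(patch)))

rng = np.random.default_rng(0)
patch = rng.standard_normal((16, 16))
target = np.zeros((16, 16))
target[8, 8] = 1.0                      # desired response peak
H = train_filter(patch, target)
peak = np.unravel_index(np.argmax(detect(H, patch)), (16, 16))
# A circular shift of the patch moves the response peak by the same amount.
shifted = np.roll(patch, shift=(2, 3), axis=(0, 1))
peak2 = np.unravel_index(np.argmax(detect(H, shifted)), (16, 16))
```

All the work happens as elementwise operations on FFTs, which is why these trackers reach high frame rates; the scale estimation studied in the thesis adds a second filter of this kind over a set of resampled patch sizes.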
APA, Harvard, Vancouver, ISO, and other styles
46

Ärleryd, Sebastian. "Realtime Virtual 3D Image of Kidney Using Pre-Operative CT Image for Geometry and Realtime US-Image for Tracking." Thesis, Uppsala universitet, Avdelningen för visuell information och interaktion, 2014. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-234991.

Full text
Abstract:
In this thesis a method is presented to provide a 3D visualization of the human kidney and surrounding tissue during kidney surgery. The method takes advantage of the high detail of 3D X-Ray Computed Tomography (CT) and the high time resolution of Ultrasonography (US). By extracting the geometry from a single preoperative CT scan and animating the kidney by tracking its position in real-time US images, a 3D visualization of the surgical volume can be created. The first part of the project consisted of building an imaging phantom as a simplified model of the human body around the kidney. It consists of three parts: a shell part representing surrounding tissue, a kidney part representing the kidney soft tissue, and a kidney stone part embedded in the kidney part. The shell and soft-tissue kidney parts were cast from a mixture of the synthetic polymer Polyvinyl Alcohol (PVA) and water, and the kidney stone part was cast from epoxy glue. All three parts were designed to look like human tissue in CT and US images. The method is a pipeline of stages that starts with acquiring the CT image as a 3D matrix of intensity values. This matrix is then segmented, resulting in separate polygonal 3D models for the three phantom parts. A scan of the model is then performed using US, producing a sequence of US images. A computer program extracts easily recognizable image feature points from the images in the sequence. Knowing the spatial position and orientation of a new US image in which these features can be found again allows the position of the kidney to be calculated. The presented method is realized as a proof-of-concept implementation of the pipeline. The implementation displays an interactive visualization where the kidney is positioned according to a user-selected US image scanned for image features. Using the proof-of-concept implementation as a guide, the accuracy of the proposed method is estimated to be bounded by the acquired image data: for high-resolution CT and US images, the accuracy can be in the order of a few millimeters.
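The tracking step in this pipeline — re-locating a known image feature in a new US frame to infer the kidney's position — can be illustrated with a minimal normalized cross-correlation search. This NumPy sketch uses a synthetic speckle-like frame and is an illustration of the idea only, not the thesis's implementation:

```python
import numpy as np

def ncc(patch, window):
    """Normalized cross-correlation between two equally sized arrays."""
    p = patch - patch.mean()
    w = window - window.mean()
    denom = np.sqrt((p * p).sum() * (w * w).sum())
    return (p * w).sum() / denom if denom > 0 else 0.0

def track(frame, template):
    """Exhaustively search `frame` for the position best matching `template`."""
    th, tw = template.shape
    best, best_pos = -1.0, (0, 0)
    for y in range(frame.shape[0] - th + 1):
        for x in range(frame.shape[1] - tw + 1):
            score = ncc(template, frame[y:y + th, x:x + tw])
            if score > best:
                best, best_pos = score, (y, x)
    return best_pos, best

# Synthetic "US frame": a small bright blob on speckle-like noise.
rng = np.random.default_rng(0)
frame0 = rng.normal(0.2, 0.05, (40, 40))
frame0[12:16, 14:18] += 1.0            # blob inside the template region
template = frame0[10:18, 12:20].copy() # 8x8 patch: blob plus surround

frame1 = rng.normal(0.2, 0.05, (40, 40))
frame1[17:21, 9:13] += 1.0             # same blob, displaced in the new frame

pos, score = track(frame1, template)
print(pos)  # → (15, 7): the patch position that realigns the blob
```

A real tracker would extract and match many such feature points with proper descriptors, but the principle — locating a known pattern in a new frame to recover a displacement — is the same.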
APA, Harvard, Vancouver, ISO, and other styles
47

Persson, Martin. "Automatic Gait Recognition : using deep metric learning." Thesis, Linköpings universitet, Datorseende, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-167074.

Full text
Abstract:
Recent improvements in pose estimation have opened up new areas of application. One of them is gait recognition, the task of identifying persons based on their unique style of walking, which is increasingly being recognized as an important method of biometric identification. This thesis has explored the possibility of using a pose estimation system, OpenPose, together with deep Recurrent Neural Networks (RNNs) to see whether sequences of 2D poses carry sufficient information for gait recognition. For this to be possible, a new multi-camera dataset of persons walking on a treadmill was gathered, dubbed the FOI dataset. The results show that the approach has some promise: it achieved an overall classification accuracy of 95.5 % on classes it had seen during training and 83.8 % on classes it had not. It was, however, unable to recognize sequences from camera angles it had not seen during training; for that to be possible, more data pre-processing will likely be required.
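The core idea — feeding a sequence of flattened 2D poses through a recurrent network and comparing the resulting embeddings — can be sketched as follows. Everything here (keypoint count, hidden size, untrained random weights, the synthetic "walkers") is illustrative and not taken from the thesis:

```python
import numpy as np

rng = np.random.default_rng(42)

N_KEYPOINTS = 18          # e.g. OpenPose body keypoints (illustrative)
IN_DIM = N_KEYPOINTS * 2  # each keypoint is an (x, y) pair
HID_DIM = 32

# Randomly initialized vanilla RNN cell (untrained -- a structural sketch).
W_in = rng.normal(0, 0.1, (HID_DIM, IN_DIM))
W_h = rng.normal(0, 0.1, (HID_DIM, HID_DIM))
b = np.zeros(HID_DIM)

def embed(pose_seq):
    """Run a (T, IN_DIM) sequence of flattened poses through the RNN;
    the final hidden state serves as a fixed-size gait embedding."""
    h = np.zeros(HID_DIM)
    for x in pose_seq:
        h = np.tanh(W_in @ x + W_h @ h + b)
    return h

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def walk(freq, seed, T=60):
    """Synthetic 'walker': sinusoidal keypoint trajectories with a
    person-specific frequency, plus observation noise."""
    r = np.random.default_rng(seed)
    t = np.arange(T)[:, None]
    base = np.sin(freq * t + np.linspace(0, np.pi, IN_DIM))
    return base + r.normal(0, 0.05, (T, IN_DIM))

gallery = {"person_a": embed(walk(0.30, seed=1)),
           "person_b": embed(walk(0.55, seed=2))}

probe = walk(0.30, seed=3)  # a new sequence in person_a's gait style
scores = {k: cosine(embed(probe), v) for k, v in gallery.items()}
best = max(scores, key=scores.get)
print(best)
```

In a trained system the recurrent weights would be learned (the thesis uses deep metric learning for this), so that embeddings of the same person cluster together regardless of noise and viewpoint.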
APA, Harvard, Vancouver, ISO, and other styles
48

Harms, Looström Julia, and Emma Frisk. "Bird's-eye view vision-system for heavy vehicles with integrated human-detection." Thesis, Mälardalens högskola, Akademin för innovation, design och teknik, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:mdh:diva-54527.

Full text
APA, Harvard, Vancouver, ISO, and other styles
49

Julin, Fredrik. "Vision based facial emotion detection using deep convolutional neural networks." Thesis, Mälardalens högskola, Akademin för innovation, design och teknik, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:mdh:diva-42622.

Full text
Abstract:
Emotion detection, also known as facial expression recognition, is the task of mapping an emotion to some form of input data taken from a human. This is a powerful tool for extracting valuable information from individuals, which can be used for many different purposes, ranging from medical conditions such as depression to customer feedback. Solving the problem of facial expression recognition requires smaller subtasks, which together form the complete system. Breaking down the bigger task at hand, one can think of these subtasks as a pipeline that implements the necessary steps for classifying some input and then giving an output in the form of an emotion. With the recent rise of computer vision, images are often used as input for these systems and have shown great promise for facial expression recognition, as the human face conveys the subject's emotional state and contains more information than other inputs such as text or audio. Many current state-of-the-art systems combine computer vision with another rising field, namely AI, or more specifically deep learning. These methods often use a special form of neural network, the convolutional neural network, which specializes in extracting information from images, and then perform classification using the SoftMax function, acting as the last part of the facial expression pipeline before the output. This thesis work has explored these methods of using convolutional neural networks to extract information from images and builds upon them by exploring a set of machine learning algorithms that replace the more commonly used SoftMax function as a classifier, in an attempt to increase not only the accuracy but also the efficiency of computational resource use.
The work also explores different techniques for the face detection subtask in the pipeline by comparing two approaches. One, the Viola-Jones algorithm, is more frequently used in the state-of-the-art and is said to be more viable for real-time applications. The other is a deep learning approach using a state-of-the-art convolutional neural network to perform the detection, often speculated to be too computationally intense to run in real time. By applying a newly developed, state-of-the-art-inspired convolutional neural network together with the SoftMax classifier, the final performance did not reach state-of-the-art accuracy. However, the machine learning classifiers used show promise and in several cases surpass the SoftMax function in performance when given a massively smaller number of training samples. Furthermore, the results from implementing and testing a pure deep learning approach, using deep learning algorithms for both the detection and classification stages of the pipeline, show that deep learning might outperform the classic Viola-Jones algorithm in terms of both detection rate and frames per second.
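Replacing the SoftMax layer with a conventional machine learning classifier over the CNN's feature vectors can be sketched as below, with a k-nearest-neighbour head standing in for the classifiers the thesis evaluates. The synthetic Gaussian clusters are a stand-in for real penultimate-layer CNN features:

```python
import numpy as np

rng = np.random.default_rng(7)

# Synthetic stand-in for CNN features: three emotion classes as
# Gaussian clusters in a 16-dimensional feature space.
CLASSES, DIM, PER_CLASS = 3, 16, 20
centers = rng.normal(0, 1, (CLASSES, DIM))
X_train = np.vstack([c + rng.normal(0, 0.3, (PER_CLASS, DIM)) for c in centers])
y_train = np.repeat(np.arange(CLASSES), PER_CLASS)

def knn_predict(x, k=5):
    """k-nearest-neighbour head over stored feature vectors, replacing
    the usual SoftMax layer at the end of the pipeline."""
    d = np.linalg.norm(X_train - x, axis=1)
    votes = y_train[np.argsort(d)[:k]]
    return int(np.bincount(votes, minlength=CLASSES).argmax())

# Probe features drawn near each class centre should map back to it.
preds = [knn_predict(c + rng.normal(0, 0.3, DIM)) for c in centers]
print(preds)  # → [0, 1, 2]
```

A head like this needs no gradient-based training of its own, which is one reason such classifiers can work from far fewer samples than a SoftMax layer trained end to end, as the abstract reports.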
APA, Harvard, Vancouver, ISO, and other styles
50

Johansson, Fredrik, and Oskar Dahl. "Autonomous Validation through Visual Inspection." Thesis, Högskolan i Halmstad, Akademin för informationsteknologi, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:hh:diva-34366.

Full text
Abstract:
The industrial testing phase for graphical user interfaces and screen behaviour still involves manual tests with human interaction. This type of testing is particularly difficult and time-consuming to perform manually, due to the time-sensitive messages and information used within these interfaces. This thesis addresses the issue by introducing an approach that automates the process using high-grade machine vision cameras and existing algorithm implementations from OpenCV 3.2.0. By knowing the expected graphical representation in advance, the actual outcome can be compared against this expectation by applying image processing algorithms. The approach presents an Equal Error Rate of 6 % while still maintaining satisfactory time performance relative to the timeframe requirement of these time-sensitive messages. Accuracy and time performance are profoundly affected by the hardware equipment, partly due to the immense amount of image processing involved.
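The comparison of an expected graphical representation against a captured frame can be sketched with a simple per-pixel score and a decision threshold. This is an illustration of the principle only, not the OpenCV-based implementation the thesis describes; the score, the threshold value, and all names are assumptions:

```python
import numpy as np

def frame_score(expected, actual):
    """Mean absolute pixel difference, normalized to [0, 1]."""
    return float(np.abs(expected.astype(float) - actual.astype(float)).mean() / 255.0)

def validate(expected, actual, threshold=0.05):
    """Pass the frame if it deviates from the expected rendering by less
    than `threshold`. Tuning the threshold trades false accepts against
    false rejects; the thesis reports its operating point as an Equal
    Error Rate of 6 %."""
    return frame_score(expected, actual) < threshold

rng = np.random.default_rng(3)
expected = rng.integers(0, 256, (48, 64), dtype=np.uint8)

ok = expected.copy()          # a faithful rendering
bad = expected.copy()
bad[10:30, 20:40] = 0         # a region of the screen failed to draw

print(validate(expected, ok), validate(expected, bad))  # → True False
```

In practice the captured frame must first be rectified and aligned with the reference, which is where the camera hardware and image processing load discussed in the abstract come in.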
APA, Harvard, Vancouver, ISO, and other styles