Dissertations / Theses on the topic 'Real-time objects detection'

Consult the top 50 dissertations / theses for your research on the topic 'Real-time objects detection.'


1

Olvång, Leif. "Real-time Collision Detection with Implicit Objects." Thesis, Uppsala University, Department of Information Technology, 2010. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-129453.

Abstract:
Collision detection is a problem that has been studied in many different contexts. Lately, one of the most common contexts has been rigid multi-body physics for different types of simulations. A popular base algorithm in this context is the Gilbert-Johnson-Keerthi algorithm for measuring the distance between two convex objects. This algorithm belongs to a family of algorithms which share the common property of allowing implicitly defined objects. In this thesis we give a theoretical overview of the algorithms in this family and discuss things to keep in mind when implementing them. We also present how they behave in different situations, based on our experiments. Finally, we give recommendations for which algorithm to use in which general cases.
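The family trait the abstract mentions, supporting implicitly defined objects, comes from the fact that GJK only ever queries a shape through its support function. A minimal Python sketch of that idea (all shapes and values here are illustrative, not taken from the thesis):

```python
# Support-mapping sketch behind GJK-family algorithms: a convex object is
# described implicitly by a function returning its farthest point in a given
# direction; distance queries then operate on the Minkowski difference.
import numpy as np

def sphere_support(center, radius):
    """Implicit sphere: farthest point along direction d."""
    def support(d):
        d = d / np.linalg.norm(d)
        return center + radius * d
    return support

def box_support(center, half_extents):
    """Implicit axis-aligned box: pick the corner maximizing the dot product."""
    def support(d):
        return center + np.sign(d) * half_extents
    return support

def minkowski_support(support_a, support_b):
    """Support of A - B, the set in which GJK searches for the origin."""
    def support(d):
        return support_a(d) - support_b(-d)
    return support

# Usage: GJK iterates on simplices built from extreme points like this one.
s = minkowski_support(sphere_support(np.zeros(3), 1.0),
                      box_support(np.array([3.0, 0, 0]), np.ones(3)))
print(s(np.array([1.0, 0, 0])))  # extreme point of A - B along +x
```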
2

Kumara, Muthukudage Jayantha. "Automated Real-time Objects Detection in Colonoscopy Videos for Quality Measurements." Thesis, University of North Texas, 2013. https://digital.library.unt.edu/ark:/67531/metadc283843/.

Abstract:
The effectiveness of colonoscopy depends on the quality of the inspection of the colon. There was no automated measurement method to evaluate the quality of the inspection. This thesis addresses the issue by investigating an automated post-procedure quality measurement technique and proposing a novel approach that automatically determines the percentage of stool areas in images of digitized colonoscopy video files. It involves the classification of image pixels based on their color features, using a new method of planes in RGB (red, green, and blue) color space. The limitation of post-procedure quality measurement is that the measurements become available long after the procedure is done and the patient has been released. A better approach is to flag any sub-optimal inspection immediately, so that the endoscopist can improve the quality in real time during the procedure. This thesis therefore also proposes an extension of the post-procedure method that detects stool, bite-block, and blood regions in real time using color features in HSV color space. These three objects play a major role in quality measurements in colonoscopy. The proposed method partitions the very large set of positive examples of each of these objects into a number of groups, formed by intersecting the positive examples with a hyperplane, called a 'positive plane'. Convex hulls are used to model the positive planes. Comparisons with traditional classifiers such as K-nearest neighbor (K-NN) and support vector machines (SVM) prove the soundness of the proposed method in terms of accuracy and speed, both of which are critical in the targeted real-time quality measurement system.
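As a rough illustration of classifying pixels against a convex hull fitted to positive examples in a color space, here is a hedged Python sketch; the thesis builds several hulls on hyperplane slices ("positive planes"), while this sketch uses a single hull and synthetic training pixels:

```python
# Classify pixels as "positive" by testing membership in a convex hull of
# positive training samples in HSV space. All training data below is
# synthetic; a real system would use labelled stool/bite-block/blood pixels.
import numpy as np
import cv2
from scipy.spatial import Delaunay

positives = np.random.rand(500, 3) * [30, 255, 255]  # stand-in HSV samples
hull = Delaunay(positives)                           # hull as a triangulation

def classify_frame(bgr_frame):
    hsv = cv2.cvtColor(bgr_frame, cv2.COLOR_BGR2HSV)
    pixels = hsv.reshape(-1, 3).astype(np.float64)
    inside = hull.find_simplex(pixels) >= 0         # True if inside the hull
    return inside.reshape(hsv.shape[:2])            # per-pixel boolean mask
```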
3

Karakas, Samet. "Detecting And Tracking Moving Objects With An Active Camera In Real Time." Master's thesis, METU, 2011. http://etd.lib.metu.edu.tr/upload/12613712/index.pdf.

Abstract:
Moving object detection techniques can be divided into two categories based on the type of camera, which is either static or active. Methods for static cameras detect moving objects from the changing regions of the video frame. However, the same approach is not suitable for active cameras; moving object detection with an active camera generally needs more complex algorithms and unique solutions. The aim of this thesis work is real-time detection and tracking of moving objects with an active camera. For this purpose, feature-based algorithms are implemented because of their computational efficiency, with SURF (Speeded Up Robust Features) as the main feature descriptor. The algorithm is developed in C++ with extensive use of the OpenCV library. It is capable of detecting and tracking moving objects using a PTZ (pan-tilt-zoom) camera at a frame rate of approximately 5 fps at a resolution of 640x480.
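A common feature-based recipe for moving-object detection with an active camera is to estimate the camera-induced motion from keypoint matches and difference the stabilized frames. The sketch below follows that recipe with ORB in place of SURF (SURF is patented and only ships in opencv-contrib builds); it illustrates the approach, not the thesis code, and the threshold is illustrative:

```python
# Match keypoints between consecutive frames, estimate the camera-induced
# homography, warp the previous frame, and difference: what remains is
# motion not explained by the camera, i.e. moving objects.
import cv2
import numpy as np

orb = cv2.ORB_create(1000)
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)

def moving_object_mask(prev_gray, curr_gray):
    kp1, des1 = orb.detectAndCompute(prev_gray, None)
    kp2, des2 = orb.detectAndCompute(curr_gray, None)
    matches = matcher.match(des1, des2)
    src = np.float32([kp1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([kp2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 3.0)  # camera motion
    h, w = curr_gray.shape
    stabilized = cv2.warpPerspective(prev_gray, H, (w, h))
    diff = cv2.absdiff(curr_gray, stabilized)             # residual = movers
    _, mask = cv2.threshold(diff, 25, 255, cv2.THRESH_BINARY)
    return mask
```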
4

Söderlund, Henrik. "Real-time Detection and Tracking of Moving Objects Using Deep Learning and Multi-threaded Kalman Filtering : A joint solution of 3D object detection and tracking for Autonomous Driving." Thesis, Umeå universitet, Institutionen för tillämpad fysik och elektronik, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-160180.

Abstract:
Perception is the most essential function of an autonomous drive system for safe and reliable driving, and LiDAR sensors are strong contenders for this task. In this thesis, we present a novel real-time solution for the detection and tracking of moving objects which utilizes deep-learning-based 3D object detection. Moreover, we present a joint solution which utilizes the predictability of Kalman filters to feed object properties and semantics back to the object detection algorithm, resulting in a closed loop of object detection and object tracking. On one hand, we present YOLO++, a 3D object detection network operating on point clouds only. It expands YOLOv3, the latest contribution to standard real-time object detection for three-channel images. Our object detection solution is fast: it processes frames at 20 frames per second. Our experiments on the KITTI benchmark suite show that we achieve state-of-the-art efficiency but mediocre accuracy for car detection, comparable to the result of Tiny-YOLOv3 on the COCO dataset. The main advantage of YOLO++ is that it allows fast detection of objects with rotated bounding boxes, something Tiny-YOLOv3 cannot do. YOLO++ also regresses the bounding box in all directions, allowing 3D bounding boxes to be extracted from a bird's-eye-view perspective. On the other hand, we present a multi-threaded Kalman filtering (MTKF) solution for multiple object tracking. Each unique observation is associated with a thread through a novel concurrent data association process, and each thread contains an Extended Kalman Filter that predicts and estimates the associated object's state over time. Furthermore, a LiDAR odometry algorithm is used to obtain absolute information about the movement of objects, since the movement of objects is inherently relative to the sensor perceiving them. We obtain 33 state updates per second with as many threads as there are cores in our main workstation. Even though the joint solution has not been tested on a system with enough computational power, it is ready for deployment. Using YOLO++ in combination with MTKF, our real-time constraint of 10 frames per second is satisfied by a large margin. Finally, we show that our system can take advantage of the predicted semantic information from the Kalman filters to enhance the inference process in our object detection architecture.
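The predict/update loop each tracking thread runs can be illustrated with a plain linear Kalman filter; the thesis uses an Extended Kalman Filter with a richer 3D state, so the sketch below, with an assumed constant-velocity model and illustrative noise values, only shows the core mechanics:

```python
# Constant-velocity Kalman filter over state [x, y, vx, vy]: each tracking
# thread alternates predict (advance the motion model) and update (correct
# with a new detection). Noise matrices here are illustrative assumptions.
import numpy as np

dt = 0.1
F = np.array([[1, 0, dt, 0], [0, 1, 0, dt], [0, 0, 1, 0], [0, 0, 0, 1]])
H = np.array([[1, 0, 0, 0], [0, 1, 0, 0]])   # we observe position only
Q = np.eye(4) * 1e-2                          # process noise (assumed)
R = np.eye(2) * 1e-1                          # measurement noise (assumed)

def predict(x, P):
    return F @ x, F @ P @ F.T + Q

def update(x, P, z):
    y = z - H @ x                             # innovation
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)            # Kalman gain
    return x + K @ y, (np.eye(4) - K @ H) @ P
```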
5

Hinterstoißer, Stefan [Verfasser], Nassir [Akademischer Betreuer] Navab, Bernt [Akademischer Betreuer] Schiele, and Kurt [Akademischer Betreuer] Konolige. "Real-time detection and pose estimation of low-textured and texture-less objects / Stefan Hinterstoißer. Gutachter: Bernt Schiele ; Kurt Konolige. Betreuer: Nassir Navab." München : Universitätsbibliothek der TU München, 2012. http://d-nb.info/1030099480/34.

6

Konstantinidis, Michalis. "Preimplantation genetic diagnosis : new methods for the detection of genetic abnormalities in human preimplantation embryos." Thesis, University of Oxford, 2013. http://ora.ox.ac.uk/objects/uuid:28611f65-7729-4293-9c3f-4fc3f0cc39d7.

Abstract:
Preimplantation genetic diagnosis (PGD) refers to the testing of embryos produced through in vitro fertilization (IVF) in order to identify those unaffected by a specific genetic disorder or chromosomal abnormality. In this study, different methodologies were examined and developed for performing PGD. Investigation of various whole genome amplification (WGA) methods identified multiple displacement amplification as a reliable method for genotyping single cells. Furthermore, this technology was shown to be compatible with subsequent analysis using single nucleotide polymorphism (SNP) microarrays. Compared to conventional methods used in this study to perform single cell diagnosis (e.g. multiplex PCR), WGA techniques were found to be advantageous since they streamline the development of PGD protocols for couples at high risk of transmitting an inherited disorder and simultaneously offer the possibility of comprehensive chromosome screening (CCS). This study also aimed to develop a widely applicable protocol for accurate typing of the human leukocyte antigen (HLA) region with the purpose of identifying embryos that will be HLA-identical to an existing sibling affected by a disorder that requires haematopoietic stem cell transplantation. Additionally, a novel microarray platform was developed that, apart from accurate CCS, was capable of reliably determining the relative quantity of mitochondrial DNA in polar bodies removed from oocytes and single cells biopsied from embryos. Mitochondria are known to play an important role in oogenesis and preimplantation embryogenesis and their measurement may therefore be of clinical relevance. Moreover, real-time PCR was used for the development of protocols for CCS, DNA fingerprinting of sperm samples and embryos, and the relative quantitation of telomere length in embryos (since shortened telomeres might be associated with reduced viability). As well as considering the role of genetics in terms of oocyte and embryo viability assessment and the diagnosis of inherited genetic disorders, attention was given to a specific gene (Phospholipase C zeta) of relevance to male infertility. A novel mutation affecting the function of the resulting protein was discovered, highlighting the growing importance of DNA sequence variants in the diagnosis and treatment of infertility.
7

Morris, Gruffydd Beaufoy. "Autonomous real-time object detection and identification." Thesis, Lancaster University, 2017. http://eprints.lancs.ac.uk/88485/.

Abstract:
Sensor devices are regularly used on unmanned aerial vehicles (UAVs) as reconnaissance and intelligence-gathering systems and as support for front-line troops on operations. This platform provides a wealth of sensor data but has limited computational power available for processing. The objective of this work is to detect and identify objects in real time with a low power footprint, so that the system can operate on a UAV. An appraisal of current computer vision methods is presented, with reference to their performance and applicability to the objectives. Experimentation with real-time methods of background subtraction and motion estimation was carried out and the limitations of each method described. A new, assumption-free, data-driven method for object detection and identification was developed. Its core ideas were based on models that propose that the human vision system analyses the edges of objects to detect and separate them, and perceives motion separately, a function modelled here by optical flow. The initial development, in the temporal domain, combined object and motion detection in the analysis process; this approach was found to have limitations. The second iteration used a detection component in the spatial domain that extracts texture patches based on edge contours, their profile, and their internal texture structure. Motion perception was performed separately on the texture patches using optical flow, and the motion and spatial location of texture patches were used to define physical objects. A clustering method is used on the rich feature set extracted by the detection method to characterise the objects. The results show that the method detects and identifies both moving and static objects, in real time, irrespective of camera motion.
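A hedged sketch of the two-stage idea described above, edge-based patch extraction with motion perceived separately by optical flow; the sampling rate and patch size are illustrative choices, not the thesis's:

```python
# Extract patches along edge contours, compute dense optical flow, and
# attach each patch's mean motion to its position; patches can then be
# clustered by motion and location to form physical objects.
import cv2
import numpy as np

def edge_patch_motion(prev_gray, curr_gray, patch=16):
    edges = cv2.Canny(curr_gray, 100, 200)
    flow = cv2.calcOpticalFlowFarneback(prev_gray, curr_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    ys, xs = np.nonzero(edges)
    patches = []
    for y, x in zip(ys[::50], xs[::50]):      # sparse sample of edge points
        win = flow[max(0, y - patch):y + patch, max(0, x - patch):x + patch]
        patches.append((x, y, win.reshape(-1, 2).mean(axis=0)))
    return patches                            # (position, mean motion) tuples
```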
8

Gunnarsson, Adam. "Real time object detection on a Raspberry Pi." Thesis, Linnéuniversitetet, Institutionen för datavetenskap och medieteknik (DM), 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:lnu:diva-89573.

Abstract:
With the recent advancement of deep learning, the performance of object detection techniques has greatly increased in both speed and accuracy. This has made it possible to run highly accurate object detection at real-time speed on modern desktop computer systems. Recently, there has been a growing interest in developing smaller and faster deep neural network architectures suited for embedded devices. This thesis explores the suitability of running object detection on the Raspberry Pi 3, a popular embedded computer board. Two controlled experiments are conducted in which two state-of-the-art object detection models, SSD and YOLO, are tested on their accuracy and speed. The results show that the SSD model slightly outperforms YOLO in both speed and accuracy, but with the low processing power that the current generation of Raspberry Pi has to offer, neither performs well enough to be viable in applications where high speed is necessary.
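The kind of timing experiment this thesis runs can be reproduced with OpenCV's dnn module; the sketch below assumes a MobileNet-SSD Caffe model (the file names are placeholders, not from the thesis) and measures mean inference FPS:

```python
# Time the forward pass of an SSD-style detector over 100 camera frames and
# report the mean frames-per-second. Model file paths are assumptions; any
# weights loadable by cv2.dnn would serve the same purpose.
import time
import cv2

net = cv2.dnn.readNetFromCaffe("MobileNetSSD_deploy.prototxt",   # assumed
                               "MobileNetSSD_deploy.caffemodel")  # paths
cap = cv2.VideoCapture(0)
times = []
for _ in range(100):
    ok, frame = cap.read()
    if not ok:
        break
    blob = cv2.dnn.blobFromImage(frame, 0.007843, (300, 300), 127.5)
    net.setInput(blob)
    t0 = time.time()
    detections = net.forward()                # inference only is timed
    times.append(time.time() - t0)
print("mean FPS:", 1.0 / (sum(times) / len(times)))
```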
9

Chen, Meihong. "Real-Time Video Object Detection with Temporal Feature Aggregation." Thesis, Université d'Ottawa / University of Ottawa, 2021. http://hdl.handle.net/10393/42790.

Abstract:
In recent years, various high-performance networks have been proposed for single-image object detection. An obvious choice is to design a video detection network based on state-of-the-art single-image detectors. However, video object detection remains challenging due to the lower quality of individual frames in a video, and hence the need to include temporal information for high-quality detection results. In this thesis, we design a novel interleaved architecture combining a 2D convolutional network and a 3D temporal network, using YOLOv3 as the base detector. To exploit inter-frame information, we propose feature aggregation based on a temporal network that uses Appearance-Preserving 3D convolution (AP3D) to extract aligned features in the temporal dimension. Our multi-scale detector and multi-scale temporal network communicate at each scale and also across scales. The temporal network takes 4, 8, or 16 frames as input; we name the corresponding variants TemporalNet-4, TemporalNet-8, and TemporalNet-16. Our approach achieves 77.1% mAP (mean Average Precision) on the ImageNet VID 2017 dataset with TemporalNet-4, while TemporalNet-16 achieves 80.9% mAP, a competitive result on this video object detection benchmark. Our network is also real-time, with a running time of 35 ms/frame.
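Temporal feature aggregation can be pictured as a 3D convolution that fuses a stack of per-frame feature maps along the time axis. The sketch below uses a plain Conv3d rather than the Appearance-Preserving 3D convolution (AP3D) the thesis employs, and all shapes are illustrative:

```python
# Fuse per-frame 2D feature maps with a 3D convolution whose kernel spans
# the full temporal extent, collapsing T frames into one aggregated map.
import torch
import torch.nn as nn

frames, channels = 4, 256                            # e.g. TemporalNet-4
features = torch.randn(1, channels, frames, 52, 52)  # (N, C, T, H, W)
aggregate = nn.Conv3d(channels, channels, kernel_size=(frames, 3, 3),
                      padding=(0, 1, 1))             # collapse T, keep H, W
fused = aggregate(features).squeeze(2)               # (1, 256, 52, 52)
print(fused.shape)
```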
10

Tivelius, Malcolm. "Real-time Small Object Detection using Deep Neural Networks." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-300060.

Abstract:
Object detection is a research area within computer vision that consists of both localising and classifying objects in images. Its applications in society are many, ranging from facial recognition to self-driving cars. Some of these use cases require the detection of objects in motion and are therefore considered a separate category of object detection, commonly referred to as real-time object detection. The goal of this thesis is to shed further light on real-time object detection by investigating the effectiveness of successful object detection techniques when applied to objects of smaller sizes. Detecting small objects is described by the community as a difficult problem, and it has not been extensively researched before; the results could thus be used by the research community at large and/or for real-life applications. This paper is a comparative study of the effectiveness of two deep learning techniques for real-time object detection, RetinaNet and YOLOv3. The objects used are small characters and digits engraved onto ball bearings. The bearings were photographed while travelling on a production line, and a collection of such images constitutes the dataset used in this study. The goal is to classify as many characters and digits as possible on each bearing, with as low inference time as possible. The two deep learning models were implemented and evaluated on their performance, measured in terms of precision and average inference time, using labelled bearings not previously seen by either model. The results show that RetinaNet vastly outperforms YOLOv3 in real-time detection of small objects in terms of mAP@50; however, in average inference time YOLOv3 is twice as fast as RetinaNet. In conclusion, YOLOv3 struggles with smaller objects, whereas RetinaNet excels in this area. It can also be concluded, from previous research, that further gains in mAP and average inference time are most likely limited by the hardware used during training; verifying this could be a further extension of this thesis.
11

Hammarkvist, Tom. "Automatic Annotation of Models for Object Classification in Real Time Object Detection." Thesis, Luleå tekniska universitet, Institutionen för system- och rymdteknik, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:ltu:diva-86061.

Abstract:
The times of manual labour are changing as automation grows larger and larger by the day. Self-driving vehicles, one of the better-known examples of automation (the vehicles in this thesis being those found in the construction industry), rely on a machine learning network to recognize their surroundings. To achieve this, the network needs a dataset. A dataset consists of two things: data, which usually comes in the form of images, and annotated labels that allow the network to learn what it sees. A label is a descriptor that states which objects exist in an image, where in the image these objects are, and the area they occupy. As data is collected, it needs to be manually annotated, which can take several months to finish. With this in mind, is it possible to set up some form of semi-automatic annotation step that does the majority of the work? If so, what techniques can be used to achieve this? How does the result compare to a dataset annotated entirely by a human? And is it even worth implementing in the first place? For this research, a dataset was collected in which a remote-controlled wheel loader approached a stationary dump truck at various angles and under different conditions. Four videos were used for the training set, containing 679 images and their respective labels; two other videos were used for the validation set, consisting of 120 images and their respective labels. The chosen object detector was YOLOv3, which has a low inference time and high accuracy; this helped with gathering results faster than would have been possible with an older version. The method chosen for the automatic annotation was linear interpolation, implemented to work with the labels of the training set to approximate the intermediate values, as the sketch below illustrates. The interpolation was done at different frame gaps, from a gap of 10 frames up to a gap of 60 frames, in order to locate a sweet spot where the model had performance similar to the manually annotated dataset. The results showed that the fully manually annotated dataset approached a precision of 0.8, a recall of 0.96, and a mean average precision (mAP) of 0.95. The models interpolating between every 10th, 20th, and 30th frame showed the most promise, all approaching precision values of around 0.8, a recall of around 0.94, and an mAP of around 0.9.
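The semi-automatic annotation step reduces to linear interpolation of bounding boxes between hand-labelled keyframes. A minimal sketch, assuming one object per frame and (x, y, w, h) boxes:

```python
# Given hand-labelled boxes at two keyframes, linearly interpolate a box
# for every frame in the gap between them.
def interpolate_labels(frame_a, box_a, frame_b, box_b):
    """Yield (frame, box) for the frames strictly between two keyframes."""
    gap = frame_b - frame_a
    for f in range(frame_a + 1, frame_b):
        t = (f - frame_a) / gap               # 0 < t < 1 along the gap
        box = tuple(a + t * (b - a) for a, b in zip(box_a, box_b))
        yield f, box

# Usage: a 10-frame gap, object drifting right and growing slightly.
for f, box in interpolate_labels(0, (100, 50, 40, 40), 10, (160, 50, 44, 44)):
    print(f, [round(v, 1) for v in box])
```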
12

Treptow, André. "Optimization techniques for real time visual object detection and tracking." Berlin : Logos-Verl., 2007. http://deposit.d-nb.de/cgi-bin/dokserv?id=2938420&prov=M&dok_var=1&dok_ext=htm.

13

MOREIRA, GUSTAVO COSTA GOMES. "A METHOD FOR REAL-TIME OBJECT DETECTION IN HD VIDEOS." PONTIFÍCIA UNIVERSIDADE CATÓLICA DO RIO DE JANEIRO, 2014. http://www.maxwell.vrac.puc-rio.br/Busca_etds.php?strSecao=resultado&nrSeq=24507@1.

Abstract:
The detection and subsequent tracking of objects in video sequences is a challenge in terms of real-time video processing. In this thesis we propose a detection method suitable for processing high-definition video in real time. The method uses a segmentation procedure based on the integral image of the foreground, which allows very quick disposal of various parts of the image in each frame, thus achieving a high rate of processed frames per second. We further extend the proposed method to detect multiple objects in parallel. Furthermore, by using a GPU and techniques whose performance can be enhanced through parallelism, such as the prefix-sum operator, we achieve even better performance of the algorithm, both for detecting objects and in the training stage for new object classes.
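The quick-discard trick rests on the integral image: once a cumulative-sum table is built, the foreground mass of any rectangle costs four lookups. A hedged sketch with synthetic data and illustrative thresholds:

```python
# Build a zero-padded integral image of a foreground mask, then reject
# sliding windows with too little foreground mass in O(1) per window.
import numpy as np

def region_sum(integral, x, y, w, h):
    """Sum of mask[y:y+h, x:x+w] from a padded integral image."""
    return (integral[y + h, x + w] - integral[y, x + w]
            - integral[y + h, x] + integral[y, x])

mask = (np.random.rand(480, 640) > 0.95).astype(np.int64)  # stand-in mask
ii = np.pad(mask, ((1, 0), (1, 0))).cumsum(0).cumsum(1)    # padded integral

for x in range(0, 640 - 64, 32):              # sliding windows on one row
    if region_sum(ii, x, 100, 64, 64) < 50:   # too little foreground mass:
        continue                              # discarded in O(1)
    # ...expensive per-window processing would run only here...
```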
14

Kalliomäki, Roger. "Real-time object detection for autonomous vehicles using deep learning." Thesis, Uppsala universitet, Institutionen för informationsteknologi, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-393999.

Abstract:
Self-driving systems are commonly categorized into three subsystems: perception, planning, and control. In this thesis, the perception problem is studied in the context of real-time object detection for autonomous vehicles. The problem is studied by implementing a cutting-edge real-time object detection deep neural network, the Single Shot MultiBox Detector, which is trained and evaluated on both real and virtual driving-scene data. The results show that modern real-time-capable object detection networks achieve their fast performance at the expense of detection rate and accuracy. The Single Shot MultiBox Detector network is capable of processing images at over fifty frames per second, but scored a relatively low mean average precision on a diverse driving-scene dataset provided by Berkeley University. Further development in both hardware and software technologies will presumably result in a better trade-off between run time and detection rate. However, as the technologies stand today, general real-time object detection networks do not seem suitable for high-precision tasks such as visual perception for autonomous vehicles. Additionally, a comparison is made between two versions of the Single Shot MultiBox Detector network, one trained on a virtual driving-scene dataset from Ford Center for Autonomous Vehicles, and one trained on a subset of the earlier-used Berkeley dataset. These results show that synthetic driving-scene data could be an alternative to real-life data when training object detection networks.
15

Amplianitis, Konstantinos. "3D real time object recognition." Doctoral thesis, Humboldt-Universität zu Berlin, Mathematisch-Naturwissenschaftliche Fakultät, 2017. http://dx.doi.org/10.18452/17717.

Abstract:
Object recognition is a natural process of the human brain performed in the visual cortex, relying on a binocular depth perception system that renders a three-dimensional representation of the objects in a scene. Computer and software systems have hitherto been used to simulate the perception of three-dimensional environments, with the aid of sensors capturing real-time images. Such images are used as input data for further analysis and the development of algorithms, an essential ingredient for simulating the complexity of human vision, so as to achieve scene interpretation for object recognition similar to the way the human brain perceives it. The rapid pace of technological advancement in hardware and software is continuously bringing machine-based object recognition nearer to the human vision prototype. The key in this field is the development of algorithms that achieve robust scene interpretation. A lot of recognisable and significant effort has been successfully carried out over the years in 2D object recognition, as opposed to 3D. It is therefore within the context and scope of this dissertation to contribute towards the enhancement of 3D object recognition: a better interpretation and understanding of reality and the relationship between objects in a scene. Through the use and application of low-cost commodity sensors, such as the Microsoft Kinect, RGB and depth data of a scene have been retrieved and manipulated in order to generate human-like visual perception data. The goal herein is to show how RGB and depth information can be utilised to develop a new class of 3D object recognition algorithms, analogous to the perception processed by the human brain.
16

Lee, Young Jin. "Real-Time Object Motion and 3D Localization from Geometry." The Ohio State University, 2014. http://rave.ohiolink.edu/etdc/view?acc_num=osu1408443773.

17

Murray, Samuel. "Real-Time Multiple Object Tracking : A Study on the Importance of Speed." Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-215117.

Abstract:
Multiple object tracking consists of detecting and identifying objects in video. In some applications, such as robotics and surveillance, it is desired that the tracking is performed in real time. This poses a challenge in that it requires the algorithm to run as fast as the frame rate of the video. Today's top-performing tracking methods run at only a few frames per second and can thus not be used in real time. Further, when determining the speed of a tracker, it is common not to include the time it takes to detect objects. We argue that this way of measuring speed is not relevant for robotics or embedded systems, where the detection of objects is done on the same machine as the tracking. One way of running a method in real time is to not look at every frame, but to skip frames so that the video has the same frame rate as the tracking method; however, we believe that this leads to decreased performance. In this project, we implement a multiple object tracker, following the tracking-by-detection paradigm, as an extension of an existing method. It models the movement of objects by solving the filtering problem, and associates detections with predicted new locations in new frames using the Hungarian algorithm, as sketched below. Three different similarity measures are used, based on the location and shape of the bounding boxes. Compared to other trackers on the MOTChallenge leaderboard, our method, referred to as C++SORT, is the fastest non-anonymous submission, while also achieving decent scores on other metrics. By running our model on the Okutama-Action dataset, sampled at different frame rates, we show that performance is greatly reduced when running the model, including detecting objects, in real time. In most metrics the score is reduced by 50%, but in certain cases by as much as 90%. We argue that this indicates that other, slower methods could not be used for real-time tracking, but that more research is required specifically on this.
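The association step, matching predicted locations to new detections with the Hungarian algorithm, can be sketched as follows; IoU of bounding boxes stands in for one of the three similarity measures, and SciPy supplies the solver:

```python
# Build a cost matrix from bounding-box IoU and solve the assignment with
# the Hungarian algorithm. Boxes are (x1, y1, x2, y2).
import numpy as np
from scipy.optimize import linear_sum_assignment

def iou(a, b):
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def associate(predicted_boxes, detected_boxes):
    cost = np.array([[1.0 - iou(p, d) for d in detected_boxes]
                     for p in predicted_boxes])
    rows, cols = linear_sum_assignment(cost)  # minimal total cost matching
    return list(zip(rows, cols))              # (track index, detection index)
```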
18

Katramados, Ioannis. "Real-time object detection using monocular vision for low-cost automotive sensing systems." Thesis, Cranfield University, 2013. http://dspace.lib.cranfield.ac.uk/handle/1826/10386.

Abstract:
This work addresses the problem of real-time object detection in automotive environments using monocular vision. The focus is on real-time feature detection, tracking, depth estimation using monocular vision and, finally, object detection by fusing visual saliency and depth information. Firstly, a novel feature detection approach is proposed for extracting stable and dense features even in images with very low signal-to-noise ratio. This methodology is based on image gradients, which are redefined to take account of noise as part of their mathematical model. Each gradient is based on a vector connecting a negative to a positive intensity centroid, where both centroids are symmetric about the centre of the area for which the gradient is calculated. Multiple gradient vectors define a feature, with its strength proportional to the underlying gradient vector magnitude. The evaluation of the Dense Gradient Features (DeGraF) shows superior performance over other contemporary detectors in terms of keypoint density, tracking accuracy, illumination invariance, rotation invariance, noise resistance and detection time. The DeGraF features form the basis for two new approaches that perform dense 3D reconstruction from a single vehicle-mounted camera. The first approach tracks DeGraF features in real time while performing image stabilisation with minimal computational cost. This means that despite camera vibration the algorithm can accurately predict the real-world coordinates of each image pixel in real time by comparing each motion vector to the ego-motion vector of the vehicle. The performance of this approach has been compared to different 3D reconstruction methods in order to determine their accuracy, depth-map density, noise resistance and computational complexity. The second approach proposes the use of local frequency analysis of gradient features for estimating relative depth. This novel method is based on the fact that DeGraF gradients can accurately measure local image variance with sub-pixel accuracy. It is shown that the local frequency by which the centroid oscillates around the gradient window centre is proportional to the depth of each gradient centroid in the real world. The lower computational complexity of this methodology comes at the expense of depth-map accuracy as the camera velocity increases, but it is at least five times faster than the other evaluated approaches. This work also proposes a novel technique for deriving visual saliency maps by using Division of Gaussians (DIVoG). In this context, saliency maps express how different each image pixel is from its surrounding pixels across multiple pyramid levels. This approach is shown to be both fast and accurate when evaluated against other state-of-the-art approaches. Subsequently, the saliency information is combined with depth information to identify salient regions close to the host vehicle. The fused map allows faster detection of high-risk areas where obstacles are likely to exist. As a result, existing object detection algorithms, such as the Histogram of Oriented Gradients (HOG), can execute at least five times faster. In conclusion, through a step-wise approach, computationally expensive algorithms have been optimised or replaced by novel methodologies to produce a fast object detection system that is aligned to the requirements of the automotive domain.
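One plausible single-level reading of the Division of Gaussians idea is the per-pixel ratio of a fine and a coarse Gaussian blur; the thesis combines multiple pyramid levels, so treat this as an interpretation rather than the actual DIVoG implementation:

```python
# Blur the image at two scales and take the per-pixel ratio: regions that
# differ strongly from their surround score high. Sigma values are
# illustrative assumptions.
import cv2
import numpy as np

def divog_level(gray, sigma_fine=2, sigma_coarse=8):
    fine = cv2.GaussianBlur(gray.astype(np.float32), (0, 0), sigma_fine)
    coarse = cv2.GaussianBlur(gray.astype(np.float32), (0, 0), sigma_coarse)
    ratio = fine / (coarse + 1e-6)            # division of Gaussians
    return np.abs(ratio - 1.0)                # deviation from the surround
```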
19

Limongiello, Alessandro. "Real-time video analysis from a mobile platform : moving object and obstacle detection." Lyon, INSA, 2007. http://www.theses.fr/2007ISAL0036.

Abstract:
We introduce a vision system for autonomous navigation of a mobile platform. This system is able to interact with its immediate environment by recognizing obstacles and moving objects and by building a stable representation of the external world. The system is made of three components: external world representation, obstacle detection and avoidance, and behavioural analysis. The main contribution of this work lies in the 'perceptive' representation of the external world, i.e. a representation matched to the final goal of autonomous navigation. This representation is based on the stereo vision paradigm and is able to identify the obstacles and moving objects in the scene. Our approach returns the average depth of each region. The location estimation of regions is precise enough for navigation requirements, and the system is fast enough for real-time applications.
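The "average depth per region" idea maps naturally onto a disparity map from a rectified stereo pair; a hedged sketch with OpenCV's block matcher, assuming calibration and region segmentation are already done:

```python
# Compute a disparity map from a rectified 8-bit grayscale stereo pair and
# average it inside each region mask; depth follows from depth = f * B / d.
import cv2
import numpy as np

stereo = cv2.StereoBM_create(numDisparities=64, blockSize=15)

def mean_depth_per_region(gray_left, gray_right, region_masks, fx, baseline):
    # StereoBM returns fixed-point disparity scaled by 16.
    disparity = stereo.compute(gray_left, gray_right).astype(np.float32) / 16.0
    depths = []
    for mask in region_masks:                 # one boolean mask per region
        d = disparity[mask & (disparity > 0)].mean()
        depths.append(fx * baseline / d)      # average depth of the region
    return depths
```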
20

Coelho, Gavin. "Ota-quadrotor: An Object-tracking Autonomous Quadrotor for Real-time Detection and Recognition." Thesis, University of North Texas, 2012. https://digital.library.unt.edu/ark:/67531/metadc115056/.

Abstract:
The field of robotics and mechatronics is advancing at an ever-increasing rate, and we are starting to see robots make the transition from factories to the workplace and homes as costs are reduced and they become more useful. In recent years quadrotors have become a popular unmanned aerial vehicle (UAV) platform. These UAVs, or micro air vehicles (MAVs), are being used for many new and exciting applications, such as aerial monitoring of wildlife, disaster sites, riots, and protests. They are also being used in the film industry as a significantly cheaper means of getting aerial footage. While quadrotors are not extremely expensive, a good system can cost in the range of $3000-$8000 and is thus too costly as a research platform for many. There are a number of cheaper open-source platforms. The ArduCopter is under constant development, has the largest community, and is inexpensive, making it an ideal platform to work with. The goal of this thesis was to implement video processing on a ground control station, allowing the ArduCopter to track moving objects. This was achieved by using the OpenCV video-processing library to implement object tracking and the MAVLink communication protocol, available on the ArduCopter platform, for communication.
21

Tran, Antoine. "Object representation in local feature spaces : application to real-time tracking and detection." Thesis, Université Paris-Saclay (ComUE), 2017. http://www.theses.fr/2017SACLY010/document.

Abstract:
Visual representation is a fundamental problem in computer vision. The aim is to reduce the information to the strict necessary for a given task. Many types of representation exist, such as color features (histograms, color attributes...), shape features (derivatives, keypoints...) or filter banks. Low-level (and local) features are fast to compute. Their representational power is limited, but their genericity is of interest for autonomous or multi-task systems, as higher-level features derive from them. We aim to build, then study the impact of, feature spaces based only on low-level and local features (color and derivatives) for two tasks: generic object tracking, which requires features robust to changes in the aspect of the object and its environment over time; and object detection, for which the representation should describe an object class and cope with intra-class variations. Rather than building dedicated global object descriptors, we rely entirely on local features and on flexible statistical mechanisms to estimate their distribution (histograms) and their co-occurrences (Generalized Hough Transform). The Generalized Hough Transform (GHT), created for the detection of arbitrary shapes, consists in building a codebook modelling an object or a class, originally indexed by gradient orientation and later extended to other features. As we work on local features, we aim to remain close to the original GHT. In tracking, after presenting our preliminary work combining the GHT with a particle filter (using a color histogram), we present a lighter and faster (100 fps) tracker that is more accurate and robust. We present a qualitative evaluation and study the impact of the features used (color space, formulation of the partial derivatives...). In detection, we use Gall's algorithm known as Hough Forests. Our goal is to reduce the feature space used by Gall, discarding the HOG-type features to keep only the partial derivatives and the color features. To compensate for this reduction, we improve two training steps: the support of the local descriptors (patches) is partially chosen according to a geometrical measure, and node training generates a specific probability map that takes into account the patches used at this step. With the reduced feature space, the detector is not more accurate. With the same features as Gall and the same training time, our work yields identical results but with lower variance, and hence better repeatability.
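For reference, the original GHT the thesis stays close to can be sketched compactly: an R-table maps quantized gradient orientation at template edge points to offsets toward the object centre, and at detection time edge pixels vote through the table (bin count and Canny thresholds are illustrative):

```python
# Minimal Generalized Hough Transform: build an R-table from a template,
# then accumulate centre votes from edge pixels of a query image; peaks in
# the accumulator are object-centre hypotheses.
import cv2
import numpy as np

def gradient_orientation(gray, bins=32):
    gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0)
    gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1)
    theta = np.arctan2(gy, gx)                # in [-pi, pi]
    return ((theta + np.pi) / (2 * np.pi) * bins).astype(int) % bins

def build_r_table(template, centre, bins=32):
    edges = cv2.Canny(template, 100, 200)
    orient = gradient_orientation(template, bins)
    table = [[] for _ in range(bins)]
    for y, x in zip(*np.nonzero(edges)):
        table[orient[y, x]].append((centre[0] - y, centre[1] - x))
    return table

def vote(image, table, bins=32):
    acc = np.zeros(image.shape[:2], dtype=np.int32)
    edges = cv2.Canny(image, 100, 200)
    orient = gradient_orientation(image, bins)
    for y, x in zip(*np.nonzero(edges)):
        for dy, dx in table[orient[y, x]]:
            cy, cx = y + dy, x + dx
            if 0 <= cy < acc.shape[0] and 0 <= cx < acc.shape[1]:
                acc[cy, cx] += 1
    return acc                                # peaks = likely object centres
```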
22

Falk, Hampus. "Airborne Aircraft Detection for Multi-rotor Drones : Feasibility Study of Robust Real-Time Long Distance Object Detection." Thesis, Uppsala universitet, Institutionen för informationsteknologi, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-450135.

Abstract:
A drone is a light unmanned aerial vehicle capable of precise and agile movement. As these traits and the usability of drones are recognized in more domains, the necessity of ensuring a safe airspace increases. To minimize the risk of airborne collision, this paper investigates the feasibility of real-time object detection using convolutional neural networks to detect aircraft at distances over 1000 meters. Early detection of aircraft increases the drone operator's overall time for avoidance; however, at such distances aircraft display little to no distinguishing features, making object detection difficult. To test its applicability, the object detection model is incorporated in a sense-and-warn system to provide an end-to-end solution, requiring only average computational capabilities and a drone with a monocular camera. Results generated from a virtual environment show that detections far exceed the target of 1000 meters and that the system is able to efficiently detect, track, and estimate collisions of airborne aircraft. Compared to a human observer, the proposed system is able to detect objects at approximately twice the distance.
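A simple warning rule such a sense-and-warn system could apply to tracked detections is time-to-contact from the growth rate of the bounding box; this is a generic illustration, not necessarily the thesis's estimator:

```python
# Time-to-contact from scale expansion: tau = s / (ds/dt), where s is the
# apparent size of the tracked box. A non-growing box means no approach.
def time_to_contact(size_prev, size_curr, dt):
    growth = (size_curr - size_prev) / dt     # pixels per second
    if growth <= 0:
        return float("inf")                   # not approaching
    return size_curr / growth                 # seconds until contact

# Usage: a distant aircraft whose box grows from 8 to 9 px over one second.
print(time_to_contact(8.0, 9.0, 1.0))         # 9.0 s left to react
```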
23

Letzler, Daniel Charles 1975. "Surface detection and object recognition in a real-time three-dimensional ultrasonic imaging system." Thesis, Massachusetts Institute of Technology, 1999. http://hdl.handle.net/1721.1/80233.

Thesis (M.Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1999. Includes bibliographical references (leaf 138).
24

Liljeby, Jonas. "An evaluation of grid based broad phase collision detection for real time interactive environments." Thesis, Högskolan i Gävle, Avdelningen för Industriell utveckling, IT och Samhällsbyggnad, 2011. http://urn.kb.se/resolve?urn=urn:nbn:se:hig:diva-9591.

Full text
Abstract:
Detailed and exact collision detection for large numbers of objects has long been a non-real-time affair because of the immense amount of computation necessary. This was, however, not only due to the complexity of the algorithms, but also because much of the computation would not have had to be done in the first place. This paper has, through literature research and empirical testing, examined two different broad phase approaches to object culling in a three-dimensional environment. The aim of such a broad phase algorithm is to decrease the number of computation-heavy narrow phase collision detection checks and thus enhance application performance. Potential weaknesses of these approaches were addressed and possible solutions discussed. Performance comparisons were made to give a better overview of what kind of performance enhancements can be expected and to give a theoretical base for further research.
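As a concrete illustration of the broad phase idea discussed in this abstract, the following sketch hashes axis-aligned bounding boxes into a uniform grid so that only objects sharing a cell are passed to the narrow phase; the cell size and AABB representation are assumptions, not details from the thesis.

```python
# Uniform-grid broad phase: each AABB is registered in every cell it
# overlaps; candidate pairs are objects that share at least one cell.
from collections import defaultdict
from itertools import combinations

def broad_phase(aabbs, cell_size):
    """aabbs: list of (min_x, min_y, min_z, max_x, max_y, max_z)."""
    grid = defaultdict(list)
    for idx, (x0, y0, z0, x1, y1, z1) in enumerate(aabbs):
        for cx in range(int(x0 // cell_size), int(x1 // cell_size) + 1):
            for cy in range(int(y0 // cell_size), int(y1 // cell_size) + 1):
                for cz in range(int(z0 // cell_size), int(z1 // cell_size) + 1):
                    grid[(cx, cy, cz)].append(idx)
    pairs = set()
    for cell_objects in grid.values():
        for a, b in combinations(cell_objects, 2):
            pairs.add((min(a, b), max(a, b)))  # deduplicate across cells
    return pairs  # candidate pairs for the expensive narrow phase
```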
APA, Harvard, Vancouver, ISO, and other styles
26

Dimovski, David, and Andersson Johan Hammargren. "Validation of a real-time automated production-monitoring system." Thesis, Högskolan i Halmstad, Akademin för informationsteknologi, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:hh:diva-44810.

Full text
Abstract:
In today's industry, companies are, to an increasing degree, beginning to embrace the concept of Industry 4.0. One of these companies is Diab, which has a factory located in Laholm where they manufacture composite material. Some of the machines at the factory are older, with outdated control systems, and require a way to log data in real time. The goal of the project is to create a working prototype system that can monitor the production flow in real time by using sensors to collect data about the work efficiency of a machine, measuring the idle time when the machine is working and when it is not, and storing this data in a database accessible through a Graphical User Interface (GUI). The purpose is to investigate the requirements for a fully operational system and what it takes to maintain it, in order to judge whether the system should be developed in-house or bought/licensed from a third party. The system was built using a NodeMCU ESP32, a Raspberry Pi 4B and a SparkFun Distance Sensor Breakout VL53L1X; the NodeMCU ESP32 was programmed with the Arduino IDE, and Java was used, together with MariaDB, to develop the server on the Raspberry Pi and store the data. The tests that were conducted showed that the data could be displayed within a second in the created GUI but could not guarantee a reading of every passing block; however, the system gave a good overview of the workflow of the machine. An improvement of the system using vision-based object detection is suggested. An overview of the production in real time opens future possibilities for optimising the production flow and, with an improved system, can increase the automation of the production, bringing the company closer to the concept of Industry 4.0.
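The idle-time logic described above can be sketched as follows; the distance trigger, the idle timeout, and the read_distance_mm/store_event helpers are illustrative assumptions, not values from the thesis.

```python
# Idle-time sketch: a block passing the VL53L1X shortens the measured
# distance; if no block is seen for a while, the machine is marked idle.
import time

DIST_TRIGGER_MM = 400   # a passing block is closer than this (assumed)
IDLE_AFTER_S = 30.0     # no block for this long -> machine idle (assumed)

def monitor(read_distance_mm, store_event):
    """read_distance_mm: sensor polling callable; store_event: DB writer."""
    last_block = time.time()
    idle = False
    while True:
        if read_distance_mm() < DIST_TRIGGER_MM:
            if idle:
                store_event("running", time.time())
                idle = False
            last_block = time.time()
        elif not idle and time.time() - last_block > IDLE_AFTER_S:
            store_event("idle", time.time())  # e.g. INSERT into MariaDB
            idle = True
        time.sleep(0.05)
```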
APA, Harvard, Vancouver, ISO, and other styles
27

Glynn, Patrick Joseph, and n/a. "Collision Avoidance Systems for Mine Haul Trucks and Unambiguous Dynamic Real Time Single Object Detection." Griffith University. Griffith Business School, 2005. http://www4.gu.edu.au:8080/adt-root/public/adt-QGU20060809.163025.

Full text
Abstract:
A suite of new collision avoidance systems (CAS) is introduced for use in heavy vehicles whose structure and size necessarily impede driver visibility. The main goal of the project is to determine the appropriate use of each of the commercially available technologies and, where possible, produce a low-cost variant suitable for use in proximity detection on large mining industry haul trucks. The CAS variants produced were subjected to a field demonstration and linked to the output from the earlier CAS 1 project (a production high-definition in-cabin video monitor and r/f tagging system). The CAS 2 system used low-cost Doppler continuous wave radar antennae coupled to the CAS 1 monitor to indicate the presence of an object moving at any speed above 3 km/h relative to the antennae. The novelty of the CAS 3 system lies in the design of three interconnected modules: eight radar antenna modules (as used in CAS 2) located on the truck, software to interface with the end users (i.e. the drivers of the trucks), and a display unit. Modularisation enables the components to be independently tested, evaluated and replaced when in use. The radar antenna modules and the system as a whole are described together with the empirical tests conducted and the results obtained. The tests, drawing on Monte-Carlo simulation techniques, demonstrate both the correctness of the implementations and the effectiveness of the system. The testing of the final prototype unit was highly successful, both at the computer-simulation level and in practical tests on light vehicles. A number of points arising from the field test are reviewed and their application to future projects discussed.
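For context, the 3 km/h trigger above rests on the standard continuous-wave Doppler relation; the sketch below assumes a 24 GHz carrier, which the abstract does not specify.

```python
# CW Doppler back-of-the-envelope: radial speed v = f_d * c / (2 * f0).
C = 3.0e8     # speed of light, m/s
F0 = 24.0e9   # assumed carrier frequency, Hz (not stated in the thesis)

def radial_speed_kmh(doppler_hz):
    return doppler_hz * C / (2 * F0) * 3.6

def object_present(doppler_hz, threshold_kmh=3.0):
    return radial_speed_kmh(abs(doppler_hz)) > threshold_kmh

# e.g. a 3 km/h target at 24 GHz produces f_d = 2*v*f0/c, about 133 Hz
```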
APA, Harvard, Vancouver, ISO, and other styles
28

Glynn, Patrick Joseph. "Collision Avoidance Systems for Mine Haul Trucks and Unambiguous Dynamic Real Time Single Object Detection." Thesis, Griffith University, 2005. http://hdl.handle.net/10072/365488.

Full text
Abstract:
A suite of new collision avoidance systems (CAS) is introduced for use in heavy vehicles whose structure and size necessarily impede driver visibility. The main goal of the project is to determine the appropriate use of each of the commercially available technologies and, where possible, produce a low-cost variant suitable for use in proximity detection on large mining industry haul trucks. The CAS variants produced were subjected to a field demonstration and linked to the output from the earlier CAS 1 project (a production high-definition in-cabin video monitor and r/f tagging system). The CAS 2 system used low-cost Doppler continuous wave radar antennae coupled to the CAS 1 monitor to indicate the presence of an object moving at any speed above 3 km/h relative to the antennae. The novelty of the CAS 3 system lies in the design of three interconnected modules: eight radar antenna modules (as used in CAS 2) located on the truck, software to interface with the end users (i.e. the drivers of the trucks), and a display unit. Modularisation enables the components to be independently tested, evaluated and replaced when in use. The radar antenna modules and the system as a whole are described together with the empirical tests conducted and the results obtained. The tests, drawing on Monte-Carlo simulation techniques, demonstrate both the correctness of the implementations and the effectiveness of the system. The testing of the final prototype unit was highly successful, both at the computer-simulation level and in practical tests on light vehicles. A number of points arising from the field test are reviewed and their application to future projects discussed. Thesis (PhD Doctorate), Doctor of Philosophy (PhD), Griffith Business School.
APA, Harvard, Vancouver, ISO, and other styles
29

De, Lucas Enrique. "Reducing redundancy of real time computer graphics in mobile systems." Doctoral thesis, Universitat Politècnica de Catalunya, 2018. http://hdl.handle.net/10803/552955.

Full text
Abstract:
The goal of this thesis is to propose novel and effective techniques to eliminate the redundant computations that waste energy in real-time computer graphics applications, with special focus on mobile GPU micro-architecture. Improving the energy efficiency of CPU/GPU systems is not only key to extending battery life, but also to increasing performance, because SoCs tend to be throttled when the load is high for a long period of time in order to avoid overheating above thermal limits. Prior studies pointed out that the CPU and especially the GPU are the principal energy consumers in the graphics subsystem, with off-chip main memory accesses and the processors inside the GPU being the primary consumers. First, we focus on reducing redundant fragment processing computations by improving the culling of hidden surfaces. During real-time graphics rendering, objects are processed by the GPU in the order they are submitted by the CPU, and occluded surfaces are often processed even though they will end up not being part of the final image. By the time the GPU realizes that an object or part of it is not going to be visible, all the activity required to compute its color and store it has already been performed. We propose a novel architectural technique for mobile GPUs, Visibility Rendering Order (VRO), which reorders objects front-to-back entirely in hardware to maximize the culling effectiveness of the GPU and minimize overshading, hence reducing execution time and energy consumption. VRO exploits the fact that objects in animated graphics applications tend to keep their relative depth order across consecutive frames (temporal coherence) to provide the feeling of smooth transition. VRO keeps the visibility information of a frame and uses it to reorder the objects of the following frame. VRO only requires adding a small hardware unit to capture the visibility information and use it later to guide the rendering of the following frame. Moreover, VRO works in parallel with the graphics pipeline, so negligible performance overheads are incurred. We illustrate the benefits of VRO using various unmodified commercial 3D applications, for which VRO achieves a 27% speed-up and a 14.8% energy reduction on average. Then, we focus on avoiding redundant computations related to CPU Collision Detection (CD). Graphics applications such as 3D games represent a large percentage of downloaded applications for mobile devices, and the trend is towards more complex and realistic scenes with accurate 3D physics simulations. CD is one of the most important algorithms in any physics kernel, since it identifies the contact points between the objects of a scene and determines when they collide. However, real-time accurate CD is very expensive in terms of energy consumption. We propose Render Based Collision Detection (RBCD), a novel energy-efficient high-fidelity CD scheme that leverages some intermediate results of the rendering pipeline to perform CD, so that redundant tasks are done just once. Comparing RBCD with a conventional CD executed entirely on the CPU, we show that its execution time is reduced by almost three orders of magnitude (600x speedup), because most of the CD task in our model comes for free by reusing the intermediate results of image rendering. Although not guaranteed, such a dramatic time improvement may result in better frames per second if the physics simulation is on the critical path. However, the most important advantage of our technique is the enormous energy savings that result from eliminating a long and costly CPU computation and converting it into a few simple operations executed by specialized hardware within the GPU. Our results show that the energy consumed by CD is reduced on average by a factor of 448x (i.e., by 99.8%). These dramatic benefits are accompanied by a higher-fidelity CD analysis (i.e., with finer granularity), which improves the quality and realism of the application.
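As a software illustration of the temporal-coherence idea behind VRO (which the thesis implements in hardware), the sketch below reuses the previous frame's depth order as the next frame's front-to-back submission order; all names are illustrative.

```python
# Toy illustration of VRO's temporal coherence: objects tend to keep their
# relative depth order between frames, so last frame's observed depths give
# a good front-to-back submission order for the current frame.
def render_order(objects, prev_depth):
    """prev_depth: object id -> depth observed in the previous frame."""
    far = float("inf")  # objects unseen last frame go last
    return sorted(objects, key=lambda obj: prev_depth.get(obj, far))

# usage: drawing in this order lets early z-tests cull most occluded fragments
frame_order = render_order(["car", "tree", "wall"], {"wall": 9.0, "car": 2.5})
# -> ["car", "wall", "tree"]
```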
APA, Harvard, Vancouver, ISO, and other styles
30

Mahammad, Sarfaraz Ahmad, and Vendrapu Sushma. "Raspberry Pi Based Vision System for Foreign Object Debris (FOD) Detection." Thesis, Blekinge Tekniska Högskola, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:bth-20198.

Full text
Abstract:
Background: The main purpose of this research is to design and develop a cost-effective system for the detection of Foreign Object Debris (FOD), dedicated to airports. FOD detection has been a significant problem at airports as it can cause damage to aircraft. Developing a device to detect FOD may require complicated hardware and software structures. The proposed solution is based on a computer vision system comprising flexible off-the-shelf components, such as a Raspberry Pi and a camera module, allowing a simple and efficient way to detect FOD. Methods: The solution is developed through user-centered design, which guides the system toward a suitable and efficient design. The system specifications, objectives and limitations are derived from this user-centered design, and the candidate technologies are selected from the required functionalities and constraints to obtain an efficient real-time FOD detection system. Results: The results are obtained using background subtraction for FOD detection and an implementation of the SSD (single-shot multi-box detector) model for FOD classification. The performance of the system is analysed by testing its ability to detect FOD of different sizes at different distances. A web interface is also implemented to notify the user in real time when FOD occurs. Conclusions: We concluded that background subtraction and the SSD model are the most suitable algorithms for a Raspberry Pi-based design that detects FOD in real time. The system performs in real time, giving an efficiency of 84% for detecting medium-sized FOD, such as persons, at a distance of 75 meters and 72% for detecting large-sized FOD, such as cars, at a distance of 125 meters; the average frame rate for recording and processing frames of the monitored area is 0.95 frames per second (fps).
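A minimal sketch of the detection stage described above might look as follows, using OpenCV's MOG2 background subtractor on camera frames; the area threshold and morphology step are assumptions rather than the thesis's exact parameters.

```python
# Background-subtraction FOD candidates: subtract the learned background,
# drop shadow pixels, clean up speckle, and keep blobs above a minimum size.
import cv2
import numpy as np

cap = cv2.VideoCapture(0)  # Pi camera exposed as a V4L2 device
subtractor = cv2.createBackgroundSubtractorMOG2(detectShadows=True)
kernel = np.ones((3, 3), np.uint8)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    mask = subtractor.apply(frame)
    _, mask = cv2.threshold(mask, 200, 255, cv2.THRESH_BINARY)  # shadows are 127
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)       # remove speckle
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    for c in contours:
        if cv2.contourArea(c) > 150:          # assumed minimum FOD blob size
            x, y, w, h = cv2.boundingRect(c)  # region passed to the SSD stage
            cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 0, 255), 2)
```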
APA, Harvard, Vancouver, ISO, and other styles
31

Ferm, Oliwer. "Real-time Object Detection on Raspberry Pi 4 : Fine-tuning a SSD model using Tensorflow and Web Scraping." Thesis, Mittuniversitetet, Institutionen för elektronikkonstruktion, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:miun:diva-39455.

Full text
Abstract:
Edge AI is a growing area. Deep learning on low-cost machines, such as the Raspberry Pi, may be used more than ever due to its ease of use, availability, and high performance. A quantized pretrained SSD object detection model was deployed to a Raspberry Pi 4 B to evaluate whether the throughput is sufficient for real-time object recognition. With an input size of 300x300, an inference time of 185 ms was obtained. This is an improvement over the previous model, the Raspberry Pi 3 B+, which achieved 238 ms with an input size of 96x96 in a related study. Using a lightweight model trades lower accuracy for higher throughput. To compensate for the loss of accuracy, a custom object detection model was trained with transfer learning in TensorFlow by fine-tuning a pretrained SSD model. The fine-tuned model was trained on images scraped from the web showing people in winter landscapes, whereas the pretrained model was trained to detect various objects, including people, in diverse environments. Predictions show that the custom model performs significantly better at detecting people in snow. The conclusion is that web scraping can be used for fine-tuning a model. However, scraped images are of varying quality, and it is therefore important to thoroughly clean the data and select which images are suitable to keep for a specific application.
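For reference, measuring the inference time of a quantized SSD with TensorFlow Lite on a Pi can be sketched as below; the model filename is a placeholder and the single-shot timing is a simplification of the experiment.

```python
# Time one inference of a quantized TFLite SSD, as in the 185 ms measurement.
import time
import numpy as np
import tflite_runtime.interpreter as tflite

interpreter = tflite.Interpreter(model_path="ssd_mobilenet_quant.tflite")
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]

frame = np.zeros(inp["shape"], dtype=inp["dtype"])  # stand-in 300x300 frame
start = time.perf_counter()
interpreter.set_tensor(inp["index"], frame)
interpreter.invoke()
print("inference: %.0f ms" % ((time.perf_counter() - start) * 1000))
```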
APA, Harvard, Vancouver, ISO, and other styles
32

Elavarthi, Pradyumna. "Semantic Segmentation of RGB images for feature extraction in Real Time." University of Cincinnati / OhioLINK, 2019. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1573575765136448.

Full text
APA, Harvard, Vancouver, ISO, and other styles
33

Güven, Jakup. "Investigating techniques for improving accuracy and limiting overfitting for YOLO and real-time object detection on iOS." Thesis, Malmö universitet, Fakulteten för teknik och samhälle (TS), 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:mau:diva-19999.

Full text
Abstract:
This paper features the creation of a real-time object detection system for mobile iOS using YOLO, a state-of-the-art one-stage object detector and convolutional neural network far surpassing other real-time object detectors in speed and accuracy. In this process an object detection model is trained to detect doors. The machine learning process is outlined, and practices to combat overfitting and to increase accuracy and speed are discussed. A series of experiments are conducted, the results of which suggest that data augmentation, including negative data in a dataset, hyperparameter optimisation and transfer learning are viable techniques for improving the performance of an object detection model. Based on the results of the experiments, the author is able to increase mAP, a measurement of accuracy for object detectors, from 63.76% to 86.73%. The tendency for overfitting is also explored, and the results suggest that training beyond 300 epochs is likely to produce an overfitted model.
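The augmentation experiments mentioned above can be illustrated with a small sketch like the following, where each training image spawns a mirrored and a brightness-shifted variant; the exact transforms used in the paper are not specified, so these are assumptions.

```python
# Simple augmentation for YOLO-style labels: boxes are (class, cx, cy, w, h)
# in normalized [0, 1] coordinates, so a horizontal flip only mirrors cx.
import cv2
import random

def augment(image, boxes):
    """Return the original plus flipped and brightness-shifted variants."""
    out = [(image, boxes)]
    flipped = cv2.flip(image, 1)  # horizontal flip
    out.append((flipped,
                [(c, 1.0 - cx, cy, w, h) for c, cx, cy, w, h in boxes]))
    beta = random.uniform(-40, 40)  # random brightness shift
    bright = cv2.convertScaleAbs(image, alpha=1.0, beta=beta)
    out.append((bright, boxes))
    return out
```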
APA, Harvard, Vancouver, ISO, and other styles
34

Desai, Alok. "An Efficient Feature Descriptor and Its Real-Time Applications." BYU ScholarsArchive, 2015. https://scholarsarchive.byu.edu/etd/5465.

Full text
Abstract:
Finding salient features in an image and matching them to their corresponding features in another image is an important step for many vision-based applications, and feature description plays an important role in the matching process. A robust feature descriptor must cope with a number of image deformations and should be computationally efficient. For resource-limited systems, floating-point and complex operations such as multiplication and square root are not desirable. This research first introduces a robust and efficient feature descriptor called the PRObability (PRO) descriptor, which meets these requirements without sacrificing matching accuracy. The PRO descriptor is further improved by incorporating only affine features for matching. While performing well, the PRO descriptor still requires a larger descriptor size, higher offline computation time, and more memory space than other binary feature descriptors. The SYnthetic BAsis (SYBA) descriptor is developed to overcome these drawbacks. SYBA is built on a compressed sensing theory that uses synthetic basis functions to uniquely encode or reconstruct a signal. The SYBA descriptor is designed to provide accurate feature matching for real-time vision applications. To demonstrate its performance, we develop algorithms that utilize the SYBA descriptor to localize the soccer ball in broadcast soccer game video, track ground objects from an unmanned aerial vehicle, perform motion analysis, and improve visual odometry accuracy for advanced driver assistance systems. SYBA provides high feature matching accuracy with computational simplicity and requires minimal computational resources. It is a hardware-friendly feature description and matching algorithm suitable for embedded vision applications.
APA, Harvard, Vancouver, ISO, and other styles
35

White, Jacob Harley. "Real-Time Visual Multi-Target Tracking in Realistic Tracking Environments." BYU ScholarsArchive, 2019. https://scholarsarchive.byu.edu/etd/7486.

Full text
Abstract:
This thesis focuses on visual multiple-target tracking (MTT) from a UAV. Typical state-of-the-art multiple-target trackers rely on an object detector as the primary detection source. However, object detectors usually require a GPU to process images in real time, which may not be feasible to carry on board a UAV. Additionally, they often do not produce consistent detections for the small objects typical of UAV imagery. In our method, we instead detect motion to identify objects of interest in the scene. We detect motion at corners in the image using optical flow, and we also track points long-term to continue tracking stopped objects. Since our motion detection algorithm generates multiple detections at each time step, we use a hybrid probabilistic data association filter combined with a single iteration of expectation maximization to improve tracking accuracy. We also present a motion detection algorithm that accounts for parallax in non-planar UAV imagery, using the essential matrix to distinguish between true object motion and apparent object motion due to parallax. Instead of calculating the essential matrix directly, which can be time-consuming, we design a new algorithm that optimizes the rotation and translation between frames. This new algorithm requires only 4 ms instead of 47 ms per frame of the video sequence. We demonstrate the performance of these algorithms on video data and show that they improve tracking accuracy, reliability, and speed. All these contributions are capable of running in real time without a GPU.
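The corner-based motion detection idea can be sketched as below; note that this simplification compensates camera motion with a RANSAC homography rather than the thesis's essential-matrix formulation, and the thresholds are assumptions.

```python
# Flag corners whose optical flow deviates from the dominant (camera-induced)
# image motion: track Shi-Tomasi corners with pyramidal Lucas-Kanade, fit a
# robust homography to explain the background, and keep the outliers.
import cv2
import numpy as np

def moving_points(prev_gray, gray, residual_px=2.0):
    pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=500,
                                  qualityLevel=0.01, minDistance=7)
    nxt, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, gray, pts, None)
    good_old = pts[status.ravel() == 1].reshape(-1, 2)
    good_new = nxt[status.ravel() == 1].reshape(-1, 2)
    # homography approximates the background's apparent motion
    H, _ = cv2.findHomography(good_old, good_new, cv2.RANSAC, 3.0)
    pred = cv2.perspectiveTransform(good_old.reshape(-1, 1, 2), H).reshape(-1, 2)
    residual = np.linalg.norm(good_new - pred, axis=1)
    return good_new[residual > residual_px]  # candidate moving objects
```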
APA, Harvard, Vancouver, ISO, and other styles
36

Schennings, Jacob. "Deep Convolutional Neural Networks for Real-Time Single Frame Monocular Depth Estimation." Thesis, Uppsala universitet, Avdelningen för systemteknik, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-336923.

Full text
Abstract:
Vision-based active safety systems have become more common in modern vehicles to estimate the depth of objects ahead, both for autonomous driving (AD) and for advanced driver-assistance systems (ADAS). In this thesis a lightweight deep convolutional neural network performing real-time depth estimation on single monocular images is implemented and evaluated. Many of the vision-based automatic brake systems in modern vehicles only detect pre-trained object types such as pedestrians and vehicles; these systems fail to detect general objects such as road debris and roadside obstacles. In stereo vision systems the problem is resolved by calculating a disparity image from the stereo image pair to extract depth information. The distance to an object can also be determined using radar and LiDAR systems. Using this depth information, the system performs the actions necessary to avoid collisions with objects that are determined to be too close. However, these systems are more expensive than a regular mono camera system and are therefore not very common in the average consumer car. By implementing robust depth estimation in mono vision systems, the benefits of active safety systems could be extended to a larger segment of the vehicle fleet. This could drastically reduce traffic accidents related to human error and possibly save many lives. The network architecture evaluated in this thesis is more lightweight than other CNN architectures previously used for monocular depth estimation, and the proposed architecture is therefore preferable on computationally constrained systems. The network solves a supervised regression problem during training in order to produce a pixel-wise depth estimation map. It was trained using sparse ground-truth images with spatially incoherent and discontinuous data, and outputs a dense, spatially coherent and continuous depth map prediction. The spatially incoherent ground truth posed a problem of discontinuity that was addressed by a masked loss function with regularization. The network was able to predict dense depth estimates on the KITTI dataset with close to state-of-the-art performance.
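A masked loss of the kind described can be sketched in a few lines; the sketch below assumes PyTorch and an L2 weight penalty as the regularizer, which the abstract does not specify.

```python
# Masked depth regression loss: sparse ground truth leaves most pixels
# unlabeled (value 0 here), so the data term averages only over valid pixels.
import torch

def masked_depth_loss(pred, target, l2_weight=1e-4, params=None):
    mask = target > 0                          # valid ground-truth pixels
    data_term = torch.abs(pred[mask] - target[mask]).mean()
    reg_term = 0.0
    if params is not None:                     # optional weight regularization
        reg_term = l2_weight * sum(p.pow(2).sum() for p in params)
    return data_term + reg_term
```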
APA, Harvard, Vancouver, ISO, and other styles
37

Vlahija, Chippen, and Ahmed Abdulkader. "Real-time vehicle and pedestrian detection, a data-driven recommendation focusing on safety as a perception to autonomous vehicles." Thesis, Malmö universitet, Fakulteten för teknik och samhälle (TS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:mau:diva-20089.

Full text
Abstract:
Interest in object detection has grown worldwide with the rise of autonomous vehicles over the last decade. This paper focuses on a vision-based approach to real-time vehicle and pedestrian detection as a perception component for autonomous vehicles, using a convolutional neural network for object detection. A modified YOLOv3-tiny model is trained with the INRIA dataset to detect vehicles and pedestrians, and the model also measures the distance to the detected objects. The machine learning process is described step by step, including measures that combat overfitting and increase speed and accuracy. Based on the training results, the authors were able to increase the mean average precision, a way to measure accuracy for object detectors, from 31.3% to 62.14%, whilst maintaining a speed of 18 frames per second.
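Distance measurement from a monocular detection is typically done with the pinhole relation; the sketch below is one plausible reading of the distance step, with an assumed focal length and class-height priors (the paper's actual method is not detailed in the abstract).

```python
# Pinhole distance estimate: distance = focal_px * real_height / pixel_height.
FOCAL_PX = 700.0                               # assumed camera focal length
REAL_HEIGHT_M = {"person": 1.7, "car": 1.5}    # assumed class-height priors

def distance_m(class_name, bbox_height_px):
    return FOCAL_PX * REAL_HEIGHT_M[class_name] / bbox_height_px

# e.g. a person whose bounding box is 85 px tall is roughly 14 m away
print(distance_m("person", 85))  # -> 14.0
```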
APA, Harvard, Vancouver, ISO, and other styles
38

Borngrund, Carl. "Machine vision for automation of earth-moving machines : Transfer learning experiments with YOLOv3." Thesis, Luleå tekniska universitet, Institutionen för system- och rymdteknik, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:ltu:diva-75169.

Full text
Abstract:
This master thesis investigates the possibility of creating a machine vision solution for the automation of earth-moving machines. This research was undertaken because, without some type of vision system, it will not be possible to create a fully autonomous earth-moving machine that can safely be used around humans or other machines. Cameras were used as the primary sensors as they are cheap, provide high resolution, and are the type of sensor that most closely mimics the human vision system. The purpose of this master thesis was to use existing real-time object detectors together with transfer learning and examine whether they can successfully be used to extract information in environments such as construction, forestry and mining. The amount of data needed to successfully train a real-time object detector was also investigated. Furthermore, the thesis examines whether there are situations that are specifically difficult for the defined object detector, how reliable the object detector is, and finally how service-oriented architecture principles can be used to create deep learning systems. To investigate these questions, three data sets were created in which different properties were varied: light conditions, ground material and dump truck orientation. The data sets were created using a toy dump truck together with a similarly sized wheel loader with a camera mounted on the roof of its cab. The first data set contained only indoor images where the dump truck was placed in different orientations but neither the light nor the ground material changed. The second data set contained images where the light source was kept constant but the dump truck orientation and ground materials changed. The last data set contained images where all properties were varied. The real-time object detector YOLOv3 was used to examine how a real-time object detector would perform depending on which of the three data sets it was trained on. No matter the data set, it was possible to train a model to perform real-time object detection. Using an Nvidia 980 Ti, the inference time of the model was around 22 ms, which is more than enough to classify videos running at 30 fps. All three data sets converged to a training loss of around 0.10. The data set that contained the most varied data, i.e. the one where all properties were changed, performed considerably better, reaching a validation loss of 0.164, whereas the indoor data set, containing the least varied data, only reached a validation loss of 0.257. The size of the data set was also a factor in the performance; however, it was not as important as having varied data. The results also showed that all three data sets could reach an mAP score of around 0.98 using transfer learning.
APA, Harvard, Vancouver, ISO, and other styles
39

Mathiesen, Jarle. "Low-Latency Detection and Tracking of Aircraft in Very High-Resolution Video Feeds." Thesis, Linköpings universitet, Interaktiva och kognitiva system, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-148848.

Full text
Abstract:
Applying machine learning techniques for real-time detection and tracking of objects in very high-resolution video is a problem that has not been extensively studied. In this thesis, the practical uses of object detection for airport remote towers are explored. We present a Kalman filter-based tracking framework for low-latency aircraft tracking in very high-resolution video streams. The object detector was trained and tested on a dataset containing 3000 labelled images of aircraft taken at Swedish airports, reaching an mAP of 90.91% with an average IoU of 89.05% on the test set. The tracker was benchmarked on remote tower video footage from Örnsköldsvik and Sundsvall using slightly modified variants of the MOT-CLEAR and ID metrics for multiple object trackers, obtaining an IDF1 score of 91.9% and a MOTA score of 83.3%. The prototype runs the tracking pipeline on seven high-resolution cameras simultaneously at 10 Hz on a single thread, suggesting that large speed gains are attainable through parallelization.
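The Kalman filter core of such a tracker can be sketched as follows, with a constant-velocity state and detector centroids as measurements; the noise covariances are assumptions.

```python
# Constant-velocity Kalman filter: state (x, y, vx, vy), measured (x, y).
import numpy as np

dt = 0.1                                    # 10 Hz, as in the prototype
F = np.array([[1, 0, dt, 0], [0, 1, 0, dt],
              [0, 0, 1, 0], [0, 0, 0, 1]], dtype=float)
H = np.array([[1, 0, 0, 0], [0, 1, 0, 0]], dtype=float)
Q = np.eye(4) * 1e-2                        # process noise (assumed)
R = np.eye(2) * 4.0                         # measurement noise (assumed)

def predict(x, P):
    return F @ x, F @ P @ F.T + Q

def update(x, P, z):
    y = z - H @ x                           # innovation
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)          # Kalman gain
    return x + K @ y, (np.eye(4) - K @ H) @ P
```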
APA, Harvard, Vancouver, ISO, and other styles
40

Alqahtani, Faleh Mohammed A. "Three-dimensional facial tracker using a stereo vision system." Thesis, Queensland University of Technology, 2019. https://eprints.qut.edu.au/131825/1/Faleh%20Mohammed%20A_Alqahtani_Thesis.pdf.

Full text
Abstract:
This thesis develops an algorithm enabling accurate tracking of human faces, precise estimation of head poses, efficient resolution of occlusions and improved depth perception under different lighting conditions. The system utilises two stereo cameras able to track movements across six degrees of freedom, thereby accounting for pose variations. It can handle circumstances in which facial features are no longer discernible, and the results demonstrate increased accuracy in real-time estimation of head poses and facial landmark features. It can also precisely map facial features in different head poses, making it a robust basis for 3D facial tracking solutions.
APA, Harvard, Vancouver, ISO, and other styles
41

Regmi, Hem Kanta. "A Real-Time Computational Decision Support System for Compounded Sterile Preparations using Image Processing and Artificial Neural Networks." University of Toledo / OhioLINK, 2016. http://rave.ohiolink.edu/etdc/view?acc_num=toledo1469113622.

Full text
APA, Harvard, Vancouver, ISO, and other styles
42

Derome, Maxime. "Vision stéréoscopique temps-réel pour la navigation autonome d'un robot en environnement dynamique." Thesis, Université Paris-Saclay (ComUE), 2017. http://www.theses.fr/2017SACLS156/document.

Full text
Abstract:
This thesis aims at designing an embedded stereoscopic perception system that enables autonomous robot navigation in dynamic environments (i.e. environments including mobile objects). To do so, we need to satisfy several constraints: 1) since we want to be able to navigate in unknown environments and around any type of mobile object, we adopt a purely geometric approach; 2) to ensure the best possible coverage of the field of view, we employ dense methods that process every pixel in the image; 3) the algorithms must be compliant with an embedded platform, so we carefully design them to be fast enough to preserve the system's reactivity. The approach presented in this manuscript and its contributions are summarized below. First, we study several stereo matching algorithms that estimate a disparity map, from which a depth map can be deduced by triangulation. This comparative study highlights one algorithm that is not in the KITTI benchmarks but that offers an excellent accuracy/processing-time trade-off. We also propose a method for filtering the disparity maps. By coding these algorithms in CUDA to benefit from hardware acceleration on the Graphics Processing Unit (GPU), we show that they can run very fast (19 ms on KITTI images, with a GeForce GTX Titan GPU). Second, we want to detect mobile objects and estimate their motion. To do so we compute the stereo rig motion using visual odometry, in order to isolate, within the 2D or 3D apparent motion (estimated by optical flow or scene flow algorithms), the part induced by each object's own motion. Considering that the only optical flow algorithm able to run in real time is FOLKI, we propose several modifications of it that slightly improve its performance at the cost of a longer processing time. Regarding scene flow estimation, no existing algorithm can reach the desired computation speed, so we propose a new approach that decouples structure and motion for fast scene flow estimation. Three algorithms are proposed to exploit this structure-motion decomposition, and one of them, particularly efficient, enables very fast scene flow estimation with relatively good accuracy. To our knowledge it is the only published scene flow algorithm able to run at frame rate on the KITTI dataset (10 Hz). Third, to detect moving objects and segment them in the image, we present several statistical models and residual quantities on which a detection can be based by thresholding a chi-square criterion. We propose a rigorous statistical model that takes into account all the uncertainties occurring during estimation, in particular those of the visual odometry, which to our knowledge had not been done before in the context of moving object detection. We also propose a new residual quantity for the detection, using an image prediction approach that facilitates uncertainty propagation and the chi-square criterion modeling. The benefit brought by the proposed residual quantity and error model is demonstrated by evaluating the detection algorithms on samples of annotated KITTI data. Finally, we implement our algorithms on ROS to run the perception system on an embedded platform, and we code some algorithms in CUDA to accelerate the computation on the GPU. We describe the perception and navigation system used for the experimental validation, and our experiments show that the proposed stereovision perception system is suitable for embedded robotic applications.
APA, Harvard, Vancouver, ISO, and other styles
43

Romanenko, Ilya. "Novel image processing algorithms and methods for improving their robustness and operational performance." Thesis, Loughborough University, 2014. https://dspace.lboro.ac.uk/2134/16340.

Full text
Abstract:
Image processing algorithms have developed rapidly in recent years. Imaging functions are becoming more common in electronic devices, demanding better image quality and more robust image capture in challenging conditions. Increasingly complex algorithms are being developed in order to achieve better signal-to-noise characteristics, more accurate colours and wider dynamic range, in order to approach the performance levels of the human visual system.
APA, Harvard, Vancouver, ISO, and other styles
44

Leyrit, Laetitia. "Reconnaissance d'objets en vision artificielle : application à la reconnaissance de piétons." Phd thesis, Université Blaise Pascal - Clermont-Ferrand II, 2010. http://tel.archives-ouvertes.fr/tel-00626492.

Full text
Abstract:
This dissertation presents the work carried out during my PhD thesis, conducted in the GRAVIR group (1) of LASMEA (2), within the ComSee team (3), which is devoted to computer vision. This work is part of a project of the French National Research Agency entitled "Logiciels d'Observation des Vulnérables", whose goal is to design software that detects endangered pedestrians and thus improves road safety. The aim of my thesis is to detect and recognize pedestrians in images. These images come from a camera on board a vehicle driving in an urban environment, which imposes numerous constraints. In particular, the system must run in real time in order to detect pedestrians before a possible impact. Moreover, pedestrians exhibit many variations (size, type of clothing, etc.), which makes the recognition task all the more difficult. Since the camera is moving, no information can be extracted from the background. In this thesis, we apply several computer vision methods, all based on learning, which meet these requirements. The problem is treated in two phases. First, an offline processing step allows us to design a valid method for recognizing pedestrians, using a training database. An image descriptor is first employed to extract information from the images; then, from this information, a classifier is trained to distinguish pedestrians from other objects. We propose the use of three descriptors (Haar wavelets, gradient histograms and a binary descriptor). For classification, we rely on a boosting algorithm (AdaBoost) and on kernel methods (SVM, RVM, least squares). Each method was parameterized, tested and validated, both for image description and for classification, and the best combination of all these methods is also investigated. Second, we develop a real-time embedded system capable of detecting pedestrians before a possible collision. We directly use raw images from the camera and add a module to segment the image, so as to integrate the preceding description and classification methods and thus address the initial problem. 1. acronym of "Groupe d'Automatique, VIsion et Robotique". 2. acronym of "LAboratoire des Sciences et Matériaux Et d'Automatique". 3. acronym of "Computers that See".
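The HOG-plus-SVM pairing evaluated in the thesis can be illustrated with OpenCV's pre-trained pedestrian detector; this sketch uses the library's default people model rather than the classifiers trained in the thesis, and the confidence cut-off and input filename are assumptions.

```python
# HOG descriptor + linear SVM pedestrian detection with OpenCV's
# pre-trained people model, sliding a detection window over the image.
import cv2

hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

image = cv2.imread("street.jpg")  # placeholder input image
boxes, weights = hog.detectMultiScale(image, winStride=(8, 8), scale=1.05)
for (x, y, w, h), score in zip(boxes, weights):
    if float(score) > 0.5:        # assumed confidence cut-off
        cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)
```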
APA, Harvard, Vancouver, ISO, and other styles
45

Zhou, Shuting. "Navigation of a quad-rotor to access the interior of a building." Thesis, Compiègne, 2015. http://www.theses.fr/2015COMP2237.

Full text
Abstract:
This research work is dedicated to the development of an autonomous navigation strategy that includes generating an optimal trajectory with obstacle-avoidance capabilities, detecting a specific object of interest (i.e. a window), and then conducting the subsequent maneuver to approach the window and finally enter the building. The vehicle is navigated by a vision system and a combination of inertial and altitude sensors, which achieve a relative localization of the quad-rotor with respect to its surrounding environment. An MPC-based path planning method using the information provided by the GPS and the visual sensor has been developed to generate an optimal real-time trajectory with collision-avoidance capabilities, which starts from an initial point given by the user and guides the vehicle to reach the final point outside the target building. With the aim of detecting and locating the object of interest, two different vision-based object detection strategies are proposed and applied, respectively, in the stereo vision system and in the vision system using the Kinect. After estimating the target window model, a motion estimation framework is developed to estimate the vehicle's ego-motion from the images provided by the visual sensor; two versions of this framework were developed, one for each vision system. A quad-rotor experimental platform is developed. For estimating the translational dynamics of the vehicle, a Kalman filter is implemented to combine the imaging, inertial and altitude sensors. A hierarchical sensing and control system is designed to perform the navigation and control of the quad-rotor helicopter, which allows the vehicle to estimate its state without artificial markers or other external positioning systems.
APA, Harvard, Vancouver, ISO, and other styles
46

Ciou, Yu-Jie, and 邱裕傑. "Real-time Multiple Objects Image Tracking Based on Moving Edges Detection." Thesis, 2007. http://ndltd.ncl.edu.tw/handle/31000125840707333652.

Full text
Abstract:
Master's thesis, Lunghwa University of Science and Technology, Graduate Institute of Engineering Technology, 2007. The main goal of this study is to construct a real-time multiple-object image tracking system based on a moving-edges detection technique. Moving-edges detection is used as the main detection rule, and a moving-target shifting method and a background compensation method are applied to overcome its shortcomings. The tracking system can correctly locate moving objects in a complicated environment; in addition, template matching is applied to search for multiple moving objects simultaneously, so the system is suitable for a wider range of situations. Back-propagation neural network technology is used to compensate the gray levels in the template and to counteract the influence of uneven luminance in the image. Finally, the detected object position information is fed to the control system to track or monitor the object's movement. The whole real-time multiple-object image tracking system can be divided into software and hardware. In software, a Visual Basic program is developed as the operator interface, and the Halcon image processing library is used as the image processing development tool. In hardware, a color CCD camera and an image acquisition card are used as the image source, and an X-Y table with servo motors, controlled by a motion control card, is used as the movement tracking platform.
APA, Harvard, Vancouver, ISO, and other styles
47

Chen, Yi-Kuang, and 陳奕光. "Real Time Multiple Motion objects Detection and Tracking for Scene Surveillance Applications." Thesis, 2004. http://ndltd.ncl.edu.tw/handle/51914258270430467915.

Full text
Abstract:
Master's thesis, National Taiwan University of Science and Technology, Department of Computer Science and Information Engineering, 2004. In the last decade, computer vision techniques have developed quickly, and their applications to surveillance systems have been used widely in many fields, such as the detection and identification of intruders in security systems, human motion and posture analysis, analysis of traffic flows, critical motion detection of nearby vehicles on roads, vision-based driver-assistance systems, and monitoring systems within the parking areas of intelligent transportation systems (ITS). Moving object detection and tracking techniques play important roles in these systems. In this thesis, we present a multiple moving object detection and tracking system operated under monitoring conditions. The system consists of two major processing phases: moving object detection and tracking. In the detection phase, we adopt a statistical approach to background subtraction in which both brightness and chromaticity distortions are used to segment foreground and background images. In addition, we propose an automatic threshold selection method to avoid manual threshold tuning. In the tracking phase, we determine the status of multiple moving objects by analyzing the differences between two consecutive images, and a clustering analysis method is employed to track the trajectory of each object using its centroid, detected in the former step. We also use a Gabor filter to overcome the intersection problems that occur when multiple moving objects meet during tracking. The experimental results reveal that the performance of our proposed method is better than that of traditional methods, with regard to both noise suppression and cases where the foreground color is similar to the background.
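The brightness/chromaticity-distortion test mentioned above can be sketched per pixel as follows, in the spirit of Horprasert-style background models; the thresholds here are illustrative, whereas the thesis selects them automatically.

```python
# Per-pixel brightness/chromaticity-distortion background test: alpha is
# the brightness scale that best explains the observation from the
# background color, and the residual is the chromaticity distortion.
import numpy as np

def distortions(pixel_rgb, bg_rgb):
    I = np.asarray(pixel_rgb, dtype=float)
    E = np.asarray(bg_rgb, dtype=float)
    alpha = I.dot(E) / E.dot(E)           # brightness distortion
    cd = np.linalg.norm(I - alpha * E)    # chromaticity distortion
    return alpha, cd

def classify(pixel_rgb, bg_rgb, t_cd=15.0, a_low=0.6, a_high=1.2):
    # thresholds are illustrative; the thesis selects them automatically
    alpha, cd = distortions(pixel_rgb, bg_rgb)
    if cd > t_cd:
        return "foreground"
    if alpha < a_low:
        return "shadow"
    return "background" if alpha <= a_high else "highlight"
```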
APA, Harvard, Vancouver, ISO, and other styles
48

Song, Xuan-qing, and 宋炫慶. "Real-Time Visual Detection and Tracking of Multiple Moving Objects Based on Particle Filter Techniques." Thesis, 2005. http://ndltd.ncl.edu.tw/handle/86592932642455768347.

Full text
Abstract:
Master's thesis, National Taiwan University of Science and Technology, Department of Computer Science and Information Engineering, 2005. In the last decade, due to the popularization of video products and the rapid development of computer vision techniques, detection and tracking methods for dynamic images have been widely applied in many fields, such as video surveillance, intelligent transportation, and parking area management systems. They can replace much tedious and time-consuming work and avoid the manual mistakes caused by human fatigue. Such visual detection and tracking systems can report sudden situations in real time, greatly reducing their overall time costs. In this thesis, the detection phase of the developed system consists of four parts: background generation, foreground detection, shadow elimination, and background maintenance. In the background generation part, the median method is used to construct background images from the past N frames. In the foreground detection part, an extraction function is applied to indirectly perform differencing and obtain foreground images. In the shadow elimination part, a deterministic non-model-based method is adopted to remove shadows. As to the background maintenance part, a history map, which records the number of times each pixel has changed, is employed to maintain the background images. In the tracking phase, a particle filter is exploited to track moving objects. The color distribution of a moving object is chosen as its feature, represented by a color probability histogram, and in order to raise the tracking accuracy, background information is used to increase the weights of a moving object's candidates. The experimental results reveal that in general situations our system achieves real-time processing and obtains robust detection and tracking results for multiple moving objects.
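One iteration of such a color-histogram particle filter can be sketched as below; hist_at is an assumed helper that computes the color histogram of the window centered on a particle, and the noise parameters are illustrative.

```python
# One particle-filter step: diffuse particles, weight them by histogram
# similarity to the target model, estimate the state, then resample.
import numpy as np

def bhattacharyya(p, q):
    return np.sqrt(np.sum(np.sqrt(p * q)))

def step(particles, target_hist, hist_at, sigma_pos=5.0, sigma_obs=0.1):
    particles = particles + np.random.normal(0, sigma_pos, particles.shape)
    d2 = np.array([1.0 - bhattacharyya(hist_at(p), target_hist)
                   for p in particles])             # Bhattacharyya distance^2
    weights = np.exp(-d2 / (2 * sigma_obs**2))
    weights /= weights.sum()
    estimate = (particles * weights[:, None]).sum(axis=0)  # weighted mean state
    idx = np.random.choice(len(particles), len(particles), p=weights)
    return particles[idx], estimate
```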
APA, Harvard, Vancouver, ISO, and other styles
49

Chuang, Chu-Lin, and 莊曲霖. "Study on Visual Surveillance-based Real-Time Detection and Encryption of Abandoned and Removed Objects." Thesis, 2013. http://ndltd.ncl.edu.tw/handle/41059718996674624920.

Full text
Abstract:
Master's thesis, National Penghu University of Science and Technology, Institute of Electrical Engineering and Computer Science, 101. This thesis proposes a visual surveillance-based method for real-time detection and encryption of abandoned and removed objects. In the proposed detection scheme, object features are used to detect static foreground objects, which are then treated as candidate abandoned or removed objects. Singular value decomposition (SVD) is applied to the region of interest (ROI) of each candidate object in the current frame and in the background image, respectively, and the maximum singular values of the resulting S matrices are compared to classify the object as abandoned or removed. An encryption and forgery-detection scheme for the abandoned and removed objects in key frames is also proposed: a DWT+DCT+SVD-based image watermarking method uses the obtained features in its embedding and extracting schemes to detect forgery. The proposed method can effectively detect abandoned and removed objects and further determine whether images have been forged. The experimental results demonstrate that the proposed method performs well, and the false-positive detection problem of SVD-based image watermarking is removed. The proposed method therefore has high value in visual surveillance applications.
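The abstract's SVD comparison reduces to computing the largest singular value of the candidate ROI in the current frame and in the background image; a minimal sketch follows, assuming grayscale ROIs. The threshold tau and the direction of the decision rule are guesses for illustration, not the thesis's actual criterion.

```python
import numpy as np

def classify_static_object(roi_frame, roi_background, tau=1.1):
    """Compare the largest singular values of the candidate ROI in the
    current frame and in the background image (2-D grayscale arrays
    of equal shape). tau and the rule's direction are assumptions.
    """
    s_frame = np.linalg.svd(roi_frame.astype(float), compute_uv=False)[0]
    s_bg = np.linalg.svd(roi_background.astype(float), compute_uv=False)[0]

    # A newly deposited object changes the ROI's dominant structure
    # relative to the stored background; a vanished one does the reverse.
    if s_frame > tau * s_bg:
        return "abandoned"
    if s_bg > tau * s_frame:
        return "removed"
    return "undecided"
```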
APA, Harvard, Vancouver, ISO, and other styles
50

NI, YU-SHU, and 倪堉書. "Real-Time Classification and Detection of Multiple Objects by Using Deep Convolutional Neural Network for AI Edge Computing." Thesis, 2018. http://ndltd.ncl.edu.tw/handle/m2njny.

Full text
Abstract:
Master's thesis, Feng Chia University, Department of Electronic Engineering, 106. Computer vision technology has changed rapidly: it moved from early shallow neural networks to machine learning and, in recent years, to deep convolutional neural networks (DCNNs). The accuracy of computer vision has improved so much that it can even exceed human performance. However, as networks grow deeper, the computation becomes heavier and reaches a level unacceptable for common mobile devices. To solve this problem, we propose a CNN model that keeps high accuracy while minimizing computation. For a CNN model, the two most important issues are the construction of the model and of the dataset: the size and composition of the dataset directly affect the initial accuracy, while the depth and width of the model affect the computation. We built a dataset dedicated to on-the-road detection, including persons, bicycles, motorbikes, and cars. We adjusted the dataset architecture, added missing data samples, and used images and video to test the accuracy achieved with the dataset, finally establishing a complete dataset. We then built the CNN model. First, we built a basic 14-layer model with 70% Top-1 accuracy. After that, we used small filters and removed some filters and convolutional layers, so the final model achieves lower computation time with high performance. The basic model reached 50% Top-1 accuracy; by rebuilding the dataset, the accuracy improved to 80%. Next, the model reduction cut the computation time by 50% while losing only 7% accuracy. Finally, we could run detection in real time on the NVIDIA TX-1 at 30 FPS for 240x360 video.
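The abstract does not spell out the layer-by-layer architecture, so the following PyTorch sketch only illustrates the kind of reduced model it describes: small 3x3 filters, few channels, and four on-road classes. The layer counts and channel widths are assumptions, not the thesis's exact design.

```python
import torch
import torch.nn as nn

class SmallRoadNet(nn.Module):
    """Illustrative compact classifier for person / bicycle /
    motorbike / car; not the thesis's exact architecture."""
    def __init__(self, num_classes=4):
        super().__init__()
        chans = [3, 16, 32, 64, 128]
        blocks = []
        for c_in, c_out in zip(chans, chans[1:]):
            blocks += [
                nn.Conv2d(c_in, c_out, kernel_size=3, padding=1),
                nn.BatchNorm2d(c_out),
                nn.ReLU(inplace=True),
                nn.MaxPool2d(2),   # halve spatial resolution
            ]
        self.features = nn.Sequential(*blocks)
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),  # global average pooling
            nn.Flatten(),
            nn.Linear(chans[-1], num_classes),
        )

    def forward(self, x):
        return self.head(self.features(x))

# Example: one 240x360 frame, matching the thesis's TX-1 test resolution.
logits = SmallRoadNet()(torch.randn(1, 3, 240, 360))
```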
APA, Harvard, Vancouver, ISO, and other styles