To see the other types of publications on this topic, follow the link: Keypoints detection.

Dissertations / Theses on the topic 'Keypoints detection'



Consult the top 38 dissertations / theses for your research on the topic 'Keypoints detection.'

Next to every source in the list of references there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse dissertations / theses from a wide variety of disciplines and organise your bibliography correctly.

1

Gale, Timothy Edward. "Improved detection and quantisation of keypoints in the complex wavelet domain." Thesis, University of Cambridge, 2018. https://www.repository.cam.ac.uk/handle/1810/277713.

Full text
Abstract:
An algorithm which is able to consistently identify features in an image is a basic building block of many object recognition systems. Attaining sufficient consistency is challenging, because factors such as pose and lighting can dramatically change a feature’s appearance. Effective feature identification therefore requires both a reliable and accurate keypoint detector and a discriminative categoriser (or quantiser). The Dual Tree Complex Wavelet Transform (DTCWT) decomposes an image into oriented subbands at a range of scales. The resulting domain is arguably well suited for further image analysis tasks such as feature identification. This thesis develops feature identification in the complex wavelet domain, building on previous keypoint detection work and exploring the use of random forests for descriptor quantisation. Firstly, we extended earlier work on keypoint detection energy functions. Existing complex wavelet-based detectors were observed to suffer from two defects: a tendency to produce keypoints on straight edges at particular orientations and sensitivity to small translations of the image. We introduced a new corner energy function based on the Same Level Product (SLP) transform. This function performed well compared to previous ones, combining competitive edge rejection and positional stability properties. Secondly, we investigated the effect of changing the resolution at which the energy function is sampled. We used the undecimated DTCWT to calculate energy maps at the same resolution as the original images. This revealed the presence of fine details which could not be accurately interpolated from an energy map at the standard resolution. As a result, doubling the resolution of the map along each axis significantly improved both the reliability and positional accuracy of detections. However, calculating the map using interpolated coefficients resulted in artefacts introduced by inaccuracies in the interpolation.
We therefore proposed a modification to the standard DTCWT structure which doubles its output resolution for a modest computational cost. Thirdly, we developed a random forest based quantiser which operates on complex wavelet polar matching descriptors, with optional rotational invariance. Trees were evaluated on the basis of how consistently they quantised features into the same bins, and several examples of each feature were obtained by means of tracking. We found that the trees produced the most consistent quantisations when they were trained with a second set of tracked keypoints. Detecting keypoints using the higher resolution energy maps also resulted in more consistent quantiser outputs, indicating the importance of the choice of detector on quantiser performance. Finally, we introduced a fast implementation of the DTCWT, keypoint detection and descriptor extraction algorithms for OpenCL-capable GPUs. Several aspects were optimised to enable it to run more efficiently on modern hardware, allowing it to process HD footage in faster than real time. This particularly aided the development of the detector algorithms by permitting interactive exploration of their failure modes using a live camera feed.
APA, Harvard, Vancouver, ISO, and other styles
2

Avigni, Andrea. "Learning to detect good image features." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2017. http://amslaurea.unibo.it/12856/.

Full text
Abstract:
State-of-the-art keypoint detection algorithms have been designed to extract specific structures from images and to achieve a high keypoint repeatability, which means that they should find the same points in images undergoing specific transformations. However, this criterion does not guarantee that the selected keypoints will be the optimal ones during the successive matching step. The approach that has been developed in this thesis work is aimed at extracting keypoints that maximize the matching performance according to a pre-selected image descriptor. In order to do that, a classifier has been trained on a set of “good” and “bad” descriptors extracted from training images that are affected by a set of pre-defined nuisances. The set of “good” keypoints used for the training is filled with those vectors that are related to the points that gave correct matches during an initial matching step. On the contrary, randomly chosen points that are far away from the positives are labeled as “bad” keypoints. Finally, the descriptors computed at the “good” and “bad” locations form the set of features used to train the classifier that will judge each pixel of an unseen input image as a good or bad candidate for driving the extraction of a set of keypoints. This approach requires, though, the descriptors to be computed at every pixel of the image, and this leads to a high computational effort. Moreover, if a certain descriptor extractor is used during the training step, it must be used also during the testing. In order to overcome these problems, the last part of this thesis has been focused on the creation and training of a convolutional neural network (CNN) that uses as positive samples the patches centered at those locations that give correct correspondences during the matching step. Eventually, the results and the performance of the developed algorithm have been compared to the state of the art using a public benchmark.
APA, Harvard, Vancouver, ISO, and other styles
3

Hansen, Peter Ian. "Wide-baseline keypoint detection and matching with wide-angle images for vision based localisation." Thesis, Queensland University of Technology, 2010. https://eprints.qut.edu.au/37667/1/Peter_Hansen_Thesis.pdf.

Full text
Abstract:
This thesis addresses the problem of detecting and describing the same scene points in different wide-angle images taken by the same camera at different viewpoints. This is a core competency of many vision-based localisation tasks including visual odometry and visual place recognition. Wide-angle cameras have a large field of view that can exceed a full hemisphere, and the images they produce contain severe radial distortion. When compared to traditional narrow field of view perspective cameras, more accurate estimates of camera egomotion can be found using the images obtained with wide-angle cameras. The ability to accurately estimate camera egomotion is a fundamental primitive of visual odometry, and this is one of the reasons for the increased popularity in the use of wide-angle cameras for this task. Their large field of view also enables them to capture images of the same regions in a scene taken at very different viewpoints, and this makes them suited for visual place recognition. However, the ability to estimate the camera egomotion and recognise the same scene in two different images is dependent on the ability to reliably detect and describe the same scene points, or ‘keypoints’, in the images. Most algorithms used for this purpose are designed almost exclusively for perspective images. Applying algorithms designed for perspective images directly to wide-angle images is problematic as no account is made for the image distortion. The primary contribution of this thesis is the development of two novel keypoint detectors, and a method of keypoint description, designed for wide-angle images. Both reformulate the Scale-Invariant Feature Transform (SIFT) as an image processing operation on the sphere. As the image captured by any central projection wide-angle camera can be mapped to the sphere, applying these variants to an image on the sphere enables keypoints to be detected in a manner that is invariant to image distortion.
Each of the variants is required to find the scale-space representation of an image on the sphere, and they differ in the approaches they used to do this. Extensive experiments using real and synthetically generated wide-angle images are used to validate the two new keypoint detectors and the method of keypoint description. The best of these two new keypoint detectors is applied to vision based localisation tasks including visual odometry and visual place recognition using outdoor wide-angle image sequences. As part of this work, the effect of keypoint coordinate selection on the accuracy of egomotion estimates using the Direct Linear Transform (DLT) is investigated, and a simple weighting scheme is proposed which attempts to account for the uncertainty of keypoint positions during detection. A word reliability metric is also developed for use within a visual ‘bag of words’ approach to place recognition.
APA, Harvard, Vancouver, ISO, and other styles
4

Fefilatyev, Sergiy. "Algorithms for Visual Maritime Surveillance with Rapidly Moving Camera." Scholar Commons, 2012. http://scholarcommons.usf.edu/etd/4037.

Full text
Abstract:
Visual surveillance in the maritime domain has been explored for more than a decade. Although it has produced a number of working systems and resulted in a mature technology, surveillance has been restricted to port facilities or areas close to the coastline, assuming a fixed-camera scenario. This dissertation presents several contributions in the domain of maritime surveillance. First, a novel algorithm for open-sea visual maritime surveillance is introduced. We explore a challenging situation with a camera mounted on a buoy or other floating platform. The developed algorithm detects, localizes, and tracks ships in the field of view of the camera. Specifically, our method is uniquely designed to handle a rapidly moving camera. Its performance is robust in the presence of random, relatively large camera motion. In the context of ship detection, a new horizon detection scheme for a complex maritime domain is also developed. Second, the performance of the ship detection algorithm is evaluated on a dataset of 55,000 images. Accuracy of detection of up to 88% of ships is achieved. Lastly, we consider the topic of detection of the vanishing line of the ocean surface plane as a way to estimate the horizon in difficult situations. This allows extension of the ship-detection algorithm beyond open-sea scenarios.
APA, Harvard, Vancouver, ISO, and other styles
5

Caha, Miloš. "Určení směru pohledu." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2010. http://www.nusl.cz/ntk/nusl-237168.

Full text
Abstract:
The main objective of this work is to design and implement an algorithm for determining gaze direction, or more precisely head movement. The system searches for a face in a video and then detects points suitable for estimating the viewing direction of the tracked person. The estimate is obtained from the transformation that the key points undergo during head movement. To improve accuracy, calibration frames are used; these determine the key-point transformations for a set of defined viewing directions. The main result is an application able to determine the deflection of the tracked person's head from the straight-ahead position in both the horizontal and vertical directions. The output contains not only the direction of the deflection but also its magnitude.
APA, Harvard, Vancouver, ISO, and other styles
6

Kapoor, Prince. "Shoulder Keypoint-Detection from Object Detection." Thesis, Université d'Ottawa / University of Ottawa, 2018. http://hdl.handle.net/10393/38015.

Full text
Abstract:
This thesis presents a detailed study of different convolutional neural network (CNN) architectures, which have helped computer vision researchers achieve state-of-the-art performance on classification, detection, segmentation and many other image analysis challenges. With the advent of deep learning, CNNs are used in almost all computer vision applications, so there is a real need to understand the fine details of these feature extractors and to examine the pros and cons of each one carefully. For our experiments, we explored an object detection task using a model architecture that maintains a sweet spot between computational cost and accuracy: an LSTM decoder. The model was tested with different CNN feature extractors to establish their strengths and weaknesses in various scenarios. The results obtained on different datasets show that the CNN plays a major role in achieving higher accuracy, and we also achieved accuracy comparable to the state of the art on a pedestrian detection dataset. Beyond object detection, we implemented two model architectures that find shoulder keypoints. The first idea is as follows: using the bounding box produced by the object detector, a small cropped image is generated and fed into a small cascade network trained to detect shoulder keypoints. The second strategy is to reuse the same object detection model and fine-tune its weights to predict shoulder keypoints. We present results for shoulder keypoint detection; this idea could be extended to full-body pose estimation by modifying the cascaded network accordingly, which is an important topic for future work on this thesis.
APA, Harvard, Vancouver, ISO, and other styles
7

Loiseau-Witon, Nicolas. "Détection et description de points clés par apprentissage." Electronic Thesis or Diss., Lyon, INSA, 2023. http://www.theses.fr/2023ISAL0101.

Full text
Abstract:
Hospitals are increasingly generating 3D medical images that require automatic registration for systematic and large-scale analysis. Key points are used to reduce the time and memory required for this registration, and can be detected and described using various classical methods, as well as neural networks, as demonstrated numerous times in 2D. This thesis presents results and discussions on methods for detecting and describing key points using 3D neural networks. Two types of networks were studied to detect and/or describe characteristic points in 3D medical images. The first networks studied describe the areas directly surrounding key points, while the second type performs both detection and description of key points in a single step.
APA, Harvard, Vancouver, ISO, and other styles
8

Zhao, Mingchang. "Keypoint-Based Binocular Distance Measurement for Pedestrian Detection System on Vehicle." Thesis, Université d'Ottawa / University of Ottawa, 2014. http://hdl.handle.net/10393/31693.

Full text
Abstract:
The Pedestrian Detection System (PDS) has become a significant area of research designed to protect pedestrians. Despite the large body of research, most current PDSs are designed to detect pedestrians without knowing their distances from cars. In fact, a priori knowledge of the distance between a car and a pedestrian allows the system to make the appropriate decision in order to avoid collisions. Typical methods of distance measurement require additional equipment (e.g., radar) which, unfortunately, cannot identify objects. Moreover, traditional stereo-vision methods have poor precision in long-range conditions. In this thesis, we use a keypoint-based feature extraction method to generate the parallax in a binocular vision system in order to measure a detectable object; this is used instead of a disparity map. Our method enhances the tolerance to instability of a moving vehicle, and it also enables binocular measurement systems to be equipped with a zoom lens and to have greater distance between cameras. In addition, we designed a crossover re-detection and tracking method in order to reinforce the robustness of the system (one camera helps the other reduce detection errors). Our system is able to measure the distance between cars and pedestrians, and it can also be used efficiently to measure the distance between cars and other objects such as traffic signs or animals. In a real-world experiment, the system shows a 7.5% margin of error in outdoor and long-range conditions.
APA, Harvard, Vancouver, ISO, and other styles
9

Eklund, Anton. "Cascade Mask R-CNN and Keypoint Detection used in Floorplan Parsing." Thesis, Uppsala universitet, Institutionen för informationsteknologi, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-415371.

Full text
Abstract:
Parsing floorplans has long been a problem in automatic document analysis and has, until recent years, been approached with algorithmic methods. With the rise of convolutional neural networks (CNNs), this problem too has seen an upswing in performance. In this thesis the task is to recover, as accurately as possible, spatial and geometric information from floorplans. This project builds around instance segmentation models like Cascade Mask R-CNN to extract the bulk of the information from a floorplan image. To complement the segmentation, a new style of using a keypoint CNN is presented to find precise locations of corners. These are then combined in a post-processing step to give the resulting segmentation. The resulting segmentation scores exceed the current baseline of the CubiCasa5k floorplan dataset, with a mean IoU of 72.7% compared to 57.5%. Further, the mean IoU is also improved for almost every individual class. It is also shown that Cascade Mask R-CNN is better suited than Mask R-CNN for this task.
APA, Harvard, Vancouver, ISO, and other styles
10

Kemp, Neal. "Content-Based Image Retrieval for Tattoos: An Analysis and Comparison of Keypoint Detection Algorithms." Scholarship @ Claremont, 2013. http://scholarship.claremont.edu/cmc_theses/784.

Full text
Abstract:
The field of biometrics has grown significantly in the past decade due to an increase in interest from law enforcement. Law enforcement officials are interested in adding tattoos alongside irises and fingerprints to their toolbox of biometrics. They often use these biometrics to aid in the identification of victims and suspects. Like facial recognition, tattoos have seen a spike in attention over the past few years. Tattoos, however, have not received as much attention from researchers. This lack of attention stems from the difficulty inherent in matching tattoos. Such difficulties include image quality, affine transformation, warping of tattoos around the body, and in some cases, excessive body hair covering the tattoo. We will utilize content-based image retrieval to find a tattoo in a database, which means using one image to query against a database in order to find similar tattoos. We will focus specifically on the keypoint detection process in computer vision. In addition, we are interested in finding not just exact matches but also similar tattoos. We will conclude that the ORB detector extracts the most relevant features and thus offers the best chance of yielding an accurate result from content-based image retrieval for tattoos. However, we will also show that even ORB will not work on its own in a content-based image retrieval system; other processes will have to be involved in order to return accurate matches. We will give recommendations on next steps to create a better tattoo retrieval system.
APA, Harvard, Vancouver, ISO, and other styles
11

MAZZINI, DAVIDE. "Local Detectors and Descriptors for Object and Scene Recognition." Doctoral thesis, Università degli Studi di Milano-Bicocca, 2018. http://hdl.handle.net/10281/199003.

Full text
Abstract:
The aim of this thesis is to study two main categories of algorithms for object detection and their use in particular applications. The first category investigated concerns keypoint-based approaches. Several comparative experiments are performed within the standard testing pipeline of the MPEG CDVS Test Model, and an extended pipeline which makes use of color information is proposed. The second category of object detectors investigated is based on convolutional neural networks. Two applications of convolutional neural networks for object recognition are addressed in particular. The first concerns logo recognition. Two classification pipelines are designed and tested on a real-world dataset of images collected from Flickr. The first architecture makes use of a pre-trained network as a feature extractor and achieves results comparable to those of keypoint-based approaches. The second architecture makes use of a tiny end-to-end trained neural network that outperforms state-of-the-art keypoint-based methods. The other application addressed is painting categorization. It consists in identifying the author, assigning a painting to the school or art movement it belongs to, and categorizing the genre of the painting, e.g. landscape, portrait, illustration etc. To tackle this problem, a novel multibranch and multitask neural network structure is proposed which benefits from the joint use of keypoint-based approaches and neural features. In both applications the use of data augmentation techniques to enlarge the training set is also investigated. In particular for paintings, a neural style transfer algorithm is exploited to generate synthetic paintings to be used in training.
APA, Harvard, Vancouver, ISO, and other styles
12

Bendale, Pashmina Ziparu. "Development and evaluation of a multiscale keypoint detector based on complex wavelets." Thesis, University of Cambridge, 2011. https://www.repository.cam.ac.uk/handle/1810/252226.

Full text
APA, Harvard, Vancouver, ISO, and other styles
13

Buck, Robert. "Cluster-Based Salient Object Detection Using K-Means Merging and Keypoint Separation with Rectangular Centers." DigitalCommons@USU, 2016. https://digitalcommons.usu.edu/etd/4631.

Full text
Abstract:
The explosion of internet traffic, the advent of social media sites such as Facebook and Twitter, and the increased availability of digital cameras have saturated life with images and videos. Never before has it been so important to sift quickly through large amounts of digital information. Salient Object Detection (SOD) is a computer vision topic concerned with methods for locating important objects in pictures. SOD has proven to be helpful in numerous applications such as image forgery detection and traffic sign recognition. In this thesis, I outline a novel SOD technique to automatically isolate important objects from the background in images.
APA, Harvard, Vancouver, ISO, and other styles
14

Šimetka, Vojtěch. "3D Rekonstrukce historických míst z obrázků na Flickru." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2015. http://www.nusl.cz/ntk/nusl-234976.

Full text
Abstract:
This thesis describes the design and development of an application for reconstructing 3D models from 2D image data, a process known as bundle adjustment. The work analyses the 3D reconstruction process and describes each of its steps in detail. The first step is the automated retrieval of an image set from the internet. A collection of scripts for bulk downloading of images from Flickr and Google Images is presented, and the requirements these images must meet for the best possible 3D reconstruction are summarised. The thesis then describes various detectors, extractors and matching algorithms for image keypoints, with the aim of finding the most suitable combination for reconstructing buildings. Next, the process of reconstructing the 3D structure and its optimisation is explained, together with how this is realised in our program. The final part of the thesis tests the results obtained from the implemented program on several different datasets and compares them with the results of other similar programs introduced at the beginning of the thesis.
APA, Harvard, Vancouver, ISO, and other styles
15

Urban, Daniel. "Lokalizace mobilního robota v prostředí." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2018. http://www.nusl.cz/ntk/nusl-385923.

Full text
Abstract:
This diploma thesis deals with the problem of mobile robot localisation in the environment based on current 2D and 3D sensor data and previous records. The work focuses on detecting places previously visited by the robot. The implemented system is suitable for loop detection, using Gestalt 3D descriptors. The output of the system provides the corresponding positions at which the robot has already been located. The functionality of the system has been tested and evaluated on LiDAR data.
APA, Harvard, Vancouver, ISO, and other styles
16

Ricci, Thomas. "Individuazione di punti salienti in dati 3D mediante rappresentazioni strutturate." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2012. http://amslaurea.unibo.it/3968/.

Full text
Abstract:
This thesis belongs to the research area of 3D data processing, and in particular 3D object recognition. It first gives an overview of the main structured representations of 3D data, which are a necessary prerequisite for implementing 3D data processing algorithms efficiently, and then presents a new 3D keypoint detection algorithm developed and proposed by the Computer Vision Laboratory of the University of Bologna, where I carried out my thesis work.
APA, Harvard, Vancouver, ISO, and other styles
17

Gardenier, John Hille. "How Now Lame Cow: Automatic Lameness Assessment for Dairy Cattle with 3D Sensors." Thesis, University of Sydney, 2020. https://hdl.handle.net/2123/23218.

Full text
Abstract:
Lameness in dairy cattle is a prevalent health issue impacting animal welfare and economic performance. Automatic lameness detection using 3D sensors is proposed in this thesis to automate the current time intensive manual locomotion scoring, resulting in more objective and frequent monitoring of individual lameness. Conventional visual locomotion scoring was analysed for consistency amongst observers, providing an estimate for human locomotion scoring performance and quantifying the ground truth consistency for an automatic lameness detection system. A 4-level scale had to be binarised to 2 levels to achieve consistent locomotion scoring from two expert observers. Novice observers achieved consistent scoring on a 4-level scale, and performed pairwise preference scoring with good consistency, where observers scored which of two passings was more lame. 3D sensors to record cattle kinematics were evaluated on data quality, positioning, and cost. A prototype with overhead and side-on Kinect-v2 sensors was developed, and locomotor keypoints were detected in 2D. Tracking, in particular of hooves and carpal/tarsal joints, was performed after projection of detections into 3D. Gait metrics were extracted from the resulting keypoint trajectories including a variety of hoof placement, spine curvature, and hip displacement metrics. The difference between gait metrics at different scores, and the repeatability of measurement were analysed. Gait metrics were used to perform regression, classification, and pairwise preference prediction of ground truth scores. A simple neural network using stacked gait metrics achieved near perfect performance for pairwise preference prediction on 28613 pairs (se. = 0.99, sp. = 0.98), showing that automatic classification of differences in gait performs better than predicting an absolute score. 
This thesis proposes a new research direction for lameness detection by rethinking (automatic) lameness monitoring as a relative measurement instead of absolute measurement. Both manual pairwise preference scoring and classification of pairs have shown encouraging results towards the ultimate goal of accurate on-farm automatic lameness monitoring.
APA, Harvard, Vancouver, ISO, and other styles
18

Bartončík, Michal. "Rozpoznávání výrazu tváře u neznámých osob." Master's thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2011. http://www.nusl.cz/ntk/nusl-219322.

Full text
Abstract:
This paper describes the individual components and phases of searching for and recognising facial expressions of unknown persons, and presents possible solutions and methods for each phase of the project. The aim of my master's thesis is to recognise facial expressions of unknown persons; for this work I was lent an industrial video camera, a computer and a place in a laboratory. Colour spaces and their use are introduced, and the most suitable one for use with Matlab and the proposed algorithm is selected from the main candidates. Once a suitable colour space is found, skin colour is segmented in the image. Skin, however, covers the whole body, so the skin-coloured parts of the image representing the face must be separated out. Once the face is found, points relevant for identifying the subsequent deformation need to be located in order to define facial expressions. The actual muscle movements in the individual expressions are then defined.
APA, Harvard, Vancouver, ISO, and other styles
19

Madrigali, Andrea. "Analysis of Local Search Methods for 3D Data." Bachelor's thesis, Alma Mater Studiorum - Università di Bologna, 2016.

Find full text
Abstract:
In this thesis, several search methods for 3D data were analysed. A general overview is given of the field of Computer Vision, of the state of the art of acquisition sensors, and of some of the formats used to describe 3D data. 3D Object Recognition is then examined in depth: besides describing the entire matching process between Local Features, particular attention is paid to the detection phase of salient points. Specifically, a Learned Keypoint detector, based on machine learning techniques, is analysed. It is presented together with the implementation of two neighbour search algorithms: an exhaustive one (K-d tree) and an approximate one (Radial Search). Finally, experimental evaluations of the efficiency and speed of the detector implemented with the different search methods are reported, showing an effective performance improvement, without a considerable loss of accuracy, when the approximate search is used.
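The exact neighbour search mentioned above can be illustrated with a minimal pure-Python k-d tree over 3D points (an illustrative sketch, not the thesis implementation):

```python
import math

def build_kdtree(points, depth=0):
    """Recursively split 3-D points on cycling axes (x, y, z)."""
    if not points:
        return None
    axis = depth % 3
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return {"point": points[mid], "axis": axis,
            "left": build_kdtree(points[:mid], depth + 1),
            "right": build_kdtree(points[mid + 1:], depth + 1)}

def nearest(node, query, best=None):
    """Exact nearest neighbour: descend toward the query, then
    backtrack whenever the splitting plane is closer than the
    current best distance."""
    if node is None:
        return best
    d = math.dist(node["point"], query)
    if best is None or d < best[1]:
        best = (node["point"], d)
    axis = node["axis"]
    diff = query[axis] - node["point"][axis]
    near, far = ((node["left"], node["right"]) if diff < 0
                 else (node["right"], node["left"]))
    best = nearest(near, query, best)
    if abs(diff) < best[1]:          # the far side may hide a closer point
        best = nearest(far, query, best)
    return best

cloud = [(0, 0, 0), (1, 2, 0), (5, 5, 5), (2, 1, 1), (9, 0, 3)]
tree = build_kdtree(cloud)
```

An approximate variant would simply skip the backtracking step, trading a small loss of accuracy for speed, which is the trade-off the thesis evaluates.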
APA, Harvard, Vancouver, ISO, and other styles
20

Brue, Fabio. "Schemi di soluzione numerica dell'equazione delle onde per l'individuazione di punti salienti in immagini." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2014. http://amslaurea.unibo.it/6787/.

Full text
Abstract:
The objective of this thesis was to improve the effectiveness and efficiency of a state-of-the-art proposal for the detection of salient points in digital images [1]. This algorithm exploits the properties of the partial differential equation that models the evolution of a wave. To improve it, several numerical schemes for solving the two-dimensional wave equation were implemented and evaluated against the scheme already in use. Both implicit and explicit schemes were implemented, each in two versions: interlaced with the heat equation (diffusive) and without. The best schemes were studied in depth and successfully compared with the previously proposed version of the explicit scheme INT 1/4 with diffusion [1]. A computationally more efficient version of the best solution schemes was then realised through the use of a pyramidal structure obtained by sub-sampling the image; this version reduces computation times with limited performance losses. The tuning of the detector's characteristic parameters was carried out using a set of images varying in scale, blur, viewpoint, jpeg compression, and illumination known as the Oxford dataset. On the same dataset, experimental results were obtained that identify the presented proposal as the new state of the art. To compare detection performance with other state-of-the-art detectors, three further datasets were used, namely the Untextured dataset, the Symbench dataset, and the Robot dataset, which contain variations in illumination, capture time, scale, and viewpoint. The developed detectors are the best on the Untextured dataset, reach performance similar to the best available detector on the Symbench dataset, and represent the new state of the art on the Robot dataset.
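The kind of explicit scheme discussed above can be sketched as a plain finite-difference leapfrog step for the two-dimensional wave equation (an illustrative sketch with unit grid spacing, not the INT 1/4 scheme of [1]):

```python
def wave_step(u_prev, u_curr, c=0.5):
    """One explicit leapfrog step of u_tt = c^2 (u_xx + u_yy) on a
    unit grid (dx = dt = 1) with fixed zero boundaries. Stable under
    the 2-D CFL condition c <= 1/sqrt(2)."""
    n, m = len(u_curr), len(u_curr[0])
    u_next = [[0.0] * m for _ in range(n)]
    for i in range(1, n - 1):
        for j in range(1, m - 1):
            # Five-point discrete Laplacian of the current field.
            lap = (u_curr[i + 1][j] + u_curr[i - 1][j] +
                   u_curr[i][j + 1] + u_curr[i][j - 1] - 4 * u_curr[i][j])
            u_next[i][j] = 2 * u_curr[i][j] - u_prev[i][j] + c * c * lap
    return u_next

# Point impulse in the middle of a 7x7 grid: the disturbance starts
# to propagate outward after one step.
n = 7
u0 = [[0.0] * n for _ in range(n)]
u1 = [[0.0] * n for _ in range(n)]
u1[3][3] = 1.0
u2 = wave_step(u0, u1)
```

The diffusive variants mentioned in the abstract would interleave steps of this kind with heat-equation (pure Laplacian smoothing) steps.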
APA, Harvard, Vancouver, ISO, and other styles
21

Labudová, Kristýna. "Rozpoznávání obrazů pro ovládání robotické ruky." Master's thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2017. http://www.nusl.cz/ntk/nusl-316835.

Full text
Abstract:
This thesis concerns the processing of images of embedded terminals and their classification. It analyses the problem of moiré noise reduction through filtering in the frequency domain, as well as image normalization for further processing. Keypoint detectors and descriptors are used for image classification. The FAST and Harris corner detectors and the SURF, BRIEF and BRISK descriptors are emphasized, and they are evaluated in terms of their potential contribution to this work.
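The FAST detector mentioned above is based on a segment test on a 16-pixel Bresenham circle; a minimal sketch follows (FAST-9-style, without the machine-learned decision tree or non-maximum suppression of the full detector):

```python
# Offsets of the 16-pixel Bresenham circle (radius 3) used by FAST.
CIRCLE = [(0, -3), (1, -3), (2, -2), (3, -1), (3, 0), (3, 1), (2, 2), (1, 3),
          (0, 3), (-1, 3), (-2, 2), (-3, 1), (-3, 0), (-3, -1), (-2, -2), (-1, -3)]

def is_fast_corner(img, x, y, t=20, n=9):
    """Segment test: (x, y) is a corner if at least n contiguous circle
    pixels are all brighter than img[y][x] + t or all darker than
    img[y][x] - t (wrap-around handled by doubling the flag list)."""
    p = img[y][x]
    ring = [img[y + dy][x + dx] for dx, dy in CIRCLE]
    for sign in (1, -1):
        flags = [(v - p) * sign > t for v in ring]
        run = best = 0
        for f in flags * 2:
            run = run + 1 if f else 0
            best = max(best, run)
        if best >= n:
            return True
    return False

# 9x9 test image: dark background, bright square whose top-left
# corner sits at pixel (4, 4).
img = [[0] * 9 for _ in range(9)]
for yy in range(4, 9):
    for xx in range(4, 9):
        img[yy][xx] = 100
```

The contiguity requirement is what lets the test fire on corners while rejecting isolated noisy pixels.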
APA, Harvard, Vancouver, ISO, and other styles
22

Jelínek, Ondřej. "Podobnost obrazů na základě bodů zájmu." Master's thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2015. http://www.nusl.cz/ntk/nusl-220409.

Full text
Abstract:
This paper presents a new object detection method. The method is based on the analysis of keypoints and their parameters. The computed parameters are used to build a decision model using machine learning methods. Based on the input data, the model is able to detect an object in a picture and compare its similarity to a chosen example. The new method is described in detail, its accuracy is evaluated, and this accuracy is compared with that of existing detectors. The new method's detection ability is more than 40% better than that of detectors such as SURF. To explain object detection, this paper also describes the process step by step, including popular algorithms designed for specific roles in each step.
APA, Harvard, Vancouver, ISO, and other styles
23

Hashimoto, Marcelo. "Detecção de objetos por reconhecimento de grafos-chave." Universidade de São Paulo, 2012. http://www.teses.usp.br/teses/disponiveis/45/45134/tde-22012014-080625/.

Full text
Abstract:
Object detection is a classic problem in computer vision, present in applications such as automated surveillance, medical image analysis and information retrieval. Among the existing approaches in the literature to solve this problem, we can highlight methods based on keypoint recognition that can be interpreted as different implementations of a same framework. The objective of this PhD thesis is to develop and evaluate a generalized version of this framework, on which keypoint recognition is replaced by keygraph recognition. The potential of the research resides in the information richness that a graph can present before and after being recognized. The difficulty of the research resides in the problems that can be caused by this richness, such as curse of dimensionality and computational complexity. Three contributions are included in the thesis: the detailed description of a keygraph-based framework for object detection, faithful implementations that demonstrate its feasibility and experimental results that demonstrate its performance.
APA, Harvard, Vancouver, ISO, and other styles
24

Runeskog, Henrik. "Continuous Balance Evaluation by Image Analysis of Live Video : Fall Prevention Through Pose Estimation." Thesis, KTH, Skolan för kemi, bioteknologi och hälsa (CBH), 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-297541.

Full text
Abstract:
The deep learning technique Human Pose Estimation (or Human Keypoint Detection) is a promising field for tracking a person and identifying their posture. As posture and balance are two closely related concepts, human pose estimation can be applied to fall prevention. By deriving the location of a person's Center of Mass and thereafter their Center of Pressure, one can evaluate a person's balance without the use of force plates or sensors, using only cameras. In this study, a human pose estimation model together with a predefined human weight distribution model were used to extract the location of a person's Center of Pressure in real time. The proposed method utilized two different ways of acquiring depth information from the frames: stereoscopy using two RGB cameras, and the use of one RGB-depth camera. The estimated location of the Center of Pressure was compared to the location of the same parameter extracted using the force plate Wii Balance Board. As the proposed method was to operate in real time and without the use of computational processor enhancement, the human pose estimation model was chosen to maximize software input/output speed. Thus, three models were used: one smaller and faster model called Lightweight Pose Network, one larger and more accurate model called High-Resolution Network, and one model placing itself somewhere in between the two, namely Pose Residual Network. The proposed method showed promising results for real-time acquisition of balance parameters, although the largest source of error was the acquisition of depth information from the cameras. The results also showed that a smaller and faster human pose estimation model proved sufficient, relative to the larger, more accurate models, for real-time usage without computational processor enhancement.
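The Center of Mass computation described above can be sketched as a mass-weighted average of pose keypoints. The segment weight fractions below are illustrative placeholders (loosely inspired by anthropometric tables), not the weight distribution model actually used in the thesis:

```python
# Simplified segment weight fractions (illustrative values only).
SEGMENT_MASS = {"head": 0.08, "trunk": 0.50, "left_arm": 0.05,
                "right_arm": 0.05, "left_leg": 0.16, "right_leg": 0.16}

def center_of_mass(keypoints):
    """Mass-weighted average of per-segment keypoint positions.
    keypoints: {segment_name: (x, y)} as output by a pose estimator."""
    total = sum(SEGMENT_MASS[s] for s in keypoints)
    x = sum(SEGMENT_MASS[s] * p[0] for s, p in keypoints.items()) / total
    y = sum(SEGMENT_MASS[s] * p[1] for s, p in keypoints.items()) / total
    return x, y

# Hypothetical upright pose (coordinates in metres).
pose = {"head": (0.0, 1.8), "trunk": (0.0, 1.2),
        "left_arm": (-0.3, 1.3), "right_arm": (0.3, 1.3),
        "left_leg": (-0.1, 0.5), "right_leg": (0.1, 0.5)}
com = center_of_mass(pose)
```

Projecting this Center of Mass to the ground plane gives the quantity compared against the Center of Pressure from the force plate.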
APA, Harvard, Vancouver, ISO, and other styles
25

Pedreira, Carabel Carlos Javier. "Terrain Mapping for Autonomous Vehicles." Thesis, KTH, Datorseende och robotik, CVAP, 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-174132.

Full text
Abstract:
Autonomous vehicles have become the forefront of the automotive industry nowadays, as it looks toward safer and more efficient transportation systems. One of the main issues for every autonomous vehicle is being aware of its position and of the presence of obstacles along its path. The current project addresses the pose and terrain mapping problem by integrating a visual odometry method and a mapping technique. An RGB-D camera, the Kinect v2 from Microsoft, was chosen as the sensor for capturing information from the environment. It was connected to an Intel mini-PC for real-time processing. Both pieces of hardware were mounted on board a four-wheeled research concept vehicle (RCV) to test the feasibility of the current solution at outdoor locations. The Robot Operating System (ROS) was used as the development environment, with C++ as the programming language. The visual odometry strategy consisted of a frame registration algorithm called Adaptive Iterative Closest Keypoint (AICK), based on Iterative Closest Point (ICP) and using Oriented FAST and Rotated BRIEF (ORB) as the image keypoint extractor. A grid-based local costmap of the rolling-window type was implemented to obtain a two-dimensional representation of the obstacles close to the vehicle within a predefined area, in order to allow further path planning applications. Experiments were performed both offline and in real time to test the system in indoor and outdoor scenarios. The results confirmed the viability of using the designed framework to track the pose of the camera and detect objects in indoor environments. However, outdoor environments evidenced the limitations of the features of the RGB-D sensor, making the current system configuration unfeasible for outdoor purposes.
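The registration at the core of ICP-style methods such as AICK can be illustrated by their inner step: the closed-form least-squares rigid transform between corresponded point sets, shown here in 2D for brevity (an illustrative sketch, not the AICK implementation):

```python
import math

def rigid_align_2d(src, dst):
    """Closed-form least-squares rotation + translation mapping src
    onto dst, given point correspondences (one ICP inner step)."""
    n = len(src)
    csx = sum(p[0] for p in src) / n
    csy = sum(p[1] for p in src) / n
    cdx = sum(p[0] for p in dst) / n
    cdy = sum(p[1] for p in dst) / n
    # Cross-covariance terms of the centred point sets.
    sxx = sxy = syx = syy = 0.0
    for (ax, ay), (bx, by) in zip(src, dst):
        ax -= csx; ay -= csy; bx -= cdx; by -= cdy
        sxx += ax * bx; sxy += ax * by
        syx += ay * bx; syy += ay * by
    theta = math.atan2(sxy - syx, sxx + syy)
    c, s = math.cos(theta), math.sin(theta)
    tx = cdx - (c * csx - s * csy)
    ty = cdy - (s * csx + c * csy)
    return theta, (tx, ty)

def apply(theta, t, p):
    """Apply the estimated rigid transform to a point."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * p[0] - s * p[1] + t[0], s * p[0] + c * p[1] + t[1])

src = [(0, 0), (1, 0), (0, 1)]
# dst = src rotated by 90 degrees and shifted by (2, 3).
dst = [(2, 3), (2, 4), (1, 3)]
theta, t = rigid_align_2d(src, dst)
```

Full ICP alternates this closed-form solve with re-estimating the correspondences (e.g. from matched ORB keypoints) until convergence.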
APA, Harvard, Vancouver, ISO, and other styles
26

Rocha, Beatriz Gonçalves. "Automated Detection of Bone structure keypoints on Magnetic Resonance imaging - Sternum and Clavicles." Master's thesis, 2018. https://repositorio-aberto.up.pt/handle/10216/116477.

Full text
APA, Harvard, Vancouver, ISO, and other styles
27

Rocha, Beatriz Gonçalves. "Automated Detection of Bone structure keypoints on Magnetic Resonance imaging - Sternum and Clavicles." Dissertação, 2018. https://repositorio-aberto.up.pt/handle/10216/116477.

Full text
APA, Harvard, Vancouver, ISO, and other styles
28

Mokhtari, Djamila. "Détection des chutes par calcul homographique." Thèse, 2012. http://hdl.handle.net/1866/8869.

Full text
Abstract:
The main objective of video surveillance is to protect persons and property by detecting any abnormal behavior. This is not possible without detecting motion in the image. This process is often based on the concept of subtracting the static background of the scene. However, in video surveillance the cameras are themselves often in motion, causing a significant change of the background, so background subtraction techniques become problematic. We propose in this work a motion detection approach, with fall detection as an example application, that is free of background subtraction and works with a rotating surveillance camera: the method exploits the camera rotation, detecting motion through homographic calculation. Our results on synthetic and real video sequences demonstrate the feasibility of this approach.
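The homographic relation exploited above can be sketched directly: for a camera that only rotates, pixels of the static background map between frames through H = K R K^-1, so residual differences after warping indicate true motion. The intrinsics and pan angle below are illustrative, not from the thesis:

```python
import math

def matmul(A, B):
    """3x3 matrix product."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def warp(H, x, y):
    """Apply a 3x3 homography to a pixel (homogeneous normalisation)."""
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return u / w, v / w

# Illustrative intrinsics K and a 2-degree pan about the vertical axis.
f, cx, cy = 500.0, 320.0, 240.0
K = [[f, 0, cx], [0, f, cy], [0, 0, 1]]
K_inv = [[1 / f, 0, -cx / f], [0, 1 / f, -cy / f], [0, 0, 1]]
a = math.radians(2.0)
R = [[math.cos(a), 0, math.sin(a)], [0, 1, 0],
     [-math.sin(a), 0, math.cos(a)]]
# Background homography between consecutive frames of a rotating camera.
H = matmul(matmul(K, R), K_inv)
```

Warping the previous frame with H cancels the apparent background motion; anything that still differs, such as a falling person, is genuine scene motion.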
APA, Harvard, Vancouver, ISO, and other styles
29

Liu, Wen-Pin, and 劉文彬. "A face recognition system based on keypoint exclusion and dual keypoint detection." Thesis, 2014. http://ndltd.ncl.edu.tw/handle/02572728630414645978.

Full text
Abstract:
Master's thesis
Ming Chuan University
Master's Program, Department of Computer and Communication Engineering
103
This thesis presents a face recognition system based on keypoint exclusion and dual keypoint detection. There are three major problems with conventional SIFT (Scale Invariant Feature Transform). (1) It uses a single type of keypoint detector; for images of small size, the number of detected keypoints may be too small, which causes difficulties in image matching. (2) Each keypoint of the test image is matched independently against all keypoints of the training images, which is very time-consuming. (3) Only similarities between descriptors are compared, which may still cause some false matches. To increase the number of keypoints, SIFT and FAST (Features from Accelerated Segment Test) keypoints are combined for face image matching. Since there is no corresponding descriptor for the FAST detector, the LoG (Laplacian of Gaussian) function with Automatic Scale Selection is applied at each FAST keypoint to find proper scales and corresponding SIFT descriptors. On the other hand, based on the similarities between the locations of features on human faces, three keypoint exclusion methods (relative location, orientation, and scale) are proposed to eliminate impossible keypoints before descriptor matching. In this way, the number of false matches can be reduced and hence higher recognition rates can be obtained; matching time is also reduced. The proposed algorithms are evaluated with the ORL and the Yale face databases: from each database, 10 persons were selected, with 10 images per person. Our proposed method shows significant improvements in recognition rates over conventional methods.
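The descriptor-matching step that the exclusion methods speed up can be sketched with Lowe's ratio test, which rejects ambiguous matches (an illustrative sketch with toy 2-D descriptors, not the system's actual 128-D SIFT descriptors):

```python
import math

def match_ratio(desc_test, desc_train, ratio=0.8):
    """Lowe-style ratio test: accept a match only when the nearest
    training descriptor is clearly closer than the second nearest."""
    matches = []
    for i, d in enumerate(desc_test):
        dists = sorted((math.dist(d, t), j) for j, t in enumerate(desc_train))
        best, second = dists[0], dists[1]
        if best[0] < ratio * second[0]:
            matches.append((i, best[1]))   # (test index, train index)
    return matches

train = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
test = [(0.05, 0.0),   # unambiguous: clearly closest to train[0]
        (0.5, 0.0)]    # ambiguous: equidistant from train[0] and train[1]
matches = match_ratio(test, train)
```

Keypoint exclusion by location, orientation, and scale simply shrinks the candidate set `desc_train` before this comparison, cutting both false matches and matching time.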
APA, Harvard, Vancouver, ISO, and other styles
30

Filipe, Sílvio Brás. "Biologically motivated keypoint detection for RGB-D data." Doctoral thesis, 2016. http://hdl.handle.net/10400.6/4387.

Full text
Abstract:
With the emerging interest in active vision, computer vision researchers have been increasingly concerned with the mechanisms of attention. Therefore, several visual attention computational models inspired by the human visual system have been developed, aiming at the detection of regions of interest in images. This thesis is focused on selective visual attention, which provides a mechanism for the brain to focus computational resources on one object at a time, guided by low-level image properties (Bottom-Up attention). The task of recognizing objects in different locations is achieved by focusing on different locations, one at a time. Given the computational requirements of the models proposed, the research in this area has been mainly of theoretical interest. More recently, psychologists, neurobiologists and engineers have developed cooperations, and this has resulted in considerable benefits. The first objective of this doctoral work is to bring together concepts and ideas from these different research areas, providing a study of the biological research on the human visual system and a discussion of the interdisciplinary knowledge in this area, as well as the state of the art on computational models of visual attention (bottom-up). Normally, visual attention is referred to by engineers as saliency: when people fix their gaze on a particular region of the image, it is because that region is salient. In this research work, saliency methods are presented based on their classification (biologically plausible, computational or hybrid) and in chronological order. A few salient structures can be used for applications like object registration, retrieval or data simplification, and it is possible to consider these few salient structures as keypoints when aiming to perform object recognition.
Generally, object recognition algorithms use a large number of descriptors extracted in a dense set of points, which comes along with a very high computational cost, preventing real-time processing. To avoid this computational complexity, the features have to be extracted from a small set of points, usually called keypoints. The use of keypoint-based detectors allows the reduction of the processing time and of the redundancy in the data. Local descriptors extracted from images have been extensively reported in the computer vision literature. Since there is a large set of keypoint detectors, a comparative evaluation between them is needed. Accordingly, we propose a description of 2D and 3D keypoint detectors and 3D descriptors, and an evaluation of existing 3D keypoint detectors in a publicly available point cloud library with real 3D objects. The invariance of the 3D keypoint detectors was evaluated with respect to rotations, scale changes and translations. This evaluation reports the robustness of a particular detector to changes of point of view, and the criteria used are the absolute and the relative repeatability rates. In our experiments, the method that achieved the best repeatability rate was the ISS3D method. The analysis of the human visual system and of biologically inspired saliency-map detectors led to the idea of extending a keypoint detector based on the color information in the retina. This proposal produced a 2D keypoint detector inspired by the behavior of the early visual system. Our method is a color extension of the BIMP keypoint detector, where we include both the color and intensity channels of an image: color information is included in a biologically plausible way and multi-scale image features are combined into a single keypoint map. This detector is compared against state-of-the-art detectors and found particularly well-suited for tasks such as category and object recognition.
The recognition process is performed by comparing the 3D descriptors extracted at the locations indicated by the keypoints, after mapping the 2D keypoint locations to the 3D space. The evaluation allowed us to obtain the best keypoint detector/descriptor pair on an RGB-D object dataset. Using our keypoint detector and the SHOTCOLOR descriptor, good category and object recognition rates were obtained, and it is with the PFHRGB descriptor that we obtain the best results. A 3D recognition system involves the choice of keypoint detector and descriptor, so a new method for the detection of 3D keypoints on point clouds is presented and a benchmark is performed between each pair of 3D keypoint detector and 3D descriptor to evaluate their performance on object and category recognition. These evaluations are done on a public database of real 3D objects. Our keypoint detector is inspired by the behavior and neural architecture of the primate visual system: the 3D keypoints are extracted based on a bottom-up 3D saliency map, which is a map that encodes the saliency of objects in the visual environment. The saliency map is determined by computing conspicuity maps (a combination across different modalities) of the orientation, intensity and color information, in a bottom-up and purely stimulus-driven manner. These three conspicuity maps are fused into a 3D saliency map and, finally, the focus of attention (or "keypoint location") is sequentially directed to the most salient points in this map. Inhibiting this location automatically allows the system to attend to the next most salient location. The main conclusions are: with a similar average number of keypoints, our 3D keypoint detector outperforms the other eight 3D keypoint detectors evaluated, achieving the best result in 32 of the evaluated metrics in the category and object recognition experiments, while the second-best detector obtained the best result in only 8 of these metrics.
The only drawback is the computational time, since BIK-BUS is slower than the other detectors. Given that the differences are large in terms of recognition performance, size and time requirements, the selection of the keypoint detector and descriptor has to be matched to the desired task, and we give some directions to facilitate this choice. After proposing the 3D keypoint detector, the research focused on a robust detection and tracking method for 3D objects that uses keypoint information in a particle filter. This method consists of three distinct steps: Segmentation, Tracking Initialization and Tracking. The segmentation is performed to remove all the background information, reducing the number of points for further processing. In the initialization, we use a biologically inspired keypoint detector: the information about the object that we want to follow is given by the extracted keypoints. The particle filter then tracks the keypoints, so that we can predict where they will be in the next frame. In a recognition system, one of the problems is the computational cost of keypoint detectors, and this is the problem we intend to address. The experiments with the PFBIK-Tracking method are done indoors in an office/home environment, where personal robots are expected to operate. We quantitatively evaluate the stability of the overall tracking method using a "Tracking Error", computed from the keypoint and particle centroids. Comparing our system with the tracking method available in the Point Cloud Library, we achieve better results, with a much smaller number of points and much less computational time. Our method is faster and more robust to occlusion when compared to the OpenniTracker.
Com o interesse emergente na visão ativa, os investigadores de visão computacional têm estado cada vez mais preocupados com os mecanismos de atenção. Por isso, uma série de modelos computacionais de atenção visual, inspirado no sistema visual humano, têm sido desenvolvidos. Esses modelos têm como objetivo detetar regiões de interesse nas imagens. Esta tese está focada na atenção visual seletiva, que fornece um mecanismo para que o cérebro concentre os recursos computacionais num objeto de cada vez, guiado pelas propriedades de baixo nível da imagem (atenção Bottom-Up). A tarefa de reconhecimento de objetos em diferentes locais é conseguida através da concentração em diferentes locais, um de cada vez. Dados os requisitos computacionais dos modelos propostos, a investigação nesta área tem sido principalmente de interesse teórico. Mais recentemente, psicólogos, neurobiólogos e engenheiros desenvolveram cooperações e isso resultou em benefícios consideráveis. No início deste trabalho, o objetivo é reunir os conceitos e ideias a partir dessas diferentes áreas de investigação. Desta forma, é fornecido o estudo sobre a investigação da biologia do sistema visual humano e uma discussão sobre o conhecimento interdisciplinar da matéria, bem como um estado de arte dos modelos computacionais de atenção visual (bottom-up). Normalmente, a atenção visual é denominada pelos engenheiros como saliência, se as pessoas fixam o olhar numa determinada região da imagem é porque esta região é saliente. Neste trabalho de investigação, os métodos saliência são apresentados em função da sua classificação (biologicamente plausível, computacional ou híbrido) e numa ordem cronológica. Algumas estruturas salientes podem ser usadas, em vez do objeto todo, em aplicações tais como registo de objetos, recuperação ou simplificação de dados. É possível considerar estas poucas estruturas salientes como pontos-chave, com o objetivo de executar o reconhecimento de objetos. 
In general, object recognition algorithms use a large number of descriptors extracted over a dense set of points, which makes them computationally very expensive and prevents real-time processing. To avoid this computational burden, features should be extracted from a small set of points, usually called keypoints. Using keypoint detectors reduces both the processing time and the amount of data redundancy. Local descriptors extracted from images have been widely reported in the computer vision literature. Since a large set of keypoint detectors exists, a comparative evaluation between them is needed. We therefore describe 2D and 3D keypoint detectors and 3D descriptors, and evaluate the 3D keypoint detectors available in a publicly available library on real 3D objects. The invariance of the 3D keypoint detectors was evaluated under rotations, scale changes, and translations. This evaluation reflects the robustness of a given detector with respect to viewpoint changes, and the criteria used are the absolute and relative repeatability rates. In our experiments, the method with the best repeatability rate was ISS3D. The analysis of the human visual system and of biologically inspired saliency-map detectors led to the idea of extending a keypoint detector with retinal colour information. The proposal produced a 2D keypoint detector inspired by the behaviour of the visual system. Our method is a colour-based extension of the BIMP keypoint detector, which includes the colour and intensity channels of an image.
Colour information is included in a biologically plausible way, and the multi-scale characteristics of the image are combined into a single keypoint map. This detector is compared against the state-of-the-art detectors and is particularly suited to tasks such as category and object recognition. Recognition is performed by comparing the 3D descriptors extracted at the locations indicated by the keypoints. For this, the 2D keypoint locations have to be converted into 3D space, which was possible because the dataset used contains the location of each point in both 2D and 3D space. The evaluation allowed us to obtain the best keypoint detector/descriptor pair on an RGB-D object dataset. Using our keypoint detector and the SHOTCOLOR descriptor we obtain a good category recognition rate, while for object recognition the best results are obtained with the PFHRGB descriptor. A 3D recognition system involves choosing a keypoint detector and a descriptor, so we present a new method for keypoint detection in 3D point clouds and carry out a comparative analysis between every pair of 3D keypoint detector and 3D descriptor to evaluate their performance in category and object recognition. These evaluations are done on a public database of real 3D objects. Our keypoint detector is inspired by the behaviour and the neural architecture of the primate visual system. The 3D keypoints are extracted based on a bottom-up 3D saliency map, that is, a map that encodes the saliency of objects in the visual environment. The saliency map is determined by computing conspicuity maps (a combination of different modalities) of orientation, intensity, and colour information in a bottom-up and purely stimulus-driven manner.
These three conspicuity maps are fused into a 3D saliency map and, finally, the focus of attention (or "keypoint location") is sequentially directed to the most salient points of this map. Inhibiting this location allows the system to automatically shift to the next most salient location. The main conclusions are: with a similar average number of keypoints, our 3D keypoint detector outperforms the other eight evaluated 3D keypoint detectors, obtaining the best result in 32 of the metrics evaluated in the category and object recognition experiments, whereas the second-best detector obtained the best result in only 8 of those metrics. The only drawback is the computational time, since BIK-BUS is slower than the other detectors. Given the large differences in recognition performance, size, and time, the selection of the keypoint detector and descriptor has to be matched to the desired task, and we give some guidelines in this research work to facilitate this choice. After proposing a 3D keypoint detector, the research focused on a robust method for 3D object detection and tracking that uses the keypoint information in a particle filter. This method consists of three distinct steps: Segmentation, Tracking Initialization, and Tracking. Segmentation is performed to remove all the background information, in order to reduce the number of points for further processing. In the initialization, we use a biologically inspired keypoint detector. The information about the object we want to follow is given by the extracted keypoints. The particle filter tracks the keypoints, so that it can predict where they will be in the next frame. The experiments with the PFBIK-Tracking method are done indoors, in an office/home environment, where personal robots are expected to operate.
We also evaluated this method quantitatively using a "Tracking Error", computed from the centroids of the keypoints and of the particles. Comparing our system with the tracking method available in the library used for development, we obtain better results with a much smaller number of points and lower computational cost. Our method is faster and more robust to occlusion than the OpenniTracker.
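The absolute repeatability criterion described above can be sketched as follows. This is a minimal illustration, not the thesis' exact protocol: the distance threshold `eps`, the brute-force matching, and all names are assumptions.

```python
import numpy as np

def repeatability(kp_ref, kp_transformed, R, t, eps=0.01):
    """Absolute repeatability: fraction of reference keypoints that, after
    applying the known rigid transform (R, t), have a keypoint detected in
    the transformed cloud within distance eps."""
    mapped = kp_ref @ R.T + t  # bring reference keypoints into the transformed frame
    # brute-force pairwise distances between mapped and detected keypoints
    d = np.linalg.norm(mapped[:, None, :] - kp_transformed[None, :, :], axis=2)
    repeated = (d.min(axis=1) <= eps).sum()
    return repeated / len(kp_ref)

# toy example: a detector that is perfectly repeatable under a pure translation
rng = np.random.default_rng(0)
kp = rng.random((50, 3))
R = np.eye(3)
t = np.array([0.5, 0.0, 0.0])
print(repeatability(kp, kp @ R.T + t, R, t))  # -> 1.0
```

A relative repeatability rate would additionally normalise by the keypoints that remain visible after the viewpoint change.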
APA, Harvard, Vancouver, ISO, and other styles
31

Lourenço, António Miguel. "Techniques for keypoint detection and matching between endoscopic images." Master's thesis, 2009. http://hdl.handle.net/10316/11318.

Full text
Abstract:
The detection and description of local image features is fundamental for different computer vision applications, such as object recognition, image content retrieval, and structure from motion. In recent years the topic has attracted the attention of many authors, with several methods and techniques currently available in the literature. The SIFT algorithm, proposed in [2], gained particular prominence because of its simplicity and invariance to common image transformations such as scaling and rotation. Unfortunately, the approach cannot cope with the non-linear image deformations caused by radial lens distortion. Invariance to radial distortion is highly relevant for applications that either require a wide field of view (e.g. panoramic vision) or employ cameras with specific optical arrangements enabling the visualization of small spaces and cavities (e.g. medical endoscopy). One of the objectives of this thesis is to understand how radial distortion impacts the detection and description of keypoints by the SIFT algorithm. We perform a set of experiments that clearly show that distortion affects both the repeatability of detection and the invariance of the SIFT description. These results are analyzed in detail and explained from a theoretical viewpoint. In addition, we propose a novel approach for the detection and description of stable local features in images with radial distortion. Detection is carried out in a scale-space image representation built with an adaptive Gaussian filter that takes distortion into account, and feature description is performed after implicit gradient correction using the derivative chain rule. Our approach only requires a rough model of the radial distortion function and, for moderate levels of distortion, it outperforms applying the SIFT algorithm after explicit image correction.
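For intuition, the kind of radial distortion the abstract refers to can be sketched with the first-order division model. The model choice and the parameter name `xi` are assumptions for illustration; the thesis only requires a rough model of the distortion function.

```python
import numpy as np

def undistort_division(pts_d, xi):
    """First-order division model: maps distorted points (centred on the
    distortion centre, in normalised coordinates) to undistorted ones via
    x_u = x_d / (1 + xi * r_d^2); xi < 0 corresponds to barrel distortion."""
    r2 = np.sum(pts_d ** 2, axis=1, keepdims=True)
    return pts_d / (1.0 + xi * r2)

pts = np.array([[0.0, 0.0], [0.5, 0.0]])
# the centre is unchanged; peripheral points move outward under barrel distortion
print(undistort_division(pts, -0.2))
```

Because the mapping is non-linear in the radius, a fixed-size Gaussian filter covers different scene areas at the centre and at the periphery, which is what motivates the adaptive filtering described above.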
APA, Harvard, Vancouver, ISO, and other styles
32

Chen, Ting-Kai, and 陳定楷. "Laser-Based SLAM Using Segmenting Keypoint Detection and B-SHOT Feature." Thesis, 2018. http://ndltd.ncl.edu.tw/handle/53hp8q.

Full text
Abstract:
Master's
National Taiwan University
Graduate Institute of Electrical Engineering
106
Simultaneous localization and mapping (SLAM) is a basic and essential part of autonomous driving research. Environment information gathered from sensors is processed to derive a consistent state of both the self-driving car and the environment. Many types of sensors have been utilized in SLAM research, including cameras and LiDAR. LiDAR provides precise depth information but suffers from sparsity compared to camera images. Two main methods have been used in LiDAR-based SLAM: the direct method and modeling after segmentation. The direct method first extracts interesting points, such as edge or corner points, to reduce the point cloud size. ICP or a Kalman-based filter is then applied to estimate the transformation from frame to frame. Although this method can be adopted in every scenario, the quality of the estimation is hard to evaluate. Instead of directly using the original point cloud, the model-based method first segments the point cloud into subsets and then fits each subset with a defined model. Finally, the frame-to-frame transformation is estimated from the models. However, the model-based method performs poorly in environments with few structures that fit the defined models. In this thesis, a feature-based SLAM algorithm inspired by ORB-SLAM is proposed using only LiDAR data. In the proposed algorithm, unnecessary points, such as ground points and occluded edge points, are removed by a point cloud preprocessing module. Next, keypoints are selected according to their segment ratio and encoded with the B-SHOT feature descriptor. The frame-to-local-map transformation is then estimated from the B-SHOT features and refined by the iterative closest point (ICP) algorithm. The experimental results show that the estimates of the proposed algorithm are consistent in the structural scenarios of the ITRI dataset.
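The ICP refinement step mentioned in the abstract can be sketched as plain point-to-point ICP. The brute-force nearest-neighbour search, the fixed iteration count, and all names are simplifications for illustration, not the thesis' implementation.

```python
import numpy as np

def best_rigid_transform(src, dst):
    """Least-squares rigid transform (R, t) aligning src to dst with
    known one-to-one correspondences (Kabsch / SVD solution)."""
    cs, cd = src.mean(axis=0), dst.mean(axis=0)
    H = (src - cs).T @ (dst - cd)
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return R, cd - R @ cs

def icp(src, dst, iters=30):
    """Point-to-point ICP: alternate brute-force nearest-neighbour matching
    and closed-form rigid alignment, accumulating the total transform."""
    cur = src.copy()
    R_tot, t_tot = np.eye(3), np.zeros(3)
    for _ in range(iters):
        d = np.linalg.norm(cur[:, None] - dst[None, :], axis=2)
        matched = dst[d.argmin(axis=1)]          # nearest neighbour in dst
        R, t = best_rigid_transform(cur, matched)
        cur = cur @ R.T + t
        R_tot, t_tot = R @ R_tot, R @ t_tot + t  # compose with previous steps
    return R_tot, t_tot

# demo: recover a small known rigid motion between two copies of a cloud
rng = np.random.default_rng(1)
src = rng.random((30, 3))
a = 0.03
R_true = np.array([[np.cos(a), -np.sin(a), 0], [np.sin(a), np.cos(a), 0], [0, 0, 1.0]])
t_true = np.array([0.03, -0.01, 0.02])
R_est, t_est = icp(src, src @ R_true.T + t_true)
print(np.abs(src @ R_est.T + t_est - (src @ R_true.T + t_true)).max())  # close to zero on convergence
```

In a SLAM pipeline such as the one above, the B-SHOT feature matches would provide the initial guess, and ICP would only refine a transform that is already close to correct.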
APA, Harvard, Vancouver, ISO, and other styles
33

Syu, Jhih-Wei, and 許智維. "A Keypoint Detector Based on Local Contrast Intensity Images." Thesis, 2010. http://ndltd.ncl.edu.tw/handle/24007133216259018431.

Full text
Abstract:
Master's
Feng Chia University
Graduate Institute of Communications Engineering
98
Corners, junctions, and terminals represent prominent local features in images; they are named keypoints. Keypoint detection is a vital step in many applications such as pattern recognition and image registration. The purpose of this thesis is to develop a keypoint detector based on local contrast intensity. Initially, an input image is enhanced by a compressive mapping curve and then transformed into a line-type image by computing the absolute local contrast. Subsequently, the local contrast intensity image is filtered by multi-scale, multi-orientation Gaussian second-order derivative filters, whose outputs are used to detect high-curvature points. False keypoints occurring along linear edges or in noisy texture areas are eliminated by an automatic thresholding scheme. Finally, the performance of the proposed method was evaluated with both the receiver operating characteristic curve and the recall-precision curve, and compared with other methods.
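The absolute local-contrast step can be illustrated with a simple neighbourhood operator. The window size and the averaging are assumptions for illustration; the thesis does not fix them in this abstract.

```python
import numpy as np

def local_contrast(img, radius=1):
    """Absolute local contrast: mean absolute difference between each pixel
    and its neighbours in a (2r+1)x(2r+1) window. Edges and line structures
    light up, flat regions go to zero (the 'line-type' image)."""
    h, w = img.shape
    pad = np.pad(np.asarray(img, dtype=float), radius, mode='edge')
    acc = np.zeros((h, w))
    n = 0
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            if dy == 0 and dx == 0:
                continue
            acc += np.abs(pad[radius + dy:radius + dy + h,
                              radius + dx:radius + dx + w] - img)
            n += 1
    return acc / n

# a vertical step edge: contrast is non-zero only near the edge
step = np.zeros((8, 8))
step[:, 4:] = 1.0
print(local_contrast(step)[0, 3])  # -> 0.375
```

In the detector described above, this line-type image, rather than the raw image, is what feeds the Gaussian second-order derivative filters.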
APA, Harvard, Vancouver, ISO, and other styles
34

HUANG, YAN-CHENG, and 黃彥誠. "VLSI Implementation of LATCH Descriptor with ORB Keypoint Detector." Thesis, 2018. http://ndltd.ncl.edu.tw/handle/28btmr.

Full text
Abstract:
Master's
National Kaohsiung University of Applied Sciences
Department of Electronic Engineering
106
Computer vision is an important part of today's machine learning. How to give machines human-like visual ability to automatically identify and analyze image content is an important research topic, with applications such as video surveillance, autonomous car navigation systems, and intelligent robots. Feature extraction and classification are the two main steps in object recognition. ORB is an algorithm used in computer vision to detect and describe image features; it is rotation invariant and fast, but its accuracy is not good enough. LATCH is a good binary description method that maintains a reliable recognition rate. By combining the ORB keypoint detector with the LATCH descriptor, a novel feature extraction method and an associated VLSI architecture are presented in this thesis. Using approximations to replace the complex operations, we develop an efficient ORB-LATCH circuit. The pipelined hardware architecture of the proposed design is implemented in Verilog and synthesized with the SYNOPSYS Design Compiler in a TSMC 0.13 μm cell library. The circuit needs 206.6K gate counts and achieves 100 MHz; the throughput is 50.76×10⁶ pixels per second.
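The patch-triplet comparison at the heart of LATCH can be sketched as follows. The offsets here are made up for illustration (real LATCH uses learned triplet arrangements), and the rotation handling provided by the ORB detector is omitted.

```python
import numpy as np

def latch_like_bits(img, kp, triplets, half=3):
    """LATCH-style bits: for each triplet (anchor, a, b) of patch-centre
    offsets around keypoint kp, set the bit to 1 if the anchor patch is
    closer (in SSD) to patch a than to patch b."""
    img = np.asarray(img, dtype=float)

    def patch(cy, cx):
        return img[cy - half:cy + half + 1, cx - half:cx + half + 1]

    y, x = kp
    bits = []
    for (ay, ax), (py, px), (qy, qx) in triplets:
        anc = patch(y + ay, x + ax)
        da = np.sum((anc - patch(y + py, x + px)) ** 2)
        db = np.sum((anc - patch(y + qy, x + qx)) ** 2)
        bits.append(1 if da < db else 0)
    return np.array(bits, dtype=np.uint8)

# toy image: dark left half, bright right half; the anchor patch (dark)
# is closer to the dark companion than to the bright one
img = np.zeros((40, 40))
img[:, 20:] = 255.0
trips = [((0, 0), (0, -4), (0, 8)), ((0, 0), (0, 8), (0, -4))]
print(latch_like_bits(img, (20, 10), trips))  # -> [1 0]
```

Comparing whole patches rather than single pixels (as BRIEF does) is what makes this representation more noise tolerant, which is presumably why the thesis pairs it with the fast ORB detector.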
APA, Harvard, Vancouver, ISO, and other styles
35

lin, wei-cheng, and 林威成. "Integrating keypoint detector and visual attention mechanism into one framework." Thesis, 2010. http://ndltd.ncl.edu.tw/handle/15257952582962715841.

Full text
Abstract:
Master's
Feng Chia University
Graduate Institute of Electrical and Communications Engineering
98
Corners, intersections, and high-curvature points represent prominent features in images; these features are named keypoints. The two contributions of this thesis are a new keypoint detector based on enhanced local contrast and a novel salient region detector that reuses the scheme of the proposed keypoint detector. We compared the developed keypoint detector with other methods in terms of correspondences and matching percentages on pairs of images generated under different viewing angles or blurring conditions. The experimental results show the robustness of the proposed keypoint detector. The salient region detector operates on a low-resolution image generated by successively down-sampling the original input image. The performance of the proposed salient region detector is comparable to that of human subjects.
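The successive down-sampling used to build the low-resolution image can be sketched with 2x2 block averaging; the actual down-sampling filter is not specified in the abstract, so this particular choice is an assumption.

```python
import numpy as np

def downsample2(img):
    """Halve each image dimension by 2x2 block averaging; applying this
    repeatedly yields the successively down-sampled low-resolution image
    the salient region detector operates on."""
    h, w = img.shape
    img = img[:h // 2 * 2, :w // 2 * 2]  # drop odd trailing row/column
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

img = np.arange(16, dtype=float).reshape(4, 4)
print(downsample2(img))  # 2x2 image of block means
```

Working at low resolution keeps the saliency computation cheap while preserving the coarse structures that salient regions correspond to.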
APA, Harvard, Vancouver, ISO, and other styles
36

YONG, LIM SOO, and 林詩詠. "Automatic Video Shot Boundary Detection Using a Hybrid Approach of HLFPN and Keypoint Matching." Thesis, 2015. http://ndltd.ncl.edu.tw/handle/59537649932771006642.

Full text
Abstract:
Master's
National Taipei University
Department of Computer Science and Information Engineering
103
Shot boundary detection (SBD) is an important and fundamental step in video content analysis tasks such as content-based video indexing, browsing, and retrieval. In this work, we present a hybrid SBD method that integrates a high-level fuzzy Petri net (HLFPN) technique with keypoint matching. The HLFPN with histogram difference is executed as a pre-detection step. Next, the speeded-up robust features (SURF) algorithm, which is reliably robust to image affine transformations and illumination variation, is used to identify possible false shots and gradual transitions based on the hypotheses produced by the HLFPN. The top-down design effectively lowers the computational complexity of the SURF algorithm. The proposed algorithm increases the precision of SBD and can be applied to different types of videos.
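The histogram-difference pre-detection can be sketched as follows. The bin count and threshold are illustrative assumptions, and the fuzzy-Petri-net reasoning and SURF verification stages are omitted.

```python
import numpy as np

def hist_diff(f1, f2, bins=64):
    """Normalised grey-level histogram difference between two frames,
    in [0, 1]; large values suggest a candidate shot boundary for the
    later keypoint-matching verification stage."""
    h1, _ = np.histogram(f1, bins=bins, range=(0, 256))
    h2, _ = np.histogram(f2, bins=bins, range=(0, 256))
    return np.abs(h1 - h2).sum() / (2 * f1.size)

def candidate_cuts(frames, thresh=0.5):
    """Indices i where the transition frames[i-1] -> frames[i] looks like a cut."""
    return [i for i in range(1, len(frames))
            if hist_diff(frames[i - 1], frames[i]) > thresh]

# three dark frames followed by two bright ones: one candidate cut at index 3
dark = np.zeros((10, 10))
bright = np.full((10, 10), 255.0)
print(candidate_cuts([dark, dark, dark, bright, bright]))  # -> [3]
```

Histogram differences are cheap but blind to spatial layout, which is why a hybrid design like the one above hands the ambiguous candidates to a keypoint matcher for verification.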
APA, Harvard, Vancouver, ISO, and other styles
37

Lourenço, António Miguel Marques Rodrigues Teixeira. "Keypoint Detection, Matching, and Tracking in Images with Non-Linear Distortion: Applications in Medical Endoscopy and Panoramic Vision." Doctoral thesis, 2015. http://hdl.handle.net/10316/27055.

Full text
Abstract:
Doctoral thesis in Electrical and Computer Engineering, specialization in Automation and Robotics, presented to the Department of Electrical and Computer Engineering of the Faculty of Sciences and Technology of the University of Coimbra.
Point correspondences between different views are the input to many computer vision algorithms, with purposes that range from camera calibration to image content retrieval, passing through structure-from-motion, registration, and mosaicking. Establishing such correspondences is particularly difficult, not only in the case of wide baselines and/or strong changes in viewpoint, but also when images present significant non-linear distortion. The thesis addresses this last problem and investigates solutions for detecting, matching, and tracking points in images acquired by cameras with unconventional optics such as fish-eye lenses, catadioptric sensors, or medical endoscopes. We start by studying the impact of radial distortion on keypoint detection and description using the well-known SIFT algorithm. This study leads to several modifications to the original method that substantially improve matching performance in images with a wide field of view. Our work is conclusive in showing that non-linear distortion must be implicitly handled by a suitable design of filters and operators, as opposed to being explicitly corrected via image warping. The benefits of this approach are demonstrated in structure-from-motion experiments, as well as in the development of a vision system for indoor localization in which perspective images are used to retrieve panoramic views acquired with a catadioptric camera. In a second line of research, we investigate solutions for feature tracking in continuous sequences acquired by cameras with radial distortion. We build on top of the conventional frameworks for image region alignment and propose specific deformation models that simultaneously describe the effect of local image motion and global image distortion. It is shown for the first time that image distortion can be calibrated at each frame time instant by tracking a random set of salient points.
This result is further explored to solve the problem of knowing the intrinsic calibration of cameras with motorized zoom at all times. This problem is particularly relevant in the context of medical endoscopy, and the solution combines off-line calibration with on-line tracking to update the camera focal length. The effectiveness of our tracking and calibration approaches is validated on both medical and non-medical video sequences. The last contribution is a pipeline for visual odometry in stereo laparoscopy that relies on multi-model fitting to segment different rigid motions while implicitly discarding regions of non-rigid deformation. This is complemented by a temporal clustering scheme that makes it possible to decide which parts of the scene should be used to estimate the camera motion reliably.
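The "implicit gradient correction using the derivative chain rule" can be illustrated as follows. The first-order division model and the numerical Jacobian are stand-ins (assumptions); the thesis only assumes a rough model of the distortion function.

```python
import numpy as np

def undistort(d, xi):
    # first-order division model, u(d) = d / (1 + xi * |d|^2); an assumed
    # stand-in for the rough distortion model the method requires
    return d / (1.0 + xi * np.dot(d, d))

def division_jacobian(pt_d, xi, eps=1e-6):
    """Numerical Jacobian J = du/dd of the distorted->undistorted mapping,
    by central differences at the point pt_d."""
    J = np.zeros((2, 2))
    for k in range(2):
        step = np.zeros(2)
        step[k] = eps
        J[:, k] = (undistort(pt_d + step, xi) - undistort(pt_d - step, xi)) / (2 * eps)
    return J

def corrected_gradient(grad_d, pt_d, xi):
    """Chain rule: with I(u) = I_d(d(u)), grad_u I = (dd/du)^T grad_d I,
    where dd/du is the inverse of the Jacobian computed above. The image
    gradient is corrected without ever warping the image itself."""
    J = division_jacobian(pt_d, xi)
    return np.linalg.inv(J).T @ grad_d

# at the distortion centre the correction reduces to the identity
print(corrected_gradient(np.array([1.0, 2.0]), np.zeros(2), -0.2))  # ≈ [1. 2.]
```

Away from the centre the Jacobian is no longer the identity, so descriptor orientations and gradient histograms computed from the corrected gradients become consistent with the undistorted geometry.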
FCT- SFRH/BD/63118/2009
APA, Harvard, Vancouver, ISO, and other styles
38

Melo, César Gonçalo Macedo. "Sistema EdgeAI para monitorização e notificação de diferentes graus de risco em contexto Covid19." Master's thesis, 2021. http://hdl.handle.net/1822/76550.

Full text
Abstract:
Integrated master's dissertation in Industrial Electronics and Computer Engineering
Currently, the population is going through a serious epidemiological and health situation on a world scale, due to the Covid-19 disease, caused by the SARS-CoV-2 virus. Known for its high speed of propagation and easy transmission, it has had devastating social, economic, and political consequences all over the world. The high rate of asymptomatic people, who carry the disease but do not show any symptoms, sometimes results in careless and unconscious behaviour with respect to the rules imposed to control the pandemic. To minimize risk and possible negligent exposure to the virus, methodologies need to be developed to monitor people's behaviour in public spaces and commercial areas. Therefore, the main goal of this dissertation is the application of ML techniques capable of identifying risk factors and behaviours that can increase the number of infections and the spread of the virus in the community. Using DL algorithms integrated into an EdgeAI system, the main goals are to monitor whether people wear masks in spaces where their use is mandatory, and to perform accurate spot temperature measurements to identify people in a possibly feverish state. This dissertation can be divided into three main chapters: detection of masks in urban environments, temperature measurement, and prototype construction. The mask detection module presents the techniques and resources used to generate the dataset that served as the basis for training the DL algorithms, as well as the implementation and evaluation of the selected algorithms. This dataset is made up of both real and synthetic RGB images, in order to increase the amount and variability of the data.
The temperature measurement module describes the methodologies used to generate the dataset for detecting the facial points where temperature is measured most accurately, and compares the algorithms trained for this task. In this case, the dataset consists of thermal images, aggregating existing datasets with images collected in the laboratory where this dissertation was developed. Finally, the prototype construction module presents the technological and functional specifications of the prototype built for this dissertation. The final system was implemented on the NVIDIA Jetson Xavier NX embedded platform, which can accelerate the performance of AI algorithms. On this system, a graphical interface with easy, interactive use was developed, implementing the inferences of the algorithms developed in the previous modules on images collected from surveillance cameras and a thermal camera, monitoring mask wearing and temperature, respectively. For the RGB component (mask detection), the lightest version of the YOLOv5 architecture was selected, achieving an average precision of 82.4% across the detected classes and an inference time of 0.032 seconds on the embedded system. For the thermal component, the model uses the ResNet-50 network as the feature extraction backbone, followed by deconvolution layers responsible for extracting the desired facial points; this model achieved an average precision of 78.7%.
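The final step of the thermal pipeline, turning the deconvolution head's per-landmark heatmaps into facial point coordinates, is typically an argmax per channel; this sketch assumes that convention, which the abstract does not spell out.

```python
import numpy as np

def keypoints_from_heatmaps(heatmaps):
    """Given per-landmark heatmaps of shape (K, H, W), return the (row, col)
    of the peak of each map as a (K, 2) integer array."""
    K, H, W = heatmaps.shape
    flat = heatmaps.reshape(K, -1).argmax(axis=1)  # flat index of each peak
    return np.stack([flat // W, flat % W], axis=1)

# two toy heatmaps with known peaks
hm = np.zeros((2, 5, 6))
hm[0, 1, 2] = 1.0
hm[1, 4, 0] = 3.0
print(keypoints_from_heatmaps(hm))  # -> [[1 2] [4 0]]
```

In a deployed system the selected pixels would then be read from the registered thermal image to obtain the temperature estimate.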
APA, Harvard, Vancouver, ISO, and other styles
