To see the other types of publications on this topic, follow the link: Deep Learning techniques.

Dissertations / Theses on the topic 'Deep Learning techniques'


Consult the top 50 dissertations / theses for your research on the topic 'Deep Learning techniques.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Press it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse dissertations / theses in a wide variety of disciplines and organise your bibliography correctly.

1

Hossain, Md Zakir. "Deep learning techniques for image captioning." PhD thesis, Murdoch University, 2020. https://researchrepository.murdoch.edu.au/id/eprint/60782/.

Full text
Abstract:
Generating a description of an image is called image captioning. Image captioning is a challenging task because it involves understanding the main objects, their attributes, and their relationships in an image. It also involves generating syntactically and semantically meaningful descriptions of the images in natural language. A typical image captioning pipeline comprises an image encoder and a language decoder. Convolutional Neural Networks (CNNs) are widely used as the encoder, while Long Short-Term Memory (LSTM) networks are used as the decoder. A variety of LSTMs and CNNs, including attention mechanisms, are used to generate meaningful and accurate captions. Traditional image captioning techniques have limitations in generating semantically meaningful and superior captions. In this research, we focus on advanced image captioning techniques that are able to generate semantically more meaningful and superior captions. Accordingly, we make four contributions in this thesis. First, we investigate an attention-based LSTM on image features extracted by DenseNet, a newer type of CNN. We integrate DenseNet features with an attention mechanism and show that this combination can generate more relevant image captions than other CNNs. Second, we use bi-directional self-attention as a language decoder. A bi-directional decoder can capture the context in both forward and backward directions, i.e., past context as well as any future context, in caption generation. Consequently, the generated captions are more meaningful and superior to those generated by typical LSTMs and CNNs. Third, we further extend the work by using an additional CNN layer to incorporate the structured local context together with the past and future contexts attained by a bi-directional LSTM. A pooling scheme, namely Attention Pooling, is also used to enhance the information extraction capability of the pooling layer. Consequently, it is able to generate contextually superior captions. Fourth, existing image captioning techniques use human-annotated real images for training and testing, which involves an expensive and time-consuming process. Moreover, nowadays a large share of images are synthetic or generated by machines, and there is a need to generate captions for such images as well. We investigate the use of synthetic images for training and testing image captioning. We show that such images can help improve the captioning of real images and can effectively be used in caption generation for synthetic images.
APA, Harvard, Vancouver, ISO, and other styles
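The encoder-decoder pipeline summarised in this abstract (a CNN encoder feeding an LSTM decoder) can be illustrated with a minimal sketch. This is not the thesis model: the attention mechanism and bi-directional decoder are omitted, and the vocabulary size, feature dimensions, and DenseNet-121 backbone are illustrative assumptions.

```python
# Minimal CNN-encoder / LSTM-decoder captioning sketch (illustrative only).
import torch
import torch.nn as nn
import torchvision.models as models

class Encoder(nn.Module):
    def __init__(self, feat_dim=256):
        super().__init__()
        backbone = models.densenet121(weights=None)   # DenseNet features (no pretrained download here)
        self.features = backbone.features              # convolutional feature extractor
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(1024, feat_dim)            # 1024 = densenet121 feature channels

    def forward(self, images):                         # images: (B, 3, H, W)
        x = self.pool(self.features(images)).flatten(1)
        return self.fc(x)                               # (B, feat_dim)

class Decoder(nn.Module):
    def __init__(self, vocab_size=10000, feat_dim=256, hid=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, feat_dim)
        self.lstm = nn.LSTM(feat_dim, hid, batch_first=True)
        self.out = nn.Linear(hid, vocab_size)

    def forward(self, feats, captions):                # captions: (B, T) token ids
        # Prepend the image feature as the first "word" of the sequence.
        inputs = torch.cat([feats.unsqueeze(1), self.embed(captions)], dim=1)
        hidden, _ = self.lstm(inputs)
        return self.out(hidden)                         # (B, T+1, vocab_size) logits

encoder, decoder = Encoder(), Decoder()
images = torch.randn(2, 3, 224, 224)                    # dummy image batch
captions = torch.randint(0, 10000, (2, 12))             # dummy token ids
logits = decoder(encoder(images), captions)
print(logits.shape)                                     # torch.Size([2, 13, 10000])
```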
2

Domeniconi, Federico. "Deep Learning Techniques applied to Photometric Stereo." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2020. http://amslaurea.unibo.it/20031/.

Full text
Abstract:
The thesis focuses on the study of the state of the art in deep learning photometric stereo: Self-calibrating Deep Photometric Stereo Networks. The model consists of two networks: the first predicts the direction and intensity of the lights, the second predicts the surface normals. The goal of the thesis is to identify the limitations of the model and to understand whether it can be modified to perform well in real-world scenarios as well. The thesis project is based on fine-tuning, a supervised transfer learning technique. For this purpose, a new dataset was created by acquiring images in the laboratory. The ground truth is obtained through a distillation technique. In particular, the light directions are obtained by running two light-calibration algorithms and merging the two results. Similarly, the surface normals are obtained by merging the results of several photometric stereo algorithms. The results of the thesis are very promising. The error in the prediction of the direction and intensity of the lights is one third of the error of the original model. The surface-normal predictions can only be analysed qualitatively, but the improvements are evident. The work of this thesis has shown that it is possible to apply transfer learning to deep learning photometric stereo. It is therefore not necessary to train a new model from scratch; existing models can be exploited to improve performance and reduce training time.
APA, Harvard, Vancouver, ISO, and other styles
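Fine-tuning, the supervised transfer-learning technique the thesis relies on, amounts to reusing a pre-trained network and updating only part of its weights on the new laboratory dataset. A hedged, generic sketch follows; the ResNet backbone, frozen-layer choice, and regression target are placeholders, not the Self-calibrating Deep Photometric Stereo Networks architecture.

```python
# Generic fine-tuning sketch: freeze early layers, retrain the head on new data.
import torch
import torch.nn as nn
import torchvision.models as models

model = models.resnet18(weights=None)            # placeholder backbone; load pre-trained weights in practice
model.fc = nn.Linear(model.fc.in_features, 3)    # e.g. regress a light direction / normal (assumption)

# Freeze everything except the last residual block and the new head.
for name, p in model.named_parameters():
    p.requires_grad = name.startswith(("layer4", "fc"))

optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4)
criterion = nn.MSELoss()                         # distilled ground truth as regression target

x = torch.randn(4, 3, 128, 128)                  # dummy batch standing in for lab acquisitions
target = torch.randn(4, 3)
loss = criterion(model(x), target)
loss.backward()
optimizer.step()
print(float(loss))
```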
3

Cruz, Edmanuel. "Robotics semantic localization using deep learning techniques." Doctoral thesis, Universidad de Alicante, 2020. http://hdl.handle.net/10045/109462.

Full text
Abstract:
The tremendous technological advances experienced in recent years have allowed the development and implementation of algorithms capable of performing different tasks that help humans in their daily lives. Scene recognition is one of the fields that has benefited most from these advances. Scene recognition gives different systems the ability to define a context for the identification or recognition of objects or places. In this same line of research, semantic localization allows a robot to identify a place semantically. Semantic classification is currently an exciting topic and the main goal of a large number of works. Within this context, it is a challenge for a system or a mobile robot to semantically identify an environment, either because the environment is visually different or because it has been gradually modified. Changing environments are challenging scenarios because, in real-world applications, the system must be able to adapt to them. This research focuses on recent techniques for categorizing places that take advantage of deep learning (DL) to produce a semantic definition for a zone. As a contribution to the solution of this problem, a method capable of updating a previously trained model is designed in this work. This method was used as a module of an agenda system to help people with cognitive problems in their daily tasks. An augmented reality mobile phone application was also designed which uses DL techniques to determine a customer's location and provide useful information, thus improving their shopping experience. These solutions are described and explained in detail throughout the following document.
APA, Harvard, Vancouver, ISO, and other styles
4

Nguyen, Tien Dung. "Multimodal emotion recognition using deep learning techniques." Thesis, Queensland University of Technology, 2020. https://eprints.qut.edu.au/180753/1/Tien%20Dung_Nguyen_Thesis.pdf.

Full text
Abstract:
This thesis investigates the use of deep learning techniques to address the problem of machine understanding of human affective behaviour and to improve the accuracy of both unimodal and multimodal human emotion recognition. The objective was to explore how best to configure deep learning networks to capture, individually and jointly, the key features contributing to human emotions from three modalities (speech, face, and bodily movements) in order to accurately classify the expressed emotion. The outcome of the research should be useful for several applications, including the design of social robots.
APA, Harvard, Vancouver, ISO, and other styles
5

Singh, Praveer. "Processing high-resolution images through deep learning techniques." Thesis, Paris Est, 2018. http://www.theses.fr/2018PESC1172.

Full text
Abstract:
In this thesis, we discuss four different application scenarios that can be broadly grouped under the larger umbrella of analyzing and processing high-resolution images using deep learning techniques. The first three chapters deal with processing remote-sensing (RS) images, which are captured either from airplanes or from satellites hundreds of kilometers away from the Earth. We start by addressing a challenging problem related to improving the classification of complex aerial scenes through a deep weakly supervised learning paradigm. We show how, by using only image-level labels, we can effectively localize the most distinctive regions in complex scenes and thus remove ambiguities, leading to enhanced classification performance in highly complex aerial scenes. In the second chapter, we deal with refining segmentation labels of building footprints in aerial images. We do this by first detecting errors in the initial segmentation masks and correcting only those segmentation pixels where we find a high probability of errors. The next two chapters of the thesis are related to the application of Generative Adversarial Networks. In the first one, we build an effective Cloud-GAN model to remove thin films of clouds in Sentinel-2 imagery by adopting a cycle consistency loss. This utilizes an adversarial loss function to map cloudy images to non-cloudy images in a fully unsupervised fashion, where the cyclic loss helps constrain the network to output a cloud-free image corresponding to the input cloudy image and not just any random image in the target domain. Finally, the last chapter addresses a different set of high-resolution images, coming not from the RS domain but from High Dynamic Range Imaging (HDRI). These are 32-bit images which capture the full extent of the luminance present in the scene. Our goal is to quantize them to 8-bit Low Dynamic Range (LDR) images so that they can be projected effectively on normal display screens while keeping the overall contrast and perceptual quality similar to that of the HDR images. We adopt a multi-scale GAN model that focuses on both the coarser and the finer-level information necessary for high-resolution images. The final tone-mapped outputs have high subjective quality without any perceived artifacts.
APA, Harvard, Vancouver, ISO, and other styles
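The cycle-consistency constraint used in the Cloud-GAN chapter requires the cloud-removal and cloud-synthesis generators to approximately invert each other. A minimal sketch of the loss term alone is shown below, with stand-in generator modules and the adversarial terms omitted; it is not the thesis implementation.

```python
# Cycle-consistency loss sketch for unpaired cloudy <-> cloud-free translation.
import torch
import torch.nn as nn

G_clear = nn.Conv2d(3, 3, 3, padding=1)   # stand-in for the cloudy -> cloud-free generator
G_cloud = nn.Conv2d(3, 3, 3, padding=1)   # stand-in for the cloud-free -> cloudy generator
l1 = nn.L1Loss()

def cycle_loss(cloudy, clear, lam=10.0):
    # Forward cycle: cloudy -> clear' -> cloudy'' should reproduce the input.
    fwd = l1(G_cloud(G_clear(cloudy)), cloudy)
    # Backward cycle: clear -> cloudy' -> clear'' should reproduce the input.
    bwd = l1(G_clear(G_cloud(clear)), clear)
    return lam * (fwd + bwd)               # adversarial terms omitted for brevity

cloudy = torch.rand(2, 3, 64, 64)
clear = torch.rand(2, 3, 64, 64)
print(float(cycle_loss(cloudy, clear)))
```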
6

FANTAZZINI, ALICE. "Deep Learning Techniques to Support Endovascular Surgical Procedures." Doctoral thesis, Università degli studi di Genova, 2022. http://hdl.handle.net/11567/1076603.

Full text
Abstract:
Clinical Problem. Medical image analysis plays a crucial role in all the stages of endovascular surgery, from screening to follow-up monitoring. Given the growing availability of clinical images, automatic tools that can process data in a quick and effective way are essential for clinical support. Methods. In this thesis, deep learning (DL) methodologies are designed to support clinicians in three different phases of endovascular surgery: the preoperative phase, the intraoperative phase, and the postoperative phase. In the preoperative phase, deep learning is exploited to perform automatic segmentation of the aortic lumen and thrombus while handling spatial coherence. Geometric measurements are then extracted from the segmentation, allowing geometric evaluation and aneurysm screening. For the intraoperative phase, a deep learning model is used as a surrogate of finite-element analysis to predict the intraoperative aortic deformations induced by tool-tissue interaction. Finally, for the postoperative phase, deep learning is exploited to perform aortic lumen segmentation, and geometric analysis is performed on multiple follow-up patient acquisitions. Results. For the preoperative stage, the developed segmentation pipelines provided better results than state-of-the-art approaches. Automated geometric measurements showed results comparable to manual ones, and aneurysm screening provided promising results. For the intraoperative stage, the deep learning model showed good accuracy in predicting intraoperative aortic deformations. For the postoperative stage, the preliminary longitudinal analysis of aortic geometry showed that landing zone diameters tend to change over the follow-up acquisitions. Conclusions. This work presents a platform for the automatic analysis of CTA scans of patients affected by aortic diseases. The developed methodologies make it possible to rapidly process large image databases; the results of such analysis (e.g., thrombus and lumen segmentation, geometric measurements) can be useful in research as well as in clinical practice.
APA, Harvard, Vancouver, ISO, and other styles
7

Calvanese, Giordano. "Volumetric deep learning techniques in oil & gas exploration." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2020. http://amslaurea.unibo.it/20556/.

Full text
Abstract:
This work consisted of the study and application of a volumetric Deep Learning (DL) approach to seismic data provided by Eni S.p.A., with an industrial utility perspective. After a series of fruitful meetings with the Upstream & Technical Services team, we clearly defined the final objective of this approach: the automatic search for geological structures such as turbidite channel-bases, as potential regions of interest for the Oil & Gas industry. We therefore defined a workflow based on the training of volumetric DL models over seismic horizons containing channel bases, providing "windrose" input patches, i.e., a planar approximation of a three-dimensional volume. All components and sources of criticality were systematically analyzed. For this purpose we studied the effect of preprocessing, the contribution of dataset augmentation, the sensitivity to the manual channel-base segmentation, and the effect of the spatial expansion of the input patches. Models were evaluated both qualitatively and quantitatively through K-fold cross-validation. This work showed how appropriate preprocessing of the original data substantially helps DL models; how dataset augmentation is fundamental for good model generalization, given the poor representativity of the accessible examples compared to all possible configurations; how this DL approach is sensitive to the channel-base segmentation, requiring sufficient effort to be invested in the generation of reliable labels; and how the size of the input patches must be large enough to allow models to perceive, around each voxel, the structure concavity and the texture of any sediment infill. We conclude that the volumetric DL approach developed in this work has proved to be very promising.
APA, Harvard, Vancouver, ISO, and other styles
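The K-fold cross-validation used to evaluate the patch-based models can be sketched as follows; the flattened 'windrose' patches, labels, and the simple classifier are synthetic placeholders, not the actual volumetric DL models.

```python
# K-fold evaluation sketch for patch-based classification (placeholder data/model).
import numpy as np
from sklearn.model_selection import KFold
from sklearn.linear_model import LogisticRegression

np.random.seed(0)
X = np.random.rand(200, 64)            # 200 flattened "windrose" patches (dummy features)
y = np.random.randint(0, 2, 200)       # 1 = channel-base voxel, 0 = background (dummy labels)

scores = []
for train_idx, test_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    clf = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
    scores.append(clf.score(X[test_idx], y[test_idx]))

print(f"5-fold accuracy: {np.mean(scores):.3f} +/- {np.std(scores):.3f}")
```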
8

De la Torre Gallart, Jordi. "Diabetic Retinopathy Classification and Interpretation using Deep Learning Techniques." Doctoral thesis, Universitat Rovira i Virgili, 2019. http://hdl.handle.net/10803/667077.

Full text
Abstract:
Diabetic retinopathy is a chronic disease and one of the main causes of blindness and visual impairment in diabetic patients. Eye screening through retinal images is used by physicians to detect the lesions related to this disease. In this thesis, we explore several novel methods for automatic diabetic retinopathy disease-grade classification using retina fundus images. For this purpose, we explore methods based on automatic feature extraction and classification built on deep neural networks. Furthermore, as the results reported by these models are difficult to interpret, we design a new method for result interpretation. The model is designed in a modular manner in order to generalize its possible application to other networks and classification domains. We experimentally demonstrate that our interpretation model is able to detect retinal lesions in the image solely from the classification information. Additionally, we propose a method for compressing the model's feature-space information. The method is based on an independent component analysis of the disentangled feature-space information generated by the model for each image, and it also serves to identify the mathematically independent elements causing the disease. Using our previously mentioned interpretation method, it is also possible to visualize such components on the image. Finally, we present an experimental application of our best model for classifying retina images of a different population, specifically from the Hospital de Reus. The proposed methods achieve ophthalmologist-level performance and are able to identify in great detail the lesions present in the images, inferred only from image classification information.
APA, Harvard, Vancouver, ISO, and other styles
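The feature-space compression step described above, an independent component analysis over the per-image feature vectors produced by the network, can be approximated with scikit-learn; the feature matrix and number of components below are arbitrary stand-ins.

```python
# ICA over per-image feature vectors (synthetic stand-in for the CNN feature space).
import numpy as np
from sklearn.decomposition import FastICA

features = np.random.rand(500, 1024)          # 500 images x 1024-dim internal features (dummy)
ica = FastICA(n_components=16, random_state=0)
components = ica.fit_transform(features)      # compressed, statistically independent representation
print(components.shape)                        # (500, 16)

# Each component could then be mapped back onto the image with an interpretation
# method to visualise which lesions it responds to.
```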
9

Rangel, José Carlos. "Scene Understanding for Mobile Robots exploiting Deep Learning Techniques." Doctoral thesis, Universidad de Alicante, 2017. http://hdl.handle.net/10045/72503.

Full text
Abstract:
Every day, robots are becoming more common in society. Consequently, they must have certain basic skills in order to interact with humans and the environment. One of these skills is the capacity to understand the places where they are able to move. Computer vision is one of the ways commonly used for achieving this purpose. Current technologies in this field offer outstanding solutions applied to improve data quality every day, therefore producing more accurate results in the analysis of an environment. With this in mind, the main goal of this research is to develop and validate an efficient object-based scene understanding method that will be able to help solve problems related to scene identification for mobile robotics. We seek to analyze state-of-the-art methods in order to find the most suitable one for our goals, as well as to select the kind of data most convenient for dealing with this issue. Another primary goal of the research is to determine the most suitable data input for analyzing scenes in order to find an accurate representation for the scenes by means of semantic labels or point cloud feature descriptors. As a secondary goal, we will show the benefits of using semantic descriptors generated with pre-trained models for mapping and scene classification problems, as well as the use of deep learning models in conjunction with 3D feature description procedures to build a 3D object classification model that is directly related to the representation goal of this work. The research described in this thesis was motivated by the need for a robust system capable of understanding the locations where a robot usually interacts. In the same way, the advent of better computational resources has made it possible to implement some previously defined techniques that demand high computational capacity and offer a possible solution for dealing with scene understanding issues. One of these techniques is Convolutional Neural Networks (CNNs). These networks have the capacity to classify an image based on its visual appearance. They then generate a list of lexical labels and the probability for each label, representing the likelihood of the presence of an object in the scene. Labels are derived from the training sets that the networks learned to recognize. Therefore, we could use this list of labels and probabilities as an efficient representation of the environment and then assign a semantic category to the regions where a mobile robot is able to navigate, and at the same time construct a semantic or topological map based on this semantic representation of the place. After analyzing the state of the art in Scene Understanding, we identified a set of approaches for developing a robust scene understanding procedure. Among these approaches we identified an almost unexplored gap in the topic of understanding scenes based on the objects present in them. Consequently, we propose to perform an experimental study on this approach aimed at finding a way of fully describing a scene considering the objects present in it. As the Scene Understanding task involves object detection and annotation, one of the first steps is to determine the kind of data to use as input in our proposal. With this in mind, our proposal evaluates the use of 3D data. This kind of data suffers from the presence of noise; therefore, we propose to use the Growing Neural Gas (GNG) algorithm to reduce the effect of noise in the object recognition procedure.
GNGs have the capacity to grow and adapt their topology to represent 2D information, producing a smaller representation with only a slight noise influence from the input data. Applied to 3D data, the GNG presents a good approach able to deal with noise. However, using 3D data poses a set of problems, such as the lack of a 3D object dataset with enough models to generalize methods and adapt them to real situations, as well as the fact that processing three-dimensional data is computationally expensive and requires a huge amount of storage space. These problems led us to explore new approaches for developing object recognition tasks. Therefore, considering the outstanding results obtained by CNNs in the latest ImageNet challenge, we propose to carry out an evaluation of CNNs as an object detection system. These networks were initially proposed in the 90s and are nowadays easily implementable due to hardware improvements in recent years. CNNs have shown satisfying results when tested on problems such as the detection of objects, pedestrians, and traffic signals, sound wave classification, and medical image processing, among others. Moreover, an added value of CNNs is the semantic description capability produced by the categories/labels that the network is able to identify, which can be translated as a semantic explanation of the input image. Consequently, we propose using the evaluation of these semantic labels as a scene descriptor for building a supervised scene classification model. Having said that, we also propose using semantic descriptors to generate topological maps and to test the description capabilities of lexical labels. In addition, semantic descriptors could be suitable for unsupervised place or environment labeling, so we propose using them to deal with this kind of problem in order to achieve a robust scene labeling method. Finally, for tackling the object recognition problem, we propose to develop an experimental study for unsupervised object labeling. This will be applied to the objects present in a point cloud, which are labeled using a lexical labeling tool. The objects are then used as the training instances of a classifier that mixes their 3D features with the labels assigned by the external tool.
APA, Harvard, Vancouver, ISO, and other styles
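The core idea of using a pre-trained CNN's label probabilities as a semantic scene descriptor and then training a supervised scene classifier on top can be sketched as follows; the MobileNet backbone, the SVM classifier, and the dummy scene categories are illustrative assumptions, not the setup used in the thesis.

```python
# Scene descriptor sketch: class-probability vector from a pre-trained CNN,
# fed to a conventional classifier for scene categorisation.
import numpy as np
import torch
import torchvision.models as models
from sklearn.svm import SVC

cnn = models.mobilenet_v2(weights=None).eval()   # placeholder; use pre-trained weights in practice

def describe(images):
    with torch.no_grad():
        logits = cnn(images)                      # (B, 1000) object-class scores
    return torch.softmax(logits, dim=1).numpy()   # label probabilities as semantic descriptor

scenes = torch.randn(20, 3, 224, 224)             # dummy frames from the robot camera
labels = np.arange(20) % 3                        # dummy scene categories (e.g. kitchen/office/corridor)

descriptors = describe(scenes)
scene_clf = SVC().fit(descriptors, labels)        # supervised scene classification model
print(scene_clf.predict(descriptors[:5]))
```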
10

Fan, Gao. "Clustering and Deep Learning Techniques for Structural Health Monitoring." Thesis, Curtin University, 2020. http://hdl.handle.net/20.500.11937/80611.

Full text
Abstract:
This thesis proposes the development and application of clustering and deep learning techniques for improved automated modal identification, lost vibration data recovery, vibration signal denoising, and dynamic response reconstruction under operational and extreme loading conditions in the area of structural health monitoring. The effectiveness and performances of the proposed approaches are validated by numerical and experimental studies. The outstanding results demonstrate that these proposed approaches are reliable and very promising for practical applications.
APA, Harvard, Vancouver, ISO, and other styles
11

ALI, ARSLAN. "Deep learning techniques for biometric authentication and robust classification." Doctoral thesis, Politecnico di Torino, 2021. http://hdl.handle.net/11583/2910084.

Full text
APA, Harvard, Vancouver, ISO, and other styles
12

Beretta, Davide. "Experience Replay in Sparse Rewards Problems using Deep Reinforcement Techniques." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2019. http://amslaurea.unibo.it/17531/.

Full text
Abstract:
This work introduces the reader to Reinforcement Learning, an area of Machine Learning that has received a great deal of research attention in recent years. It then presents some modifications to ACER, a well-known and very interesting algorithm that makes use of Experience Replay. The aim is to increase its performance on general problems and, in particular, on sparse reward problems. To verify the soundness of the proposed ideas, Montezuma's Revenge is used, a game developed for the Atari 2600 and considered one of the hardest to tackle.
APA, Harvard, Vancouver, ISO, and other styles
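Experience replay, the mechanism ACER builds on, stores past transitions in a buffer and re-samples them for off-policy updates. A minimal buffer sketch is shown below; it does not reproduce the specific ACER modifications proposed in the thesis.

```python
# Minimal experience replay buffer sketch.
import random
from collections import deque

class ReplayBuffer:
    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)   # oldest transitions are discarded automatically

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        batch = random.sample(self.buffer, batch_size)
        return list(zip(*batch))               # tuples of states, actions, rewards, next_states, dones

buf = ReplayBuffer()
for t in range(100):                           # dummy transitions
    buf.push(t, t % 4, 0.0, t + 1, False)
states, actions, rewards, next_states, dones = buf.sample(8)
print(len(states), actions[:3])
```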
13

Pham, Cuong X. "Advanced techniques for data stream analysis and applications." Thesis, Griffith University, 2023. http://hdl.handle.net/10072/421691.

Full text
Abstract:
Deep learning (DL) is one of the most advanced AI techniques; it has gained much attention in the last decade and has been applied in many successful applications such as stock market prediction, object detection, and face recognition. Rapid advances in computational techniques like Graphics Processing Units (GPUs) and Tensor Processing Units (TPUs) have made it possible to train large deep learning models that obtain high accuracy, surpassing human ability in some tasks; e.g., LipNet [9] achieves 93% accuracy compared with 52% for humans in recognizing words from a speaker's lip movements. Most current deep learning research has focused on designing deep architectures that work in a static environment where the whole training set is known in advance. However, in many real-world applications like predicting financial markets, autonomous cars, and sensor networks, the data often come in the form of streams with massive volume and high velocity, which affects the scalability of deep learning models. Learning from such data is called continual, incremental, or online learning. When learning a deep model in dynamic environments where the data come from streams, modern deep learning models usually suffer from the so-called catastrophic forgetting problem, one of the most challenging issues that has not been solved yet. Catastrophic forgetting occurs when a model learns new knowledge, i.e., new objects or classes, but its performance on the previously learned classes drops significantly. The cause of catastrophic forgetting in deep learning models has been identified and is related to the weight-sharing property. In detail, when the model updates the corresponding weights to capture knowledge of new tasks, it may push the learned weights of past tasks away and cause the model's performance to degrade. According to the stability-plasticity dilemma [17], if the model weights are too stable, the model will not be able to acquire new knowledge, while a model with high plasticity can have large weight changes, leading to significant forgetting of previously learned patterns. Many approaches have been proposed to tackle this issue, such as imposing constraints on weights (regularization) or rehearsal from experience, but significant research gaps still exist. First, current regularization methods often do not simultaneously consider class imbalance and catastrophic forgetting. Moreover, these methods usually require extra memory to store previous versions of the model, which is sometimes infeasible for a substantial deep model due to memory constraints. Second, existing rehearsal approaches pay little attention to selecting and storing the critical instances that help the model retain as much knowledge of the learned tasks as possible. This study focuses on dealing with these challenges by proposing several novel methods. We first propose a new loss function that combines two loss terms to deal with class-imbalanced data and catastrophic forgetting simultaneously. The former is a modification of a widely used loss function for class-imbalance learning, called Focal loss, to handle the exploding gradient (loss going to NaN) and to retain the ability to learn from highly confident data points. The latter is a novel loss term that addresses catastrophic forgetting within the current mini-batch. In addition, we also propose an online convolutional neural network (OCNN) architecture for tabular data that acts as a base classifier in an ensemble system (OECNN).
Next, we introduce a rehearsal-based method to prevent catastrophic forgetting, in which we select a triplet of instances within each mini-batch to store in the memory buffer. These instances are identified as crucial examples that either remind the model of easy tasks or let it revise the hard ones. We also propose a class-wise forgetting detector that monitors the performance of each class encountered so far in a stream. If a class's performance drops below a predefined threshold, that class is identified as a forgetting class. Finally, since data often comprise many modalities, we study online multi-modal multi-task (M3T) learning problems. Unlike traditional methods in stable environments, online M3T learning needs to be considered in many scenarios, such as missing modalities and incremental tasks. We establish the settings for six frequently occurring M3T scenarios. Most of the existing works in M3T fail to run on all of these scenarios. Therefore, we propose a novel M3T deep learning model called UniCNet that can work in all of these scenarios and achieves superior performance compared with state-of-the-art M3T methods. To conclude, this dissertation contributes novel computational techniques that deal with the catastrophic forgetting problem in continual deep learning.
Thesis (PhD Doctorate)
Doctor of Philosophy (PhD)
School of Info & Comm Tech
Science, Environment, Engineering and Technology
APA, Harvard, Vancouver, ISO, and other styles
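Focal loss, the class-imbalance loss that the first contribution modifies, down-weights well-classified examples so that training concentrates on hard ones. A standard binary version is sketched below for reference; the thesis variant, with its safeguards against exploding gradients and its additional forgetting term, is not reproduced here.

```python
# Standard binary focal loss sketch (the thesis modifies this formulation).
import torch

def focal_loss(logits, targets, gamma=2.0, alpha=0.25, eps=1e-8):
    p = torch.sigmoid(logits)
    p_t = torch.where(targets == 1, p, 1 - p)            # probability assigned to the true class
    alpha_t = torch.where(targets == 1,
                          torch.full_like(p, alpha),
                          torch.full_like(p, 1 - alpha))  # class-balancing weight
    loss = -alpha_t * (1 - p_t) ** gamma * torch.log(p_t + eps)  # eps guards against log(0)
    return loss.mean()

logits = torch.randn(16)
targets = torch.randint(0, 2, (16,)).float()
print(float(focal_loss(logits, targets)))
```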
14

Belloni, Carole. "Deep learning and featured-based classification techniques for radar imagery." Thesis, Ecole nationale supérieure Mines-Télécom Atlantique Bretagne Pays de la Loire, 2019. http://www.theses.fr/2019IMTA0164.

Full text
Abstract:
Autonomous moving platforms carrying radar systems can synthesise long antenna apertures and generate Synthetic Aperture Radar (SAR) images. SAR images provide strategic information for military and civilian applications, and they can be acquired day and night under a wide range of weather conditions. Because the interpretation of SAR images is a common challenge, Automatic Target Recognition (ATR) algorithms can help assist decision-making when the operator is in the loop or when the platforms are fully autonomous. One of the main limitations in developing SAR ATR algorithms is the lack of suitable, publicly available data. Optical image classification, by contrast, has recently attracted significantly more research interest because of the number of potential applications and the profusion of data. As a result, robust feature-based and deep learning classification methods have been developed for optical imaging that could be applied to the SAR domain. In this thesis, a new Inverse SAR (ISAR) dataset consisting of test and training images acquired under a range of geometrical conditions is presented. In addition, a method is proposed to generate extra synthetic images, by simulating realistic SAR noise on the original images, and so increase the training efficiency of classification algorithms that require a wealth of data, such as deep neural networks. A Gaussian Mixture Model (GMM) segmentation approach is adapted to segment single-polarised SAR images of targets. Features proposed to characterise optical images are transferred to the SAR domain to carry out target classification after segmentation, and their respective performance is compared. A new pose-informed deep learning network architecture, which takes into account the effects of target orientation on target appearance in a SAR image, is proposed. The results presented in this thesis show that the use of this architecture provides a significant performance improvement over a baseline network for almost all datasets used in this work. Understanding the decision-making process of deep networks is another key challenge of deep learning. To address this issue, a new set of analytical tools is proposed that enables the identification, among other things, of the image regions the network focuses on when producing high-level classification performance.
APA, Harvard, Vancouver, ISO, and other styles
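Simulating realistic SAR noise on the original images, as proposed for augmenting the ISAR training set, is commonly done with multiplicative speckle. The gamma-distributed multi-look model below is a generic sketch, not the exact simulation used in the thesis.

```python
# Multiplicative speckle-noise augmentation sketch (generic multi-look intensity model).
import numpy as np

def add_speckle(image, looks=4, rng=None):
    rng = rng or np.random.default_rng()
    # Gamma-distributed speckle with unit mean; fewer looks -> stronger noise.
    speckle = rng.gamma(shape=looks, scale=1.0 / looks, size=image.shape)
    return np.clip(image * speckle, 0.0, 1.0)

clean = np.random.rand(128, 128)        # stand-in for an original ISAR training image
noisy = add_speckle(clean, looks=1)     # single-look: heaviest speckle
print(clean.mean(), noisy.mean())
```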
15

Zandavi, Seid Miad. "Indoor Autonomous Flight Using Deep Learning-Based Image Understanding Techniques." Thesis, University of Sydney, 2020. https://hdl.handle.net/2123/22893.

Full text
Abstract:
Indoor autonomous flight using artificial intelligence (AI) and machine learning techniques is presented. Flying inside a building without a positioning system requires a particular framework to connect computer vision, machine learning, control theory, and AI. The framework consists of six modules/disciplines presented to support indoor autonomous flight: optimization, state estimation, control, object detection, deep learning, and guidance. In this regard, the mathematical model of the quadcopter/drone is derived with a high level of fidelity by considering non-linearity, uncertainties, and coupling. For the optimization module, a new heuristic optimization algorithm is designed to solve nonlinear optimization problems. The proposed algorithm utilizes a stochastic method to reach the optimal point based on simplex techniques. Swarm simplexes are distributed stochastically in the search space to locate the best optimal point. The designed algorithm is applied to 25 well-known benchmarks, and its performance is compared with Particle Swarm Optimization (PSO), the Nelder-Mead simplex algorithm, and the Grey Wolf Optimizer (GWO), both on its own and in hybrid forms where it is combined with either pattern search (hGWO-PS) or random exploratory search algorithms (hGWO-RES). The numerical results show that the presented algorithm, called the Stochastic Dual Simplex Algorithm (SDSA), exhibits competitive performance in terms of accuracy and complexity. This feature makes SDSA efficient for tuning hyper-parameters and finding the optimal weights of the reconstructed layer in deep learning modules. For the state estimation module, a novel filter for nonlinear system state estimation is presented. This filter formulates the state estimation problem as a stochastic dynamic optimization problem and utilizes a new stochastic method based on a genetic algorithm to find and track the best estimate. The experimental results show that the performance of the proposed filter, named the Genetic Filter (GF), is competitive in comparison with classical and heuristic filters. GF is implemented to estimate the unknown parameters required for control. For the control module, a new Proportional-Integral-Derivative-Accelerated (PIDA) controller with a derivative filter is designed to improve quadcopter flight stability in a noisy environment. SDSA tunes the proposed PIDA controller with respect to the control objective. The simulation results show that the proposed control scheme is able to track the desired point in the presence of disturbances. The desired point itself is generated by extracting contextual information from images. For the object detection module, a novel multi-region feature-selection method is proposed that defines histogram values of basic and random areas and combines them with continuous ant colony filter detection to represent the original target. The presented approach also achieves smooth tracking on different video sequences, especially in the presence of motion blur. Both target recognition and tracking of the dynamic target are critical features for the autonomous drone. The experimental results demonstrate better and faster tracking than traditional methods. Image quality is a crucial requirement for high performance. Finally, the deep learning and guidance modules issue commands to the system for action.
Improving the image resolution can enhance the performance of the image processing module's tasks, such as object tracking, object detection, and depth detection. A new method, called a post-trained convolutional neural network (CNN), is proposed to increase the accuracy of current state-of-the-art single image super-resolution (SISR) methods. This method utilizes contextual information to update the last reconstruction layer of the CNN using SDSA. The drone utilizes high-quality images to identify the target and estimate the relative distance. The estimated distance passes through the guidance law (i.e., pure proportional navigation (PPN)) to generate acceleration commands. The simulation results show that adapting the deep learning-based image understanding techniques (i.e., RetinaNet ant colony detection and the Pyramid Stereo Matching Network (PSMNet)) into the proposed controller enables the drone to generate and track the desired point in the presence of disturbances in a complex environment.
APA, Harvard, Vancouver, ISO, and other styles
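The PIDA controller with a derivative filter at the heart of the control module extends the classic PID law. A plain discrete PID sketch is given below as a reference point; the accelerated term, the derivative filter, and the SDSA-based tuning of the thesis are not included, and the toy plant and gains are arbitrary.

```python
# Discrete PID controller sketch (the thesis extends this with an accelerated term
# and a derivative filter, both omitted here).
class PID:
    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def step(self, setpoint, measurement):
        error = setpoint - measurement
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

# Toy altitude-hold loop on a crude first-order plant (for illustration only).
pid, altitude = PID(kp=1.2, ki=1.0, kd=0.05, dt=0.02), 0.0
for _ in range(1000):
    u = pid.step(setpoint=1.0, measurement=altitude)
    altitude += 0.02 * (u - 0.5 * altitude)     # simplified plant dynamics
print(round(altitude, 2))                        # should settle at (or very near) the 1.0 setpoint
```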
16

DARAIO, ELENA. "Digging Deep Into Urban Mobility Data Through Machine Learning Techniques." Doctoral thesis, Politecnico di Torino, 2022. http://hdl.handle.net/11583/2972557.

Full text
APA, Harvard, Vancouver, ISO, and other styles
17

Atnafu, Selamawet Workalemahu <1989>. "Development and characterization of deep learning techniques for neuroimaging data." Doctoral thesis, Alma Mater Studiorum - Università di Bologna, 2022. http://amsdottorato.unibo.it/10484/1/ATNAFU_SELAMAWET_FINAL_THESIS.pdf.

Full text
Abstract:
Deep learning methods are extremely promising machine learning tools for analyzing neuroimaging data. However, their potential use in clinical settings is limited because of the existing challenges of applying these methods to neuroimaging data. In this study, first, a type of data leakage caused by a slice-level data split introduced during training and validation of a 2D CNN is surveyed, and a quantitative assessment of the model's performance overestimation is presented. Second, interpretable, leakage-free deep learning software, written in Python with a wide range of options, has been developed to conduct both classification and regression analysis. The software was applied to the study of mild cognitive impairment (MCI) in patients with small vessel disease (SVD) using multi-parametric MRI data, where the cognitive performance of 58 patients, measured by five neuropsychological tests, is predicted using a multi-input CNN model taking brain images and demographic data. Each of the cognitive test scores was predicted using different MRI-derived features. As MCI due to SVD has been hypothesized to be the effect of white matter damage, the DTI-derived features MD and FA produced the best prediction of the TMT-A score, which is consistent with the existing literature. In a second study, an interpretable deep learning system is developed aimed at 1) classifying Alzheimer's disease patients and healthy subjects, 2) examining the neural correlates of the disease that causes cognitive decline in AD patients using CNN visualization tools, and 3) highlighting the potential of interpretability techniques to capture a biased deep learning model. Structural magnetic resonance imaging (MRI) data of 200 subjects were used by the proposed CNN model, which was trained using a transfer learning-based approach and produced a balanced accuracy of 71.6%. Brain regions in the frontal and parietal lobes showing cerebral cortex atrophy were highlighted by the visualization tools.
APA, Harvard, Vancouver, ISO, and other styles
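The slice-level data leakage quantified in the first study arises when 2D slices from the same subject end up in both the training and validation sets; splitting at the subject level avoids it. A hedged sketch with synthetic identifiers:

```python
# Subject-level split sketch: all slices of a subject stay on the same side of the split.
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

n_subjects, slices_per_subject = 58, 30
subject_ids = np.repeat(np.arange(n_subjects), slices_per_subject)   # group label per slice
X = np.random.rand(len(subject_ids), 128)                            # dummy slice features
y = np.random.rand(len(subject_ids))                                 # dummy cognitive scores

splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
train_idx, val_idx = next(splitter.split(X, y, groups=subject_ids))

# No subject appears in both partitions, so there is no slice-level leakage.
assert set(subject_ids[train_idx]).isdisjoint(subject_ids[val_idx])
print(len(train_idx), len(val_idx))
```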
18

CATTANEO, DANIELE. "Machine Learning Techniques for Urban Vehicle Localization." Doctoral thesis, Università degli Studi di Milano-Bicocca, 2020. http://hdl.handle.net/10281/263540.

Full text
Abstract:
In this thesis, we present different approaches dealing with the localization of a road vehicle in urban settings. In particular, we make use of machine learning techniques to process the images coming from the onboard cameras of a vehicle. The developed systems aim at computing a pose and therefore, in the case of deep neural networks, they are referred to as pose regression networks. To the best of our knowledge, some of the developed approaches are the first deep neural networks in the literature capable of computing visual pose regression based on 3D maps. Such 3D maps are usually built by means of LiDAR devices by large specialized companies, which make up the world of commercial map makers. A commercial development of very high definition maps is therefore to be expected, which will make it possible to use them for vehicle localization. From our contacts with industrial makers of autonomous driving systems for road vehicles, we know that LiDARs onboard the vehicles are, as of today, not well accepted, mainly because state-of-the-art LiDARs are based on mechanical scanning systems and therefore cannot sustain the accelerations and vibrations of a road vehicle. For this reason, and since today's vehicles already include many cameras, being able to visually localize a vehicle on high-definition maps is a very significant prospect, not only from a research point of view but also for real applications. Localization is an essential task for any mobile robot, especially for self-driving cars, where a wrong position estimate might lead to accidents and even fatal injuries for other road users. We cannot rely only on Global Navigation Satellite Systems, such as the Global Positioning System, because the accuracy and reliability of these systems are often inadequate for autonomous driving applications. This is even truer in urban environments, where buildings may block or deflect the satellites' signals, leading to wrong localization. In this thesis, we propose different approaches to overcome the GNSS limitations, exploiting state-of-the-art Deep Neural Networks (DNNs) and machine learning techniques. First, we propose a probabilistic approach for estimating in which lane the vehicle is driving. Second, we integrate state-of-the-art Convolutional Neural Networks for pixel-level semantic segmentation and geometric reconstruction within a localization pipeline. We localize the vehicle by matching high-level features (road geometry and buildings) from an onboard stereo camera rig with their counterparts in the OpenStreetMap service. We handle the uncertainties in a probabilistic fashion using particle filtering. Afterward, we propose a novel end-to-end DNN for vehicle localization in LiDAR maps. Finally, we propose a novel DNN-based technique for localizing a vehicle in LiDAR maps without any prior information about its position. All the approaches proposed in this thesis have been validated using well-known autonomous driving datasets, such as KITTI and RobotCar.
APA, Harvard, Vancouver, ISO, and other styles
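The particle-filtering step used to fuse the semantic features with OpenStreetMap follows the usual predict/weight/resample cycle. A minimal one-dimensional sketch with a synthetic motion and measurement model is shown below; it is only an illustration of the principle, not the thesis pipeline.

```python
# Minimal particle filter sketch: predict, weight by measurement likelihood, resample.
import numpy as np

rng = np.random.default_rng(0)
n_particles, true_pos = 1000, 5.0
particles = rng.uniform(0, 10, n_particles)      # initial pose hypotheses along a 1D road
weights = np.full(n_particles, 1.0 / n_particles)

for _ in range(20):
    particles += 0.1 + rng.normal(0, 0.05, n_particles)          # motion model (odometry + noise)
    true_pos += 0.1
    z = true_pos + rng.normal(0, 0.2)                            # noisy "map-matching" measurement
    weights *= np.exp(-0.5 * ((z - particles) / 0.2) ** 2)       # Gaussian measurement likelihood
    weights /= weights.sum()
    idx = rng.choice(n_particles, n_particles, p=weights)        # multinomial resampling
    particles, weights = particles[idx], np.full(n_particles, 1.0 / n_particles)

print(round(particles.mean(), 2), round(true_pos, 2))            # estimate vs. ground truth
```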
19

Santonastasi, Luca. "A comparison among deep learning techniques in an autonomous driving context." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2017. http://amslaurea.unibo.it/14708/.

Full text
Abstract:
Nowadays, artificial intelligence is one of the research fields receiving ever more attention. The increase in the computational power available to researchers and developers is reviving all the potential that had been expressed only theoretically at the dawn of Artificial Intelligence. Among all the fields of Artificial Intelligence, the one currently attracting the most interest is autonomous driving. Many car manufacturers and the most prestigious American colleges are investing more and more resources in this technology. Surveying and describing the broad spectrum of technologies available for autonomous driving is part of the comparison carried out in this work. The case study focuses on a company that, starting from scratch, would like to develop an autonomous driving system without data, in a short time, and using only sensors built in-house. Starting from neural networks and classical algorithms, we move up to algorithms such as A3C in order to cover the full spectrum of possibilities. The selected technologies are compared in two experiments. The first is a pure computer-vision experiment using DeepTesla. In this experiment, technologies such as traditional computer-vision techniques, CNNs, and CNNs combined with LSTMs are compared. The goal is to identify which algorithm performs best when processing images only. The second is an experiment on CARLA, a simulator based on Unreal Engine. In this experiment, the results obtained in the simulated environment with CNNs combined with LSTMs are compared with the results obtained with A3C. The goal is to understand whether these techniques are able to drive autonomously using the data provided by the simulator. The comparison aims to identify the critical issues and possible future improvements of each of the proposed algorithms, so as to find a feasible solution that yields good results in a short time.
APA, Harvard, Vancouver, ISO, and other styles
20

Valentini, Alice. "Evaluation of deep learning techniques for object detection on embedded systems." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2018. http://amslaurea.unibo.it/15478/.

Full text
Abstract:
Area surveying is an important tool used to inspect and study a given area in detail; it is especially useful to monitor the movements and the settlement of populations located in developing countries. Unmanned Aerial Vehicles (UAVs), given recent developments, could represent a suitable technology to carry out this task in an easier and cheaper way. The use of UAV-based survey techniques poses many challenges in terms of accuracy, speed and efficiency. The target is to build an autonomous flight system which is able to define optimal flight paths using the information gathered from the environment. In this thesis we focus on the development of the perception system, which has to capture the desired information with accurate and fast detections. More in detail, we explore and evaluate the use of object detection models based on Deep Learning techniques, which will sense and collect data to be used later for on-board processing. The object detection model has to be accurate, in order to detect all the objects encountered on the ground, and fast, in order not to introduce too much latency into the on-board decision system. Fast and accurate decisions enable an efficient coverage of the area. Different embedded platforms are considered and examined in order to meet the model's computational requirements and to provide efficient use in terms of battery consumption. Different training configurations are tested in order to maximize our detection accuracy metric, mean Average Precision (mAP). The detection speed is then evaluated on our board using the Frames Per Second (FPS) metric. In addition to YOLO, we also test TinyYOLO, a smaller and faster network. Results are then compared in order to find the best configuration in terms of the accuracy/speed trade-off. We show that our system is able to meet all the requirements, even if we do not achieve our ideal detection speed.
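As a simple illustration of the FPS measurement mentioned above, the sketch below times repeated forward passes of a detector; the detector here is a stand-in callable, not the YOLO/TinyYOLO models evaluated in the thesis.

import time
import numpy as np

def fake_detector(image):
    # Stand-in for a real object detector; replace with the model's inference call.
    time.sleep(0.02)                      # pretend inference takes ~20 ms
    return [("person", 0.9, (10, 10, 50, 80))]

def measure_fps(detector, images, warmup=5):
    for img in images[:warmup]:           # warm-up runs are excluded from timing
        detector(img)
    start = time.perf_counter()
    for img in images[warmup:]:
        detector(img)
    elapsed = time.perf_counter() - start
    return (len(images) - warmup) / elapsed

frames = [np.zeros((416, 416, 3), dtype=np.uint8) for _ in range(55)]
print(f"throughput: {measure_fps(fake_detector, frames):.1f} FPS")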
APA, Harvard, Vancouver, ISO, and other styles
21

Abdulrahman, Qasem Al-Molegi. "Contributions to Trajectory Analysis and Prediction: Statistical and Deep Learning Techniques." Doctoral thesis, Universitat Rovira i Virgili, 2019. http://hdl.handle.net/10803/667650.

Full text
Abstract:
Due to the close relationship between people's daily life and specific geographic locations, the historical trajectory data of a person contains a lot of valuable information that can be used to discover their lifestyle and regularity. The generalisation in the use of mobile devices with location capabilities has fueled trajectory mining: the research area that focuses on manipulating, processing and analysing trajectory data to aid the extraction of higher-level knowledge from the trajectory history of a user. Based on this analysis, even the person's next probable location can be predicted. These techniques pave the way for the improvement of current location-based services and the rise of new business models, based on rich notifications related to the right prediction of users' next location. This thesis addresses location prediction as well as the discovery of significant regions in a person's movement area. It proposes various models to predict the future state of people's movement, based on different machine learning techniques (such as Markov Chains, Recurrent Neural Networks and Convolutional Neural Networks) and considering different input representation methods (embedding learning and one-hot vector). Moreover, the attention technique is used in the prediction model, aiming at aligning the time intervals in people's trajectories that are relevant to a specific location. Furthermore, the thesis proposes a time encoding scheme to capture movement behavior characteristics. In addition to that, it analyses the impact of space-time representation learning by evaluating different architectural configurations. Finally, trajectory analysis and location prediction are applied to a real-time smartphone-based monitoring system for seniors.
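A minimal PyTorch sketch of the embedding-plus-recurrent-network idea described above: visited locations are discretised into region IDs, embedded, and fed to an LSTM that predicts the next region. The vocabulary size and dimensions are illustrative assumptions, not the thesis's configuration.

import torch
import torch.nn as nn

class NextLocationLSTM(nn.Module):
    # Embeds a sequence of visited region IDs and predicts the next region.
    def __init__(self, num_regions=200, emb_dim=32, hidden=64):
        super().__init__()
        self.embed = nn.Embedding(num_regions, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, num_regions)

    def forward(self, region_ids):                # (batch, seq_len) of integer IDs
        h, _ = self.lstm(self.embed(region_ids))
        return self.out(h[:, -1])                 # logits over the next region

model = NextLocationLSTM()
trajectories = torch.randint(0, 200, (4, 10))     # 4 trajectories of 10 visits each
next_regions = torch.randint(0, 200, (4,))
loss = nn.CrossEntropyLoss()(model(trajectories), next_regions)
loss.backward()
print(float(loss))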
APA, Harvard, Vancouver, ISO, and other styles
22

ROSA, LAURA ELENA CUE LA. "CROP RECOGNITION FROM MULTITEMPORAL SAR IMAGE SEQUENCES USING DEEP LEARNING TECHNIQUES." PONTIFÍCIA UNIVERSIDADE CATÓLICA DO RIO DE JANEIRO, 2018. http://www.maxwell.vrac.puc-rio.br/Busca_etds.php?strSecao=resultado&nrSeq=34919@1.

Full text
Abstract:
The present dissertation aims to evaluate a set of deep learning (DL) techniques for crop mapping from multitemporal sequences of SAR images. Three methods were considered in this study: Autoencoders (AEs), Convolutional Neural Networks (CNNs) and Fully Convolutional Networks (FCNs). The analysis was based on two databases containing image sequences generated by the Sentinel-1A sensor. The first database covers a temperate region that presents comparatively simpler dynamics, and the second a tropical region that represents a scenario with complex dynamics. In all cases, a Random Forest (RF) classifier operating on texture features derived from co-occurrence matrices was used as baseline. For the temperate region, DL techniques consistently produced better results than the RF approach, with AEs being the best in almost all experiments. In the tropical region the DL approaches performed similarly to RF, alternating as the best performing one for different experimental setups. By and large, CNNs achieved the best or close to the best performance in all experiments. Although the FCNs performed well, their full potential was not exploited in our experiments, mainly due to the difficulty of balancing the number of training samples among the crop types. The dissertation also proposes two post-processing strategies that exploit prior knowledge about the crop dynamics in the target site. Experiments have shown that such techniques can significantly improve the recognition accuracy, in particular for less abundant crops.
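As a sketch of the Random Forest baseline mentioned in the abstract, the snippet below trains a classifier on a feature matrix of per-sample texture descriptors; the random data stands in for real co-occurrence-matrix features extracted from the SAR time series, and the class count is an assumption.

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
# Stand-in data: 1000 samples, 5 dates x 8 texture features per date, 4 crop classes.
X = rng.normal(size=(1000, 5 * 8))
y = rng.integers(0, 4, size=1000)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
rf = RandomForestClassifier(n_estimators=200, random_state=0)
rf.fit(X_tr, y_tr)
print("overall accuracy:", accuracy_score(y_te, rf.predict(X_te)))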
APA, Harvard, Vancouver, ISO, and other styles
23

Chaaro, Lina, and Antón Laura Martínez. "Crop and weed detection using image processing and deep learning techniques." Thesis, Högskolan i Skövde, Institutionen för ingenjörsvetenskap, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-18630.

Full text
Abstract:
Artificial intelligence, specifically deep learning, is a fast-growing research field today. One of its various applications is object recognition, making use of computer vision. The combination of these two technologies leads to the purpose of this thesis. In this project, a system for the identification of different crops and weeds has been developed as an alternative to the system present on the FarmBot company's robots. This is done by accessing the images through the FarmBot API, using computer vision for image processing, and artificial intelligence for the application of transfer learning to an R-CNN that performs the plant identification autonomously. The results obtained show that the system works with an accuracy of 78.10% for the main crop and 53.12% and 44.76% for the two weeds considered. Moreover, the coordinates of the weeds are also given as results. The performance of the resulting system is compared both with similar projects found during research and with the current version of the FarmBot weed detector. From a technological perspective, this study presents an alternative to traditional weed detectors in agriculture and opens the door to more intelligent and advanced systems.
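A hedged sketch of the transfer-learning step described above, using torchvision's Faster R-CNN as a stand-in for the R-CNN used in the project; the four classes (background, crop, and two weeds) and the single dummy training step are assumptions, not the project's actual setup.

import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

num_classes = 4  # background + crop + two weed species (assumed labels)

# Start from a detector pre-trained on COCO and replace its classification head.
model = fasterrcnn_resnet50_fpn(weights="DEFAULT")
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)

# One dummy training step: images are tensors, targets hold boxes and labels.
model.train()
images = [torch.rand(3, 480, 640)]
targets = [{"boxes": torch.tensor([[100., 120., 200., 260.]]),
            "labels": torch.tensor([1])}]
losses = model(images, targets)           # dict of detection losses in train mode
total = sum(losses.values())
total.backward()
print({k: float(v) for k, v in losses.items()})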
APA, Harvard, Vancouver, ISO, and other styles
24

Pathirage, Chathurdara Sri Nadith. "Novel Deep Learning Techniques For Computer Vision and Structure Health Monitoring." Thesis, Curtin University, 2018. http://hdl.handle.net/20.500.11937/70569.

Full text
Abstract:
This thesis proposes novel techniques for building a generic framework for both regression and classification tasks in vastly different application domains, such as computer vision and civil engineering. Several frameworks are proposed and combined into a complex deep network design to provide a complete solution to a wide variety of problems. The experimental results demonstrate significant improvements of all the proposed techniques in terms of accuracy and efficiency.
APA, Harvard, Vancouver, ISO, and other styles
25

Tan, Lu. "Image Processing by Variational Methods, Stochastic Programming and Deep Learning Techniques." Thesis, Curtin University, 2020. http://hdl.handle.net/20.500.11937/82126.

Full text
Abstract:
This thesis investigates effective approaches to tackle different problems in computer vision: variational methods are first studied for image processing, illusory contour reconstruction and segmentation, as well as for improving their efficiency. Next, we develop variational segmentation methods based on stochastic programming, tackling diverse problems with random noise. Third, fusion approaches integrating variational models and deep neural networks are explored for challenging image tasks. These innovative ideas are validated by significant performance gains.
APA, Harvard, Vancouver, ISO, and other styles
26

MEHMOOD, TAHIR. "Knowledge Transfer Techniques in Deep Learning for Biomedical Named Entity Recognition." Doctoral thesis, Università degli studi di Brescia, 2021. http://hdl.handle.net/11379/546098.

Full text
APA, Harvard, Vancouver, ISO, and other styles
27

Le, Goff Matthieu. "Techniques d'analyse de contenu appliquées à l'imagerie spatiale." Phd thesis, Toulouse, INPT, 2017. http://oatao.univ-toulouse.fr/19243/1/LE_GOFF_Matthieu.pdf.

Full text
Abstract:
Since the 1970s, remote sensing has improved the analysis of the Earth's surface thanks to satellite images produced in digital format. Compared with airborne images, satellite images provide more information because they have a larger spatial coverage and a short revisit period. The rise of remote sensing has been accompanied by the emergence of processing technologies that have allowed users in the community to analyse satellite images with the help of increasingly automatic processing chains. Since the 1970s, the various Earth observation missions have accumulated a large amount of information over time. This is due in particular to the improvement of satellite revisit times for a given region, the refinement of spatial resolution and the increase of the swath (the spatial coverage of an acquisition). Remote sensing, once confined to the study of a single image, has progressively turned, and is increasingly turning, towards the analysis of long time series of multispectral images acquired at different dates. The annual flow of satellite images is expected to reach several petabytes in the near future. The availability of such a large amount of data is an asset for developing advanced processing chains. The machine learning techniques widely used in remote sensing have improved considerably. The robustness of classical machine learning approaches was often limited by the amount of available data, and new techniques have been developed to use this large new data flow efficiently. However, the amount of data and the complexity of the algorithms involved require significant computing power for these new processing chains. In parallel, the computing power available for image processing has also increased: GPUs ("Graphic Processing Units") are increasingly used, and the use of public or private clouds is more and more widespread. Nowadays, for image processing, all the power needed for automatic processing chains is available at a reasonable cost, and the design of new processing chains must take this new factor into account. In remote sensing, the growth of the data volume to be exploited has become a problem because of the computing power required for the analysis. Traditional remote sensing algorithms were designed for data that could be stored in main memory throughout processing, a condition that is less and less satisfied given the number of images and their resolution. Traditional remote sensing algorithms therefore need to be revisited and adapted for large-scale data processing. This need is not specific to remote sensing and is found in other sectors, such as the web, medicine and speech recognition, which have already solved part of these problems; some of the techniques and technologies developed in those fields still need to be adapted to be applied to satellite images. This thesis focuses on remote sensing algorithms for processing massive data volumes. In particular, an existing machine learning algorithm is studied and adapted for a distributed implementation. The objective of the implementation is scalability, that is, the ability of the algorithm to process a large amount of data given adequate computing power. Finally, the second proposed methodology is based on recent machine learning algorithms, convolutional neural networks, and proposes a way to apply them to our use cases on satellite images.
APA, Harvard, Vancouver, ISO, and other styles
28

Rosar, Kós Lassance Carlos Eduardo. "Graphs for deep learning representations." Thesis, Ecole nationale supérieure Mines-Télécom Atlantique Bretagne Pays de la Loire, 2020. http://www.theses.fr/2020IMTA0204.

Full text
Abstract:
In recent years, Deep Learning methods have achieved state of the art performance in a vast range of machine learning tasks, including image classification and multilingual automatic text translation. These architectures are trained to solve machine learning tasks in an end-to-end fashion. In order to reach top-tier performance, these architectures often require a very large number of trainable parameters. There are multiple undesirable consequences, and in order to tackle these issues, it is desired to be able to open the black boxes of deep learning architectures. Problematically, doing so is difficult due to the high dimensionality of representations and the stochasticity of the training process. In this thesis, we investigate these architectures by introducing a graph formalism based on the recent advances in Graph Signal Processing (GSP). Namely, we use graphs to represent the latent spaces of deep neural networks. We showcase that this graph formalism allows us to answer various questions including: ensuring generalization abilities, reducing the amount of arbitrary choices in the design of the learning process, improving robustness to small perturbations added to the inputs, and reducing computational complexity
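To illustrate the graph formalism in concrete terms, the sketch below builds a k-nearest-neighbour graph over a batch of latent representations and measures label smoothness with the Laplacian quadratic form; this is a generic GSP-style computation under assumed data, not the exact metrics defined in the thesis.

import numpy as np

def knn_graph(features, k=5):
    # Symmetric k-NN adjacency matrix built from pairwise Euclidean distances.
    d2 = ((features[:, None, :] - features[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)
    adj = np.zeros_like(d2)
    for i, row in enumerate(d2):
        adj[i, np.argsort(row)[:k]] = 1.0
    return np.maximum(adj, adj.T)

def label_smoothness(adj, labels):
    # Laplacian quadratic form of the one-hot label signal (lower = smoother).
    laplacian = np.diag(adj.sum(1)) - adj
    one_hot = np.eye(labels.max() + 1)[labels]
    return float(np.trace(one_hot.T @ laplacian @ one_hot))

rng = np.random.default_rng(0)
latent = rng.normal(size=(100, 16))          # stand-in for intermediate-layer features
labels = rng.integers(0, 3, size=100)
print("smoothness:", label_smoothness(knn_graph(latent), labels))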
APA, Harvard, Vancouver, ISO, and other styles
29

Bartoli, Giacomo. "Edge AI: Deep Learning techniques for Computer Vision applied to embedded systems." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2018. http://amslaurea.unibo.it/16820/.

Full text
Abstract:
In the last decade, Machine Learning techniques have been used in different fields, ranging from finance to healthcare and even marketing. Amongst all these techniques, the ones adopting a Deep Learning approach have been shown to outperform humans in tasks such as object detection, image classification and speech recognition. This thesis introduces the concept of Edge AI: the possibility of building learning models capable of performing inference locally, without any dependence on expensive servers or cloud services. A first case study we consider is based on the Google AIY Vision Kit, an intelligent camera equipped with a graphic board to optimize Computer Vision algorithms. Then, we test the performance of CORe50, a dataset for continuous object recognition, on embedded systems. The techniques developed in these chapters are finally used to solve a challenge within the Audi Autonomous Driving Cup 2018, where a mobile car equipped with a camera, sensors and a graphic board must recognize pedestrians and stop before hitting them.
APA, Harvard, Vancouver, ISO, and other styles
30

GRIMALDI, MATTEO. "Hardware-Aware Compression Techniques for Embedded Deep Neural Networks." Doctoral thesis, Politecnico di Torino, 2021. http://hdl.handle.net/11583/2933756.

Full text
APA, Harvard, Vancouver, ISO, and other styles
31

Tovedal, Sofiea. "On The Effectiveness of Multi-Task Learning: An evaluation of Multi-Task Learning techniques in deep learning models." Thesis, Umeå universitet, Institutionen för datavetenskap, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-172257.

Full text
Abstract:
Multi-Task Learning is today an interesting and promising field which many mention as a must for achieving the next level of advancement within machine learning. However, in reality, Multi-Task Learning is much more rarely used in real-world implementations than its more popular cousin, Transfer Learning. The question is why that is, and whether Multi-Task Learning outperforms its Single-Task counterparts. In this thesis, different Multi-Task Learning architectures were utilized in order to build a model that can handle labeling real technical issues within two categories. The model faces a challenging imbalanced data set with many labels to choose from and short texts to base its predictions on. Can task-sharing be the answer to these problems? This thesis investigated three Multi-Task Learning architectures and compared their performance to a Single-Task model. An authentic data set and two labeling tasks were used in training the models with the method of supervised learning. The four model architectures (Single-Task, Multi-Task, Cross-Stitched and Shared-Private) first went through a hyperparameter tuning process using one of the two layer options, LSTM and GRU. They were then boosted by auxiliary tasks and finally evaluated against each other.
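A minimal PyTorch sketch of the shared-encoder idea behind the Multi-Task architectures compared in the thesis: one GRU encoder feeds two task-specific classification heads. The vocabulary size, dimensions and label counts are illustrative assumptions.

import torch
import torch.nn as nn

class SharedEncoderMTL(nn.Module):
    # Hard parameter sharing: one text encoder, one output head per labeling task.
    def __init__(self, vocab=5000, emb=64, hidden=128, labels_a=10, labels_b=6):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)
        self.encoder = nn.GRU(emb, hidden, batch_first=True)
        self.head_a = nn.Linear(hidden, labels_a)
        self.head_b = nn.Linear(hidden, labels_b)

    def forward(self, token_ids):
        _, h = self.encoder(self.embed(token_ids))
        h = h[-1]                                  # final hidden state of the GRU
        return self.head_a(h), self.head_b(h)

model = SharedEncoderMTL()
tokens = torch.randint(0, 5000, (8, 30))           # 8 short texts of 30 tokens each
ya, yb = torch.randint(0, 10, (8,)), torch.randint(0, 6, (8,))
logits_a, logits_b = model(tokens)
loss = nn.CrossEntropyLoss()(logits_a, ya) + nn.CrossEntropyLoss()(logits_b, yb)
loss.backward()
print(float(loss))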
APA, Harvard, Vancouver, ISO, and other styles
32

Gebremeskel, Ermias. "Analysis and Comparison of Distributed Training Techniques for Deep Neural Networks in a Dynamic Environment." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-231350.

Full text
Abstract:
Deep learning models' prediction accuracy tends to improve with the size of the model. The implication is that the amount of computational power needed to train models is continuously increasing. Distributed deep learning training tries to address this issue by spreading the computational load onto several devices. In theory, distributing the computation onto N devices should give an N-fold performance improvement; yet, in reality, the improvement is rarely N-fold, due to communication and other overheads. This thesis studies the communication overhead incurred when distributing deep learning training. Hopsworks is a platform designed for data science. The purpose of this work is to explore a feasible way of deploying distributed deep learning training on a shared cluster and to analyze the performance of different distributed deep learning algorithms to be used on this platform. The findings of this study show that bandwidth-optimal communication algorithms like ring all-reduce scale better than many-to-one communication algorithms like parameter server, but are less fault tolerant. Furthermore, the collected system usage statistics revealed a network bottleneck when training is distributed on multiple machines. This work also shows that it is possible to run MPI on a Hadoop cluster by building a prototype that orchestrates resource allocation, deployment, and monitoring of MPI-based training jobs. Even though the experiments did not cover different cluster configurations, the results are still relevant in showing what considerations need to be made when distributing deep learning training.
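To make the communication-pattern comparison concrete, here is a small pure-Python simulation of the ring all-reduce idea (reduce-scatter followed by all-gather); it only mimics the data movement on simulated workers and is not the MPI/Hopsworks implementation discussed in the thesis.

import numpy as np

def ring_all_reduce(grads):
    # Simulate summing equally sized gradient vectors across n ring-connected workers.
    n = len(grads)
    chunks = [list(np.array_split(g.astype(float), n)) for g in grads]
    # Reduce-scatter: after n-1 steps, worker i holds the fully summed chunk i.
    for step in range(n - 1):
        for i in range(n):
            c = (i - step - 1) % n
            chunks[(i + 1) % n][c] = chunks[(i + 1) % n][c] + chunks[i][c]
    # All-gather: circulate the completed chunks until every worker has all of them.
    for step in range(n - 1):
        for i in range(n):
            c = (i - step) % n
            chunks[(i + 1) % n][c] = chunks[i][c]
    return [np.concatenate(c) for c in chunks]

workers = [np.ones(8) * (i + 1) for i in range(4)]   # worker i starts with gradient i+1
print(ring_all_reduce(workers)[0])                   # every entry equals 1+2+3+4 = 10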
APA, Harvard, Vancouver, ISO, and other styles
33

Nardi, Paolo. "Human Activity Recognition : Deep learning techniques for an upper body exercise classification system." Thesis, Högskolan Kristianstad, Fakulteten för naturvetenskap, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:hkr:diva-19410.

Full text
Abstract:
Most research behind the use of Machine Learning models in the field of Human Activity Recognition focuses mainly on the classification of daily human activities and aerobic exercises. In this study, we focus on the use of 1 accelerometer and 2 gyroscope sensors to build a Deep Learning classifier to recognise 5 different strength exercises, as well as a null class. The strength exercises tested in this research are as follows: bench press, bent row, deadlift, lateral raises and overhead press. The null class contains recordings of daily activities, such as sitting or walking around the house. The model used in this paper consists of the creation of consecutive overlapping fixed-length sliding windows for each exercise, which are processed separately and act as the input for a Deep Convolutional Neural Network. In this study we compare different sliding window lengths and overlap percentages (step sizes) to obtain the optimal window length and overlap percentage combination. Furthermore, we explore the accuracy results of 1D and 2D Convolutional Neural Networks. Cross-validation is also used to check the overall accuracy of the classifiers, where the database used in this paper contains 5 exercises performed by 3 different users and a null class. Overall, the models were found to perform accurately for windows with a length of 0.5 seconds or greater, and they provide a solid foundation to move forward in the creation of a more robust, fully integrated model that can recognize a wider variety of exercises.
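The fixed-length, overlapping sliding-window preprocessing described above can be sketched as follows; the sampling rate, window length, step size and the assumption of 3 axes per sensor are placeholders for the values tuned in the thesis.

import numpy as np

def sliding_windows(signal, window_len, step):
    # Split a (time, channels) array into overlapping fixed-length windows.
    windows = []
    for start in range(0, len(signal) - window_len + 1, step):
        windows.append(signal[start:start + window_len])
    return np.stack(windows)

fs = 50                                   # assumed sampling rate in Hz
recording = np.random.randn(10 * fs, 9)   # 10 s from 3 sensors, assumed 3 axes each
win = sliding_windows(recording, window_len=int(0.5 * fs), step=int(0.25 * fs))  # ~50% overlap
print(win.shape)                          # (num_windows, 25, 9)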
APA, Harvard, Vancouver, ISO, and other styles
34

Kushibar, Kaisar. "Automatic segmentation of brain structures in magnetic resonance images using deep learning techniques." Doctoral thesis, Universitat de Girona, 2020. http://hdl.handle.net/10803/670766.

Full text
Abstract:
This PhD thesis focuses on the development of deep learning based methods for the accurate segmentation of the sub-cortical brain structures from MRI. First, we propose a 2.5D CNN architecture that combines convolutional and spatial features. Second, we propose a supervised domain adaptation technique to improve the robustness and consistency of the deep learning model. Third, an unsupervised domain adaptation method is proposed to eliminate the requirement of manual intervention to train a deep learning model that is robust to differences in the MRI images from multi-centre and multi-scanner datasets. The experimental results for all the proposals demonstrated the effectiveness of our approaches in accurately segmenting the sub-cortical brain structures and showed state-of-the-art performance on well-known publicly available datasets.
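A hedged PyTorch sketch in the spirit of the 2.5D idea above: features from three orthogonal 2D patches around a voxel are extracted by small convolutional branches and fused to classify the centre voxel. The patch size, channel counts and number of structures are assumptions, not the thesis architecture.

import torch
import torch.nn as nn

def conv_branch():
    return nn.Sequential(
        nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
        nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(),       # -> 32 features per view
    )

class MultiView25D(nn.Module):
    # Classifies the centre voxel from axial, coronal and sagittal patches.
    def __init__(self, n_structures=15):
        super().__init__()
        self.axial, self.coronal, self.sagittal = conv_branch(), conv_branch(), conv_branch()
        self.classifier = nn.Linear(3 * 32, n_structures)

    def forward(self, ax, co, sa):                   # each: (batch, 1, 32, 32)
        feats = torch.cat([self.axial(ax), self.coronal(co), self.sagittal(sa)], dim=1)
        return self.classifier(feats)

model = MultiView25D()
patches = [torch.randn(8, 1, 32, 32) for _ in range(3)]
print(model(*patches).shape)                         # torch.Size([8, 15])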
APA, Harvard, Vancouver, ISO, and other styles
35

Correa, Jullian Camila Asunción. "Assessment of deep learning techniques for diagnosis in thermal systems through anomaly detection." Tesis, Universidad de Chile, 2019. http://repositorio.uchile.cl/handle/2250/170129.

Full text
Abstract:
Dissertation submitted in partial fulfilment of the requirements for the degree of Civil Mechanical Engineer
A la hora de evaluar el desempeño de sistemas térmicos, mantener registros temporales de temperatura y caudal permiten obtener información sobre el rendimiento y estado de operación del sistema. Estudios de confiabilidad en equipos y componentes son un proceso fundamental para reducir costos de mantención y aumentar la vida útil de estos. La identificación de comportamientos anómalos se puede utilizar para detectar variaciones inesperadas en patrones de consumo o en la degradación de componentes en el sistema. En los últimos años, diversas técnicas de aprendizaje profundo se han aplicado de manera exitosa en la identificación y cuantificación de daño en distintos sistemas mecánicos. Por lo anterior, es de interés evaluar su uso para el análisis de desempeño en sistemas térmicos, en particular, técnicas especializadas para el análisis de series temporales. Los sistemas solares térmicos son una fuente de energía viable y sustentable para aplicaciones de agua caliente a nivel domiciliario e industrial. Su operación requiere una correcta integración y mantención para efectivamente reducir el consumo de combustibles fósiles. Sin embargo, un sistema de monitoreo aumenta los costos del sistema, por lo que se deben tomar decisiones estratégicas para seleccionar componentes críticos a los cuales observar. Temperaturas y caudales en colectores solares, bombas y acumuladores de calor son las principales variables para analizar bajo diferentes condiciones meteorológicas. El presente Trabajo de Título consiste en la evaluación de distintas técnicas de Aprendizaje Profundo para el desarrollo de un modelo de diagnóstico de detección de anomalías en sistemas térmicos. El caso de estudio utilizado es el sistema de agua caliente solar del edificio Beauchef 851, el cual es analizado y simulado con el software TRNSYS. A través de esta representación, es posible generar grandes cantidades de datos tales como temperatura, flujo y las condiciones ambientales para representar condiciones nominales y anómalas inducidas en el sistema. Se plantea utilizar técnicas de aprendizaje profundo para el análisis de información secuencial correspondiente a los datos generados a través de la simulación en TRNSYS. Se evalúan diferentes técnicas para el análisis temporal como, por ejemplo, Redes Neuronales Recurrentes Profundas para predicción de temperaturas bajo variadas configuraciones y horizontes de evaluación. Esto, con el fin de desarrollar un método para la detección de anomalías en patrones de consumo, eficiencia de los colectores solares y operación de las bombas. El aumento de la temperatura registrada a la salida del campo solar causada por una alteración en la demanda de agua caliente es identificada como anomalía con una exactitud de un 86% en las muestras estudiadas. A su vez, la detección de la reducción de la misma temperatura debido a anomalías inducidas en la eficiencia del colector obtiene una exactitud de un 70%. A pesar de la sensibilidad del modelo de detección, estos resultados son prometedores ante la posibilidad de integrar mediciones y validaciones experimentales de este.
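A minimal sketch of the forecast-and-compare anomaly detection idea described above: an LSTM predicts the next collector-outlet temperature from a window of past measurements, and samples whose prediction error exceeds a threshold are flagged. The network size, window length and threshold are illustrative assumptions, not those of the TRNSYS case study.

import torch
import torch.nn as nn

class TempForecaster(nn.Module):
    # Predicts the next temperature sample from a window of past sensor readings.
    def __init__(self, n_features=4, hidden=32):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.out = nn.Linear(hidden, 1)

    def forward(self, window):                    # (batch, time, features)
        h, _ = self.lstm(window)
        return self.out(h[:, -1]).squeeze(-1)

model = TempForecaster()
past = torch.randn(16, 24, 4)                     # 16 windows of 24 time steps, 4 sensors
true_next = torch.randn(16)
pred_next = model(past)

# Flag anomalies where the absolute forecast residual exceeds a fixed threshold.
threshold = 2.0                                   # assumed value, e.g. in degrees Celsius
anomalous = (pred_next - true_next).abs() > threshold
print(anomalous.sum().item(), "anomalous samples out of", len(true_next))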
APA, Harvard, Vancouver, ISO, and other styles
36

LOMBARDI, MARCO. "Robust 3D Scanning and Real-Time Reconstruction Techniques in a Deep Learning Framework." Doctoral thesis, Università degli studi di Brescia, 2022. http://hdl.handle.net/11379/555015.

Full text
Abstract:
Over the years, academic research has produced a number of excellent results based on the use of what are commonly referred to as low-cost optical 3D scanners, born in the context of gaming platforms. These devices are characterized by compact dimensions, depth cameras with relatively low resolution and a relatively large working range. Due to these characteristics, these tools are widely used for indoor reconstruction or for object and gesture detection applications, where the level of detail of the 3D reconstruction is not necessarily a priority. An evolution of these technologies is represented by hand-held portable 3D scanners, based on optical reconstruction, capable of producing higher quality data than their low-cost counterparts, while remaining in price ranges affordable at a professional level. In this context it is very interesting to have real-time 3D reconstruction techniques that support and guide the user's action through immediate visual feedback. Concurrently with an evolutionary trend in terms of hardware, the interest in 3D reconstruction is finding new solutions in the rapidly growing research area linked to deep learning techniques. The importance of data is therefore crucial, both in an experimental evaluation context and for the need to provide examples and information to the models we want to design and develop. However, we found some shortcomings related to the type of data used in academic research, where there is a prevalent attention to data coming from low-cost devices compared to the wider panorama offered by modern scanning technologies. During my PhD, I was able to work with a pre-commercial prototype of a hand-held 3D scanner, called Insight, developed with the aim of providing reconstructions with a higher level of accuracy than its low-cost counterparts, to be used in application contexts where the target is a single small-to-medium scale object of which a faithful digital representation is desired. Examples of these contexts are quality control, reverse engineering, digitization for entertainment purposes (cinema and video games), commercial contexts (for example catalogs for online shops), cultural heritage (preservation of statues and historical objects) and also biomedical applications (e.g. anatomic scanning for the design of prostheses and orthoses). In this thesis we therefore focus on innovative 3D reconstruction techniques, mainly related to the aforementioned type of data, trying to analyze and respond to the challenging requirements related to tools such as those in use during our work, especially the real-time reconstruction requirement, comparing ourselves with other solutions available in the literature. In particular, we first took care to collect and make available a new dataset, DenseMatch, and to analyze and compare in depth, and for the first time together, several very recent solutions based on deep learning, potentially exploitable and usable in the contexts of interest. This comparison takes place using both a classic dataset and ours, to establish which methods best generalize across different domains and which are the most promising for our context. Finally, we leverage all the results obtained to develop a real-time 3D reconstruction pipeline suitable for our handheld scanner that improves and makes the native reconstruction solution of the Insight scanner more reliable and robust. Our solution clearly outperforms the reference method in the literature, i.e. BundleFusion, especially for the type of data and for the applications of interest. We will see how optimal results are obtained by combining the best of classic approaches based on geometric features with those that exploit modern data-driven learning models.
APA, Harvard, Vancouver, ISO, and other styles
37

Barnabò, Andrea. "Machine learning techniques for mammography applications." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2017.

Find full text
Abstract:
During this work we will use machine learning and deep learning techniques to address some medical problems in which they can play a key role. In particular, we will apply these algorithms to some mammography tasks. The thesis presents three main experiments, which are described below. The first one consists of a classification between nipple and non-nipple images. In this part of the work we will build a dataset composed of images belonging to these two classes. The main purpose here will be to build a classifier able to distinguish between nipple and non-nipple images. Several machine learning algorithms based on different models, such as Support Vector Machines and Convolutional Neural Networks, will be used to perform this task. In this experiment we will note the better classification capacity of the model based on Convolutional Neural Networks. In the following section we will tackle a harder and more useful problem: the classification of tumoral masses vs non-tumoral masses. We will therefore use a dataset composed of these two classes of images, and again perform the classification either with Support Vector Machines or with Convolutional Neural Networks. During this experiment we will obtain excellent results with Convolutional Neural Networks and with a Support Vector Machine combined with a scattering network representation. The last part of the thesis consists of the realization of a complete CADx system. Here we will combine the models trained in the previous parts and compare the results obtained by using them with the state of the art.
APA, Harvard, Vancouver, ISO, and other styles
38

McCloskey, Stephen Michael. "Towards Sleep Data Science: Objective Analysis of Sleep Disorders Using Machine Learning Techniques." Thesis, The University of Sydney, 2021. https://hdl.handle.net/2123/27140.

Full text
Abstract:
In recent years there has been an expansion in the availability of technologies to monitor sleep; however, research into sleep has been restricted by techniques developed early in the sleep science field. Sleep studies are widely conducted, but there are key challenges arising from the differences in equipment, data collection and processing, the confidential nature of medical studies, and the subjective, manual and labour-intensive sleep scoring and measurement. This motivates the need to develop objective, data-driven machine learning methods to analyse sleep and sleep disorders. We developed automated and scalable methods, employing machine learning techniques, to provide an objective measurement of sleep, opening up new avenues for large-scale sleep analysis. We proposed and evaluated machine learning techniques for sleep disorder detection and phenotyping to further objective analysis in sleep science. We explored the two most common sleep disorders: sleep apnea and insomnia disorder. Sleep apnea is characterised by a temporary reduction or cessation of breathing during sleep, and its assessment involves the manual analysis of multiple Polysomnography (PSG) channels. We explored the use of a single respiratory channel, the nasal airflow, to detect sleep apnea events using Convolutional Neural Networks (CNNs) with two types of input: the raw 1-D signal and a 2-D wavelet spectrogram representation of the signal. We achieved high accuracy, with the 2-D CNN reaching a higher accuracy (79.8%) than the 1-D CNN (77.6%), improving over previous work on the detection of sleep apnea events through an objective analysis of the nasal airflow channel. The investigation of insomnia disorder started with a cluster analysis for potential phenotyping using physiology-based Quantitative Electroencephalography (qEEG) parameters based on a neural-field brain model. Motivated by previous work into insomnia disorder, we developed a novel, data-driven approach to group people with insomnia from the Insomnia-100 dataset successfully, with three meaningful clusters found: insomnia with low beta frequency of peak power, insomnia with high delta peak power and insomnia with low delta peak power. We also discuss that the most informative features identified were from sleep stage 3: the peak power in the delta band for the O1, F3 and C3 channels, and then the peak frequency in the beta band in the O1, F3, and C3 channels, which is consistent with previous insomnia studies. One difficulty when investigating sleep disorders like insomnia disorder is the manual, subjective, discrete sleep staging, which can be inconsistent across various datasets. This poses difficulties for large-scale machine learning algorithms that classify those sleep stages. We investigated the use of an alternative method to sleep staging with sleep trajectories, which provide a continuous parameterised trajectory of sleep based on an existing neural-field brain model. We used a data-driven approach using a multi-class Conditional Deep Convolutional GAN (CDCGAN) to distinguish between people with insomnia and good sleepers, based on sleep trajectories. This was done by using the CDCGAN as a semi-supervised classifier on 20-minute subtrajectories of sleep to learn the characteristics of insomnia disorder and good sleepers using two datasets. We had promising results with the CDCGAN, with an accuracy of 74.5% on a hold-out test set, substantially outperforming alternative methods used for comparison.
While sleep trajectories are a promising method to quantify sleep, they are very time-consuming and computationally expensive and so are not suitable for large-scale datasets. Therefore, we proposed a deep neural network as a surrogate model to approximate the sleep trajectories in a much faster way in an individual model and ensemble model approach. To demonstrate the effectiveness of the surrogate model, we investigated the use of the trained surrogate ensemble to classify people with insomnia disorder, achieving Area Under the Receiver Operating Characteristics (AUROC) of 0.94 and an accuracy of 89.2%. This research provides methods for analysing sleep data, with specific demonstrations for the most common sleep disorders, through automated, scalable machine learning techniques. Furthermore, by solving fundamental limitations of more sophisticated models, new directions are opened up for analysis of sleep using a sleep trajectories representation.
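As an illustration of the 1-D CNN variant mentioned above, the following sketch classifies fixed-length windows of a single nasal-airflow channel as apnea or normal breathing; the sampling rate, window length and layer sizes are assumptions rather than the study's exact configuration.

import torch
import torch.nn as nn

class AirflowCNN1D(nn.Module):
    # Binary classifier for apnea events on 30-second single-channel airflow windows.
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=7, stride=2), nn.ReLU(),
            nn.Conv1d(16, 32, kernel_size=7, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(),
        )
        self.classifier = nn.Linear(32, 2)            # apnea vs. normal breathing

    def forward(self, x):                             # x: (batch, 1, samples)
        return self.classifier(self.features(x))

fs = 32                                               # assumed airflow sampling rate (Hz)
windows = torch.randn(4, 1, 30 * fs)                  # four 30-second windows
model = AirflowCNN1D()
print(model(windows).shape)                           # torch.Size([4, 2])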
APA, Harvard, Vancouver, ISO, and other styles
39

Pitaro, Raffaele. "McGiver: Module Classifier using fine tuning Machine Learning techniques." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2019.

Find full text
Abstract:
The automated classification of digitised documents into predefined categories has attracted great interest since the 2000s. This is due to the significant increase of documents in digital format and to the growing need to give them a hierarchical organisation. Moreover, mainly because of the large volume of documents to be categorised, in recent years this task is increasingly required to be handled automatically. In the corporate world, these problems are often addressed through proprietary "black box" solutions. Such solutions turn out to perform poorly because they are not customisable enough to be applied to specific domains (they are general purpose). In this work, we deal with the problem of categorising digitised documents in the field of accounting form management. Machine Learning has been widely used in image processing in recent years thanks to the portability of its results and its ability to produce reliable models even starting from limited knowledge of the reference domain. This thesis begins with the state of the art on classifiers for categories of digitised documents. It then describes the use of Machine Learning techniques (DNNs) for Document Image Classification, with details on the architecture, the dataset and the model used. Finally, McGiver is presented, a tool for classifying documents into categories starting from their digital version. Each phase of implementation and production of the validation results is then described: preprocessing of the dataset, training and validation. In the last chapter, some considerations on the results obtained are presented and discussed, together with a discussion of future work.
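As a hedged sketch of the fine-tuning approach described above, the snippet below adapts a pretrained torchvision ResNet to a set of document categories; the number of classes and the choice of backbone are assumptions, not necessarily those used in McGiver.

import torch
import torch.nn as nn
from torchvision import models

num_classes = 8                                   # assumed number of form categories

# Load an ImageNet-pretrained backbone and replace the final classification layer.
model = models.resnet18(weights="DEFAULT")
model.fc = nn.Linear(model.fc.in_features, num_classes)

# Optionally freeze the early layers and fine-tune only the new head at first.
for name, param in model.named_parameters():
    param.requires_grad = name.startswith("fc")

optimizer = torch.optim.Adam((p for p in model.parameters() if p.requires_grad), lr=1e-3)
scans = torch.randn(4, 3, 224, 224)               # a mini-batch of scanned documents
labels = torch.randint(0, num_classes, (4,))
loss = nn.CrossEntropyLoss()(model(scans), labels)
loss.backward()
optimizer.step()
print(float(loss))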
APA, Harvard, Vancouver, ISO, and other styles
40

Gnacek, Matthew. "Convolutional Neural Networks for Enhanced Compression Techniques." University of Dayton / OhioLINK, 2021. http://rave.ohiolink.edu/etdc/view?acc_num=dayton1620139118743853.

Full text
APA, Harvard, Vancouver, ISO, and other styles
41

Sunesson, Albin. "Establishing Effective Techniques for Increasing Deep Neural Networks Inference Speed." Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-213833.

Full text
Abstract:
A recent trend in deep learning research is to build ever deeper networks (i.e. to increase the number of layers) to solve real-world classification/optimization problems. This introduces challenges for applications with a latency dependence. The problem arises from the amount of computation that needs to be performed for each evaluation, and it is addressed by reducing the inference time. In this study we analyze two different methods for speeding up the evaluation of deep neural networks. The first method reduces the number of weights in a convolutional layer by decomposing its convolutional kernel. The second method lets samples exit a network through early-exit branches when classifications are certain. Both methods were evaluated on several network architectures with consistent results. Convolutional kernel decomposition shows a 20-70% speed-up with no more than a 1% loss in classification accuracy in the evaluated setups. Early-exit branches show up to a 300% speed-up with no loss in classification accuracy when evaluated on CPUs.
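To make the early-exit idea concrete, here is a minimal PyTorch sketch in which an intermediate classifier lets confident inputs leave the network before the final layers; the architecture, the per-batch exit rule and the 0.9 confidence threshold are illustrative assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

class EarlyExitNet(nn.Module):
    # A small CNN with one early-exit branch after the first convolutional stage.
    def __init__(self, num_classes=10, threshold=0.9):
        super().__init__()
        self.stage1 = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2))
        self.exit1 = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, num_classes))
        self.stage2 = nn.Sequential(nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
                                    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, num_classes))
        self.threshold = threshold

    def forward(self, x):
        h = self.stage1(x)
        early_logits = self.exit1(h)
        confidence = F.softmax(early_logits, dim=1).max(dim=1).values
        if bool((confidence > self.threshold).all()):   # whole batch confident: stop here
            return early_logits
        return self.stage2(h)                           # otherwise run the full network

model = EarlyExitNet().eval()
with torch.no_grad():
    print(model(torch.randn(1, 3, 32, 32)).shape)       # torch.Size([1, 10])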
APA, Harvard, Vancouver, ISO, and other styles
42

Peri, Deepthi. "Applying Natural Language Processing and Deep Learning Techniques for Raga Recognition in Indian Classical Music." Thesis, Virginia Tech, 2020. http://hdl.handle.net/10919/99967.

Full text
Abstract:
In Indian Classical Music (ICM), the Raga is a musical piece's melodic framework. It encompasses the characteristics of a scale, a mode, and a tune, with none of them fully describing it, rendering the Raga a unique concept in ICM. The Raga provides musicians with a melodic fabric, within which all compositions and improvisations must take place. Identifying and categorizing the Raga is challenging due to its dynamism and complex structure as well as the polyphonic nature of ICM. Hence, Raga recognition—identify the constituent Raga in an audio file—has become an important problem in music informatics with several known prior approaches. Advancing the state of the art in Raga recognition paves the way to improving other Music Information Retrieval tasks in ICM, including transcribing notes automatically, recommending music, and organizing large databases. This thesis presents a novel melodic pattern-based approach to recognizing Ragas by representing this task as a document classification problem, solved by applying a deep learning technique. A digital audio excerpt is hierarchically processed and split into subsequences and gamaka sequences to mimic a textual document structure, so our model can learn the resulting tonal and temporal sequence patterns using a Recurrent Neural Network. Although training and testing on these smaller sequences, we predict the Raga for the entire audio excerpt, with the accuracy of 90.3% for the Carnatic Music Dataset and 95.6% for the Hindustani Music Dataset, thus outperforming prior approaches in Raga recognition.
Master of Science
In Indian Classical Music (ICM), the Raga is a musical piece's melodic framework. The Raga is a unique concept in ICM, not fully described by any of the fundamental concepts of Western classical music. The Raga provides musicians with a melodic fabric, within which all compositions and improvisations must take place. Raga recognition refers to identifying the constituent Raga in an audio file, a challenging and important problem with several known prior approaches and applications in Music Information Retrieval. This thesis presents a novel approach to recognizing Ragas by representing this task as a document classification problem, solved by applying a deep learning technique. A digital audio excerpt is processed into a textual document structure, from which the constituent Raga is learned. Based on the evaluation with third-party datasets, our recognition approach achieves high accuracy, thus outperforming prior approaches.
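To make the "document classification" framing concrete, here is a minimal PyTorch sketch of a recurrent classifier over tokenized note/gamaka subsequences. The vocabulary size, number of Ragas, and layer sizes are placeholders, not the settings used in the thesis.

```python
# Minimal sketch of Raga recognition as sequence classification: a tokenized
# subsequence is embedded and fed to a bidirectional LSTM whose final state
# predicts the Raga label. All sizes are illustrative assumptions.
import torch
import torch.nn as nn

class RagaClassifier(nn.Module):
    def __init__(self, vocab_size=256, num_ragas=40, emb=64, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb)
        self.rnn = nn.LSTM(emb, hidden, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, num_ragas)

    def forward(self, tokens):                 # tokens: (batch, seq_len)
        x = self.embed(tokens)
        _, (h, _) = self.rnn(x)                # h: (2, batch, hidden)
        h = torch.cat([h[0], h[1]], dim=1)     # concatenate both directions
        return self.head(h)                    # (batch, num_ragas)

model = RagaClassifier()
fake_subsequences = torch.randint(0, 256, (8, 200))  # 8 subsequences of 200 tokens
print(model(fake_subsequences).shape)                 # torch.Size([8, 40])
```

A whole-excerpt prediction can then be obtained by aggregating (e.g., averaging or voting over) the per-subsequence outputs, mirroring the idea of training on small sequences but labelling the full recording.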
APA, Harvard, Vancouver, ISO, and other styles
43

Yu, Ying. "Improving the Accuracy of 2D On-Road Object Detection Based on Deep Learning Techniques." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-235194.

Full text
Abstract:
This paper focuses on improving the accuracy of detecting on-road objects, including cars, trucks, pedestrians, and cyclists. To meet the requirements of the embedded vision system and maintain a high detection speed in the advanced driving assistance system (ADAS) domain, the neural network model is designed with single-channel images from a monocular camera as input. Over the past few decades, the forward collision avoidance system, a sub-system of ADAS, has been widely adopted in vehicular safety systems for its great contribution to reducing accidents. Deep neural networks, the state-of-the-art object detection technique, can be realized in this embedded vision system with efficient computation on FPGAs and high inference speed. Aiming to detect on-road objects with high accuracy, this paper applies an advanced end-to-end neural network, the single-shot multi-box detector (SSD). In this thesis work, several experiments are carried out on how to enhance the accuracy of SSD models with grayscale input. By adding suitable extra default boxes in high-layer feature maps and adjusting the entire scale range, the detection AP over all classes has been improved by around 20%, with the mAP of the SSD300 model increased from an initial 45.1% to 76.8% and the mAP of the SSD512 model increased from 58.5% to 78.8% on the KITTI dataset. Besides, it has been verified that without color information the model performance does not degrade in either speed or accuracy. Experimental results were evaluated using an Nvidia Tesla P100 GPU on the KITTI Vision Benchmark Suite, the Udacity annotated dataset, and a short video recorded on a street in Stockholm.
This document focuses on improving the accuracy of detecting on-road objects, including cars, trucks, pedestrians and cyclists. To meet the requirements of the embedded vision system, and to maintain a high detection speed in the advanced driving assistance system (ADAS) domain, the neural network model is designed with single-channel images from a monocular camera as input. Over the past decades, the forward collision avoidance system, a sub-system of ADAS, has been widely adopted in vehicular safety systems for its great contribution to reducing accidents. Deep neural networks, the state-of-the-art technique for object detection, can be realized in this embedded vision system with efficient computation on FPGA and high inference speed. Aiming to detect on-road objects with high accuracy, we apply an advanced neural network, the single-shot multi-box detector (SSD). In this thesis work, several experiments are carried out on how to improve the accuracy of SSD models with grayscale input. By adding suitable extra default boxes in high-layer feature maps and adjusting the entire scale range, the detection AP over all classes has been improved by around 20%, with the mAP of the SSD300 model increasing from 45.1% to 76.8% and the mAP of the SSD512 model on the KITTI dataset increasing from 58.5% to 78.8%. In addition, it has been verified that without color information the model does not degrade in either speed or accuracy. Experimental results were evaluated using an Nvidia Tesla P100 GPU on the KITTI Vision Benchmark Suite, the Udacity annotated dataset, and a short video recorded on a street in Stockholm.
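The accuracy gains above come from tuning SSD's default (anchor) boxes. The sketch below shows, in simplified form, how such boxes are generated for one feature map and how an extra box per cell can be added; the scales and aspect ratios are illustrative, not the values used in the thesis.

```python
# Rough sketch of SSD-style default box generation for a single feature map,
# the mechanism tuned in the thesis by adding extra boxes in high-level
# feature maps and adjusting the scale range. Scales/aspect ratios are
# illustrative assumptions.
import itertools
import numpy as np

def default_boxes(fmap_size, scale, extra_scale, aspect_ratios=(1.0, 2.0, 0.5)):
    """Return (cx, cy, w, h) boxes in relative [0, 1] coordinates."""
    boxes = []
    for i, j in itertools.product(range(fmap_size), repeat=2):
        cx, cy = (j + 0.5) / fmap_size, (i + 0.5) / fmap_size
        for ar in aspect_ratios:
            boxes.append((cx, cy, scale * np.sqrt(ar), scale / np.sqrt(ar)))
        # one extra box with an intermediate scale, the kind of addition
        # that can be made per feature map
        boxes.append((cx, cy, extra_scale, extra_scale))
    return np.array(boxes)

print(default_boxes(fmap_size=3, scale=0.5, extra_scale=0.6).shape)  # (36, 4)
```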
APA, Harvard, Vancouver, ISO, and other styles
44

Gomez-Donoso, Francisco. "Contributions to 3D object recognition and 3D hand pose estimation using deep learning techniques." Doctoral thesis, Universidad de Alicante, 2020. http://hdl.handle.net/10045/110658.

Full text
Abstract:
In this thesis, a study of two blooming fields of artificial intelligence is carried out. The first part of the present document is about 3D object recognition methods. Object recognition in general is about providing an intelligent system with the ability to understand what objects appear in its input data. Any robot, from industrial robots to social robots, could benefit from such a capability to improve its performance and carry out high-level tasks. In fact, this topic has been studied at length, and some object recognition methods in the state of the art outperform humans in terms of accuracy. Nonetheless, these methods are image-based, namely, they focus on recognizing visual features. This could be a problem in some contexts, as there exist objects that look like other, different objects. For instance, a social robot may recognize a face in a picture, or an intelligent car may recognize a pedestrian on a billboard. A potential solution to this issue is to involve three-dimensional data, so that the systems focus not on visual features but on topological features. Thus, in this thesis, a study of 3D object recognition methods is carried out. The approaches proposed in this document, which take advantage of deep learning methods, take point clouds as input and are able to provide the correct category. We evaluated the proposals on a range of public challenges, datasets and real-life data with high success. The second part of the thesis is about hand pose estimation. This is also an interesting topic that focuses on providing the hand's kinematics. A range of systems, from human-computer interaction and virtual reality to social robots, could benefit from such a capability, for instance to interface with a computer and control it with seamless hand gestures, or to interact with a social robot that is able to understand human non-verbal communication. Thus, in the present document, hand pose estimation approaches are proposed. It is worth noting that the proposals take color images as input and are able to provide 2D and 3D hand poses in the image plane and in Euclidean coordinate frames. Specifically, the hand poses are encoded as a collection of points representing the joints of a hand, so that the full hand pose can be easily reconstructed from them. The methods are evaluated on custom and public datasets, and integrated into a robotic hand teleoperation application with great success.
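As a generic baseline for the point-cloud classification setting described above (not the architecture proposed in the thesis), the following PyTorch sketch applies a shared per-point MLP followed by a symmetric max-pool, which makes the prediction independent of point ordering.

```python
# Minimal PointNet-style sketch: classify an object directly from its point
# cloud. Layer sizes and class count are illustrative assumptions.
import torch
import torch.nn as nn

class PointCloudClassifier(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.per_point = nn.Sequential(      # shared MLP applied to every point
            nn.Conv1d(3, 64, 1), nn.ReLU(),
            nn.Conv1d(64, 256, 1), nn.ReLU(),
        )
        self.head = nn.Linear(256, num_classes)

    def forward(self, pts):                  # pts: (batch, 3, num_points)
        feat = self.per_point(pts)           # (batch, 256, num_points)
        global_feat = feat.max(dim=2).values # order-invariant pooling
        return self.head(global_feat)

model = PointCloudClassifier()
print(model(torch.randn(2, 3, 1024)).shape)  # torch.Size([2, 10])
```

The key design choice is the symmetric pooling step: because point clouds are unordered sets, any per-object feature must be invariant to permutations of the input points.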
APA, Harvard, Vancouver, ISO, and other styles
45

Pina, Otey Sebastian. "Deep Learning and Bayesian Techniques applied to Big Data in Industry and Neutrino Oscillations." Doctoral thesis, Universitat Autònoma de Barcelona, 2020. http://hdl.handle.net/10803/671967.

Full text
Abstract:
Neutrino oscillations are a complex phenomenon of theoretical and experimental interest in fundamental physics, studied through diverse experiments, such as the T2K collaboration located in Japan. T2K is composed of two facilities, which produce and measure neutrino interactions to gain a better understanding of their oscillations through data analysis in the form of parameter inference, model simulation and detector response. Through this work, modern deep learning techniques in the form of neural density estimators and graph neural networks will be applied and thoroughly verified in T2K use cases, assessing their benefits and shortcomings compared to traditional methods. Additionally, an industrial use of these methodologies for the Spanish electrical grid will be discussed.
Neutrino oscillations are a complex phenomenon of theoretical and experimental interest in fundamental physics, studied through diverse experiments, such as the T2K Collaboration located in Japan. T2K is composed of two facilities, which produce and measure neutrino interactions to better understand their oscillations through data analysis in the form of parameter inference, model simulation and detector response. Through this work, modern deep learning techniques in the form of neural density estimators and graph neural networks will be applied and thoroughly verified in T2K use cases, evaluating their benefits and shortcomings compared to traditional methods. Additionally, an industrial use of these methodologies for the Spanish electrical grid will be discussed.
Neutrino oscillations are a complex phenomenon of theoretical and experimental interest in fundamental physics, studied through diverse experiments, such as the T2K Collaboration situated in Japan. T2K is composed of two facilities, which produce and measure neutrino interactions to get a better understanding of their oscillations through data analysis in the form of parameter inference, model simulation and detector response. Through this work, state-of-the-art deep learning techniques in the form of neural density estimators and graph neural networks will be applied and thoroughly verified in T2K use cases, assessing their benefits and shortcomings compared to traditional methods. Additionally, an industrial usage of these methodologies for the Spanish electrical network will be discussed.
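A neural density estimator, in its simplest form, learns a conditional probability density from data. The sketch below is a toy mixture density network standing in for the (more sophisticated) estimators used in this work: it maps an observation to a Gaussian mixture over a one-dimensional parameter of interest. Sizes and the number of mixture components are illustrative assumptions.

```python
# Toy mixture density network: a neural network outputs the weights, means
# and scales of a Gaussian mixture over a parameter theta, conditioned on an
# observation x. Training minimizes the negative log-likelihood. Illustrative
# only; not the estimators used in the thesis.
import torch
import torch.nn as nn

class MixtureDensityNet(nn.Module):
    def __init__(self, x_dim=8, n_comp=5, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(x_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 3 * n_comp))

    def log_prob(self, x, theta):                       # theta: (batch, 1)
        logits, mu, log_sigma = self.net(x).chunk(3, dim=1)
        log_pi = torch.log_softmax(logits, dim=1)
        comp = torch.distributions.Normal(mu, log_sigma.exp())
        return torch.logsumexp(log_pi + comp.log_prob(theta), dim=1)

mdn = MixtureDensityNet()
x, theta = torch.randn(16, 8), torch.randn(16, 1)
loss = -mdn.log_prob(x, theta).mean()                   # negative log-likelihood
loss.backward()
print(float(loss))
```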
Universitat Autònoma de Barcelona. Programa de Doctorat en Física
APA, Harvard, Vancouver, ISO, and other styles
46

Singh, Jaswinder. "Detection of Cis-Trans Conformation in Protein Structure using Deep Learning Neural Network Techniques." Thesis, Griffith University, 2019. http://hdl.handle.net/10072/384790.

Full text
Abstract:
Proteins are important biological macromolecules that play critical roles in most biological processes. The functionality of a protein depends on its three-dimensional structure, which in turn depends on the protein's amino acid sequence. Direct prediction of the 3D structure of a protein from its amino acid sequence is a challenging task. Therefore, the prediction of three-dimensional protein structure is divided into smaller sub-problems, such as one- and two-dimensional properties of protein structure. The solution of these sub-problems can lead to successful three-dimensional structure prediction. Accurate prediction of the cis-trans conformation of amino acid residues is one such sub-problem of protein structure prediction. It has long been established that cis conformations of amino acid residues play many biologically important roles and are implicated in cancer and neurodegenerative diseases, despite their exceptionally rare occurrence in protein structures (99.6% in trans). Due to this rarity, few methods have been developed for predicting cis-isomers from protein sequences, most of which are based on outdated datasets and lack the means for independent testing. This report presents several machine learning algorithms for the prediction of the cis-trans conformation of amino acid residues. In this research work, using a database of more than 10000 high-resolution protein structures, we update the statistics of cis-isomers available in the literature and develop a sequence-based prediction technique using an ensemble of residual convolutional and Long Short-Term Memory bidirectional recurrent neural networks, which allows for learning from the whole protein sequence. We show that ensembling 8 neural network models yields a maximum MCC value of approximately 0.35 for cis-Pro isomers, and 0.1 for cis-nonPro residues. The method should be useful to prioritize functionally important residues in cis-isomers for experimental validation and to improve sampling of rare protein conformations for ab initio protein structure prediction.
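The following PyTorch sketch shows, in simplified form, the kind of ensemble member described above: a residual 1D-convolutional block followed by a bidirectional LSTM, producing a per-residue cis probability from a per-residue feature vector. The feature dimension and layer sizes are placeholders, not the thesis values.

```python
# Minimal sketch of a residual-conv + bidirectional-LSTM per-residue
# classifier for cis/trans conformation. Sizes are illustrative assumptions.
import torch
import torch.nn as nn

class CisTransNet(nn.Module):
    def __init__(self, feat_dim=50, channels=64, hidden=128):
        super().__init__()
        self.proj = nn.Conv1d(feat_dim, channels, 1)
        self.res_block = nn.Sequential(
            nn.Conv1d(channels, channels, 5, padding=2), nn.ReLU(),
            nn.Conv1d(channels, channels, 5, padding=2),
        )
        self.rnn = nn.LSTM(channels, hidden, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, 1)

    def forward(self, x):                       # x: (batch, seq_len, feat_dim)
        h = self.proj(x.transpose(1, 2))
        h = torch.relu(h + self.res_block(h))   # residual connection
        h, _ = self.rnn(h.transpose(1, 2))      # context from the whole sequence
        return torch.sigmoid(self.head(h))      # (batch, seq_len, 1) cis probability

model = CisTransNet()
print(model(torch.randn(2, 300, 50)).shape)     # torch.Size([2, 300, 1])
```

An ensemble in this spirit would train several such models independently and average their per-residue probabilities before thresholding.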
Thesis (Masters)
Master of Philosophy (MPhil)
School of Eng & Built Env
Science, Environment, Engineering and Technology
Full Text
APA, Harvard, Vancouver, ISO, and other styles
47

Olsson, Johan. "A Client-Server Solution for Detecting Guns in School Environment using Deep Learning Techniques." Thesis, Linköpings universitet, Medie- och Informationsteknik, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-162476.

Full text
Abstract:
With the progress of deep learning methods over the last couple of years, object detection related tasks are improving rapidly. Using object detection for detecting guns in schools removes the need for human supervision and hopefully reduces police response time. This paper investigates how a gun detection system can be built by reading frames locally and using a server for detection. The detector is based on a pre-trained SSD model and is taught through transfer learning to recognize guns. The detector obtained an Average Precision of 51.1%, and the server response time for a frame of size 1920 x 1080 was 480 ms; scaling the frame down to 240 x 135 reduced the response time to 210 ms without affecting the accuracy. A non-gun class was implemented to reduce the number of false positives, and on a set of 300 images containing 165 guns, the number of false positives dropped from 21 to 11.
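A minimal sketch of the client side of such a client-server setup is shown below: a frame is read locally, downscaled to cut the server response time, JPEG-encoded, and posted to a detection endpoint. The endpoint URL and response format are hypothetical placeholders, not the system built in the thesis.

```python
# Hedged sketch of a detection client: read a frame, downscale it, send it
# to a (hypothetical) detection server, return the parsed response.
import cv2
import requests

DETECTION_URL = "http://localhost:8000/detect"   # hypothetical server endpoint

def send_frame(frame, target_size=(240, 135)):
    small = cv2.resize(frame, target_size)        # e.g. 1920x1080 -> 240x135
    ok, jpeg = cv2.imencode(".jpg", small)
    if not ok:
        raise RuntimeError("JPEG encoding failed")
    resp = requests.post(DETECTION_URL, files={"frame": jpeg.tobytes()})
    return resp.json()                            # e.g. a list of boxes and scores

if __name__ == "__main__":
    cap = cv2.VideoCapture(0)                     # local camera
    ok, frame = cap.read()
    if ok:
        print(send_frame(frame))
    cap.release()
```

Downscaling before transmission trades image resolution for latency, which matches the reported observation that a 240 x 135 input preserved accuracy while more than halving the response time.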
APA, Harvard, Vancouver, ISO, and other styles
48

Quan, Weize. "Detection of computer-generated images via deep learning." Thesis, Université Grenoble Alpes, 2020. http://www.theses.fr/2020GRALT076.

Full text
Abstract:
With the advances of image editing and generation software tools, it has become easier to tamper with the content of images or to create new images, even for novices. These generated images, such as photorealistically rendered images and colorized images, have high visual realism and can potentially threaten many important applications. For example, judicial services need to verify that images are not produced by computer graphics rendering technology, colorized images can lead recognition/surveillance systems to produce incorrect decisions, and so on. Consequently, the detection of computer-generated images has attracted wide attention in the multimedia security research community. In this thesis, we study the identification of different types of computer-generated images, including rendered images and colorized images. We are interested in identifying whether an image was acquired by a camera or generated by a computer program. The main objective is to design an efficient detector, which has a high classification accuracy and a good generalization capability. We consider dataset construction, deep neural network architecture, training methodology, and visualization and understanding for the image forensics problems considered. Our main contributions are: (1) a colorized image detection method based on negative sample insertion, (2) a generalization-improvement method for colorized image detection, (3) a method for distinguishing natural images from rendered images based on convolutional neural networks, and (4) a rendered image identification method based on enhancing feature diversity and adversarial samples.
With the advances of image editing and generation software tools, it has become easier to tamper with the content of images or to create new images, even for novices. These generated images, such as computer graphics (CG) images and colorized images (CI), have high-quality visual realism and potentially pose serious threats in many important scenarios. For instance, judicial departments need to verify that pictures are not produced by computer graphics rendering technology, colorized images can cause recognition/monitoring systems to produce incorrect decisions, and so on. Therefore, the detection of computer-generated images has attracted widespread attention in the multimedia security research community. In this thesis, we study the identification of different computer-generated images, including CG images and CIs, namely, identifying whether an image is acquired by a camera or generated by a computer program. The main objective is to design an efficient detector, which has high classification accuracy and good generalization capability. Specifically, we consider dataset construction, network architecture, training methodology, and visualization and understanding for the considered forensic problems. The main contributions are: (1) a colorized image detection method based on negative sample insertion, (2) a generalization method for colorized image detection, (3) a method for the identification of natural images (NI) and CG images based on CNNs (Convolutional Neural Networks), and (4) a CG image identification method based on the enhancement of feature diversity and adversarial samples.
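At its core, the forensic task above is a binary image classification problem. The sketch below is a generic small CNN for a natural-vs-computer-generated decision; it is illustrative only and is not the network, negative sample insertion scheme, or training set-up proposed in the thesis.

```python
# Generic binary forensic classifier sketch: map an image patch to logits for
# {natural, computer-generated}. Architecture is an illustrative assumption.
import torch
import torch.nn as nn

ni_vs_cg = nn.Sequential(
    nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(64, 2),                 # logits for {natural, computer-generated}
)

patch = torch.randn(4, 3, 64, 64)     # a batch of image patches
print(ni_vs_cg(patch).shape)          # torch.Size([4, 2])
```

In practice, patch-level scores are typically aggregated over an image, and the training set composition (e.g., which "negative" samples are inserted) drives the generalization behaviour the thesis studies.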
APA, Harvard, Vancouver, ISO, and other styles
49

Heffernan, Rhys. "Addressing One-Dimensional Protein Structure Prediction Problems with Machine Learning Techniques." Thesis, Griffith University, 2018. http://hdl.handle.net/10072/381401.

Full text
Abstract:
In this thesis we tackle the protein structure prediction sub-problems listed previously by applying state-of-the-art deep learning techniques. The work in Chapter 2 presents the method SPIDER. In this method, state-of-the-art deep learning is applied iteratively to the task of predicting the backbone torsion angles φ and ψ, and the dihedral angles θ and τ, using evolutionary-derived sequence profiles and physio-chemical properties of amino acid residues. This work is the first method for the sequence-based prediction of the θ and τ angles. Chapter 3 presents the method SPIDER2. This method takes the state-of-the-art iterative deep learning applied in SPIDER and extends it to the prediction of three-state secondary structure, solvent accessible surface area, and the φ, ψ, θ, and τ angles, and achieves the best reported prediction accuracies for all of them (at the date of publication). Chapter 4 further builds on the work done in the previous chapters and adds the prediction of half-sphere exposure (both Cα- and Cβ-based) and contact numbers to SPIDER2, in a method called SPIDER2-HSE. In Chapter 5, Long Short-Term Memory Bidirectional Recurrent Neural Networks (LSTM-BRNNs) were applied to the prediction of three-state secondary structure, solvent accessible surface area, and the φ, ψ, θ, and τ angles, as well as half-sphere exposure and contact numbers. Methods previously used for these predictions (including SPIDER2) were typically window based. That is to say, the input data made available to the model for a given residue comprises information for only that residue and a number of residues on either side in the sequence (in the range of 10-20 residues on each side). The use of LSTM-BRNNs in this method allows SPIDER3 to better learn both long- and short-range interactions within proteins. This advancement again led to the best reported accuracies for all predicted structural properties. In Chapter 6, the LSTM-BRNN model used in SPIDER3 is applied to the prediction of the same structural properties, plus the prediction of eight-state secondary structure, using only single-sequence inputs. That is, structural properties were predicted without using any evolutionary information. This provides not only the best reported single-sequence secondary structure and solvent accessible surface area predictions, but also the first reported method for the single-sequence-based prediction of half-sphere exposure, contact numbers, and the φ, ψ, θ, and τ angles. This study is important as most proteins have few homologous sequences and their evolutionary profiles are inaccurate and time-consuming to calculate. This single-sequence-based technique allows for fast genome-scale screening analysis of protein one-dimensional structural properties.
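A minimal sketch of a windowless LSTM-BRNN property predictor in the spirit of SPIDER3 is given below: one bidirectional LSTM reads the whole protein and emits, per residue, sine/cosine pairs for four backbone angles, a common way of handling angular periodicity. The input feature dimension, layer sizes, and output head are placeholder assumptions, not the published architecture.

```python
# Hedged sketch of whole-sequence per-residue angle prediction with a
# bidirectional LSTM. Sizes and representation details are assumptions.
import torch
import torch.nn as nn

class AnglePredictor(nn.Module):
    def __init__(self, feat_dim=57, hidden=256, n_angles=4):
        super().__init__()
        self.rnn = nn.LSTM(feat_dim, hidden, num_layers=2,
                           batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, 2 * n_angles)   # sin and cos per angle

    def forward(self, x):                       # x: (batch, seq_len, feat_dim)
        h, _ = self.rnn(x)                      # context from the full sequence
        out = torch.tanh(self.head(h))          # values in [-1, 1]
        sin, cos = out.chunk(2, dim=-1)
        return torch.atan2(sin, cos)            # angles in radians, (batch, seq_len, 4)

model = AnglePredictor()
print(model(torch.randn(1, 120, 57)).shape)     # torch.Size([1, 120, 4])
```

Unlike a window-based model, every residue's prediction here can draw on the entire sequence, which is the key advantage highlighted in the abstract.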
Thesis (PhD Doctorate)
Doctor of Philosophy (PhD)
School of Eng & Built Env
Science, Environment, Engineering and Technology
Full Text
APA, Harvard, Vancouver, ISO, and other styles
50

Alabdulrahman, Rabaa. "Towards Personalized Recommendation Systems: Domain-Driven Machine Learning Techniques and Frameworks." Thesis, Université d'Ottawa / University of Ottawa, 2020. http://hdl.handle.net/10393/41012.

Full text
Abstract:
Recommendation systems have been widely utilized in e-commerce settings to aid users through their shopping experiences. The principal advantage of these systems is their ability to narrow down the purchase options, in addition to marketing items to customers. However, a number of challenges remain, notably those related to obtaining a clearer understanding of users, their profiles, and their preferences in terms of purchased items. Specifically, recommender systems based on collaborative filtering recommend items that have been rated by other users with preferences similar to those of the targeted users. Intuitively, the more information and ratings collected about the user, the more accurate are the recommendations such systems suggest. In a typical recommender systems database, the data are sparse. Sparsity occurs when the number of ratings obtained from the users is much lower than the number required to build a prediction model. This usually occurs because of the users' reluctance to share their reviews, either due to privacy issues or an unwillingness to make the extra effort. Grey-sheep users pose another challenge. These are users who shared their reviews and ratings yet disagree with the majority in the system. The current state of the art typically treats these users as outliers and removes them from the system. Our goal is to determine whether keeping these users in the system may benefit learning. Thirdly, the cold-start problem, another area of active research, refers to the scenario whereby a new item or user enters the system. In this case, the system will have no information about the new user or item, making it problematic to find a correlation with others in the system. This thesis addresses the three above-mentioned research challenges through the development of machine learning methods for use within the recommendation system setting. First, we focus on label and data sparsity through the development of the Hybrid Cluster analysis and Classification learning (HCC-Learn) framework, combining supervised and unsupervised learning methods. We show that combining classification algorithms such as k-nearest neighbors and ensembles based on feature subspaces with cluster analysis algorithms such as expectation maximization, hierarchical clustering, canopy, k-means, and cascade k-means methods generally produces high-quality results when applied to benchmark datasets. That is, cluster analysis clearly benefits the learning process, leading to high predictive accuracies for existing users. Second, to address the cold-start problem, we present the Popular Users Personalized Predictions (PUPP-DA) framework. This framework combines cluster analysis and active learning, or so-called user-in-the-loop, to assign new customers to the most appropriate groups in our framework. Based on our findings from the HCC-Learn framework, we employ the expectation maximization soft clustering technique to create our user segmentations in the PUPP-DA framework, and we further incorporate Convolutional Neural Networks into our design. Our results show the benefits of user segmentation based on soft clustering and the use of active learning to improve predictions for new users. Furthermore, our findings show that focusing on frequent or popular users clearly improves classification accuracy. In addition, we demonstrate that deep learning outperforms classical machine learning techniques, notably resulting in more accurate predictions for individual users.
Thirdly, we address the grey-sheep problem in our Grey-sheep One-class Recommendations (GSOR) framework. The existence of grey-sheep users in the system results in a class imbalance whereby the majority of users will belong to one class and a small portion (grey-sheep users) will fall into the minority class. In this framework, we use one-class classification to provide a class structure for the training examples. As a pre-assessment stage, we assess the characteristics of grey-sheep users and study their impact on model accuracy. Next, as mentioned above, we utilize one-class learning, whereby we focus on the majority class to first learn the decision boundary in order to generate prediction lists for the grey-sheep (minority class). Our results indicate that including grey-sheep users in the training step, as opposed to treating them as outliers and removing them prior to learning, has a positive impact on the general predictive accuracy.
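The sketch below illustrates, with generic scikit-learn pieces, the two core ideas described above: segmenting users with EM soft clustering and classifying within each segment (the cluster-then-classify idea behind HCC-Learn), and fitting a one-class model to the majority users to score candidate grey-sheep users. The feature construction, data, models, and hyperparameters are illustrative stand-ins, not the thesis pipelines.

```python
# Hedged sketch: EM soft clustering + per-segment k-NN, and a one-class SVM
# as a generic one-class learner for grey-sheep detection. Data and settings
# are illustrative assumptions.
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 12))            # user profile / rating features
y = rng.integers(0, 2, size=500)          # e.g. "will like item" labels

# 1) Soft clustering (EM) to segment users, then a classifier per segment.
gmm = GaussianMixture(n_components=3, random_state=0).fit(X)
segments = gmm.predict(X)
per_segment_models = {
    s: KNeighborsClassifier(n_neighbors=5).fit(X[segments == s], y[segments == s])
    for s in np.unique(segments)
}

# 2) One-class learning: fit the majority ("white sheep") users only, then
#    score how atypical each user is; candidate grey sheep score low.
majority = X[y == 1]
grey_sheep_detector = OneClassSVM(gamma="scale", nu=0.05).fit(majority)

new_user = X[:1]
seg = int(gmm.predict(new_user)[0])
print(per_segment_models[seg].predict(new_user),
      grey_sheep_detector.decision_function(new_user))
```

The design intent mirrors the frameworks above: segmentation lets each classifier specialize on similar users, while the one-class view keeps atypical (grey-sheep) users in the pipeline instead of discarding them as outliers.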
APA, Harvard, Vancouver, ISO, and other styles