Dissertations / Theses on the topic 'Sparse deep neural networks'

Author: Grafiati

Published: 7 July 2024

Last updated: 7 July 2024

Consult the top 50 dissertations / theses for your research on the topic 'Sparse deep neural networks.'

1

Tavanaei, Amirhossein. "Spiking Neural Networks and Sparse Deep Learning." Thesis, University of Louisiana at Lafayette, 2019. http://pqdtopen.proquest.com/#viewpdf?dispub=10807940.

Abstract:

This document proposes new methods for training multi-layer and deep spiking neural networks (SNNs), specifically, spiking convolutional neural networks (CNNs). Training a multi-layer spiking network poses difficulties because the output spikes do not have derivatives and the commonly used backpropagation method for non-spiking networks is not easily applied. Our methods use novel versions of the brain-like, local learning rule named spike-timing-dependent plasticity (STDP) that incorporates supervised and unsupervised components. Our method starts with conventional learning methods and converts them to spatio-temporally local rules suited for SNNs.

The training uses two components for unsupervised feature extraction and supervised classification. The first component refers to new STDP rules for spike-based representation learning that trains convolutional filters and initial representations. The second introduces new STDP-based supervised learning rules for spike pattern classification via an approximation to gradient descent by combining the STDP and anti-STDP rules. Specifically, the STDP-based supervised learning model approximates gradient descent by using temporally local STDP rules. Stacking these components implements a novel sparse, spiking deep learning model. Our spiking deep learning model is categorized as a variation of spiking CNNs of integrate-and-fire (IF) neurons with performance comparable with the state-of-the-art deep SNNs. The experimental results show the success of the proposed model for image classification. Our network architecture is the only spiking CNN which provides bio-inspired STDP rules in a hierarchy of feature extraction and classification in an entirely spike-based framework.
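The abstract does not spell out its STDP rules, so the following is only a minimal sketch of the generic, textbook pair-based STDP window, with an anti-STDP counterpart obtained by flipping the sign. The function names, the amplitudes `a_plus` and `a_minus`, and the time constant `tau` are illustrative assumptions and are not taken from the thesis.

```python
import math

def stdp_delta_w(t_pre, t_post, a_plus=0.01, a_minus=0.012, tau=20.0):
    """Weight change for one pre/post spike pair (times in ms).

    Generic pair-based STDP with an exponential window; all parameter
    values are hypothetical defaults chosen for illustration.
    """
    dt = t_post - t_pre
    if dt >= 0:
        # Presynaptic spike precedes (or coincides with) the postsynaptic
        # spike: potentiate, more strongly for smaller gaps.
        return a_plus * math.exp(-dt / tau)
    # Postsynaptic spike precedes the presynaptic spike: depress.
    return -a_minus * math.exp(dt / tau)

def anti_stdp_delta_w(t_pre, t_post, **kwargs):
    """Anti-STDP: the same window with the sign of the update reversed."""
    return -stdp_delta_w(t_pre, t_post, **kwargs)
```

Both updates depend only on the timing of the two spikes at one synapse, which is what makes such rules local in space and time.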

2

Le, Quoc Tung. "Algorithmic and theoretical aspects of sparse deep neural networks." Electronic Thesis or Diss., Lyon, École normale supérieure, 2023. http://www.theses.fr/2023ENSL0105.

Abstract:

Sparse deep neural networks offer a compelling practical opportunity to reduce the cost of training, inference and storage, which is growing exponentially in state-of-the-art deep learning. In this presentation, we introduce an approach to studying sparse deep neural networks through the lens of a related problem: sparse matrix factorization, i.e., the problem of approximating a (dense) matrix by the product of (multiple) sparse factors. In particular, we identify and investigate in detail some theoretical and algorithmic aspects of a variant of sparse matrix factorization named fixed support matrix factorization (FSMF), in which the set of non-zero entries of the sparse factors is known. Several fundamental questions about sparse deep neural networks, such as the existence of optimal solutions of the training problem or the topological properties of its function space, can be addressed using the results on FSMF. In addition, by applying these results, we also study the butterfly parametrization, an approach that consists of replacing (large) weight matrices by products of extremely sparse and structured ones in sparse deep neural networks.
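To make the idea of replacing a dense matrix by a product of very sparse, structured factors concrete, here is a small sketch using a standard example that is not taken from the thesis: the Sylvester Hadamard matrix of size 2^k factors exactly into k butterfly-like factors with two non-zeros per row. The helper names (`kron`, `butterfly_factors`, `product`, and so on) are hypothetical, and plain Python lists are used so the example has no dependencies.

```python
def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [[x * y for x in ra for y in rb] for ra in a for rb in b]

def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def matmul(a, b):
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

H2 = [[1, 1], [1, -1]]

def butterfly_factors(k):
    """k sparse factors of size 2^k: I_(2^s) (x) H2 (x) I_(2^(k-s-1))."""
    return [kron(kron(identity(2 ** s), H2), identity(2 ** (k - s - 1)))
            for s in range(k)]

def hadamard(k):
    """Dense Sylvester Hadamard matrix of size 2^k."""
    h = [[1]]
    for _ in range(k):
        h = kron(h, H2)
    return h

def product(factors):
    out = identity(len(factors[0]))
    for f in factors:
        out = matmul(out, f)
    return out

def nonzeros(m):
    return sum(1 for row in m for v in row if v != 0)
```

For k = 3 the dense matrix has 64 non-zero entries, while the three factors together hold 48; the gap widens as k grows, since the factors need 2 · k · 2^k entries against 4^k for the dense matrix. The supports of the factors are fixed in advance, which is the setting that FSMF studies.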

3

Hoori, Ammar O. "MULTI-COLUMN NEURAL NETWORKS AND SPARSE CODING NOVEL TECHNIQUES IN MACHINE LEARNING." VCU Scholars Compass, 2019. https://scholarscompass.vcu.edu/etd/5743.

Abstract:

Accurate and fast machine learning (ML) algorithms are vital in artificial intelligence (AI) applications. On complex datasets, traditional ML methods such as the radial basis function neural network (RBFN), sparse coding (SC) using dictionary learning, and particle swarm optimization (PSO) provide trivial results, large structures, slow training, and/or slow testing. This dissertation introduces four novel ML techniques: the multi-column RBFN network (MCRN), the projected dictionary learning algorithm (PDL), and the multi-column adaptive and non-adaptive particle swarm optimization techniques (MC-APSO and MC-PSO). These techniques provide efficient alternatives to traditional ML techniques, demonstrating more accurate results, faster training and testing, and parallelized, structured solutions. MCRN deploys small RBFNs in a parallel structure to speed up both training and testing. Each RBFN is trained on a subset of the dataset, and the overall structure provides more accurate results. PDL introduces a conceptual dictionary learning method that updates the dictionary atoms with the reconstructed input blocks. This method improves the sparsity of the extracted features and hence the image denoising results. MC-PSO and MC-APSO provide fast and more accurate alternatives to the slow evolutionary techniques PSO and APSO. They use a multi-column parallelized RBFN structure to improve results and speed on a wide range of classification datasets. The novel techniques are trained and tested on benchmark datasets, and the results are compared with state-of-the-art counterpart techniques to evaluate their performance. The novel techniques outperform their counterparts in accuracy and speed in most of the experiments, which makes them good alternatives for solving difficult ML problems.

4

Vekhande, Swapnil Sudhir. "Deep Learning Neural Network-based Sinogram Interpolation for Sparse-View CT Reconstruction." Thesis, Virginia Tech, 2019. http://hdl.handle.net/10919/90182.

Abstract:

Computed Tomography (CT) finds applications across domains such as medical diagnosis, security screening, and scientific research. In medical imaging, CT allows physicians to diagnose injuries and disease more quickly and accurately than other imaging techniques. However, CT is one of the most significant contributors of radiation dose to the general population, and the radiation dose required for scanning could lead to cancer. On the other hand, too low a radiation dose could sacrifice image quality and cause misdiagnosis. To reduce the radiation dose, sparse-view CT, which captures a smaller number of projections, is a promising alternative. However, an image reconstructed from linearly interpolated views has severe artifacts. Recently, Deep Learning-based methods have increasingly been used to interpret the missing data by learning the nature of the image formation process. The current methods are promising but operate mostly in the image domain, presumably due to a lack of projection data. Another limitation is the use of simulated data with less sparsity (up to 75%). This research aims to interpolate the missing sparse-view CT data in the sinogram domain using deep learning. To this end, a residual U-Net architecture has been trained with patch-wise projection data to minimize the Euclidean distance between the ground truth and the interpolated sinogram. The model can generate highly sparse missing projection data. The results show improvements in SSIM and RMSE of 14% and 52%, respectively, with respect to linear interpolation-based methods. Thus, experimental sparse-view CT data with 90% sparsity has been successfully interpolated while improving CT image quality.
Master of Science
Computed Tomography is a commonly used imaging technique due to the remarkable ability to visualize internal organs, bones, soft tissues, and blood vessels. It involves exposing the subject to X-ray radiation, which could lead to cancer. On the other hand, the radiation dose is critical for the image quality and subsequent diagnosis. Thus, image reconstruction using only a small number of projection data is an open research problem. Deep learning techniques have already revolutionized various Computer Vision applications. Here, we have used a method which fills missing highly sparse CT data. The results show that the deep learning-based method outperforms standard linear interpolation-based methods while improving the image quality.
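The linear-interpolation baseline that the abstract compares against can be sketched in a few lines. This is not the residual U-Net from the thesis; it only illustrates what filling in missing views of a sinogram means, together with the RMSE metric the abstract reports. The function names, the representation of a sinogram as a list of views (one list of detector readings per angle), and the decision to ignore angular wrap-around are all simplifying assumptions.

```python
import math

def interpolate_views(sparse_views, factor):
    """Insert factor - 1 linearly interpolated views between each pair of
    consecutive measured views. Angular wrap-around is ignored."""
    full = []
    for a, b in zip(sparse_views, sparse_views[1:]):
        for step in range(factor):
            w = step / factor
            full.append([(1 - w) * x + w * y for x, y in zip(a, b)])
    full.append(list(sparse_views[-1]))
    return full

def rmse(a, b):
    """Root-mean-square error between two sinograms of equal shape."""
    diffs = [(x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)]
    return math.sqrt(sum(diffs) / len(diffs))

# Two measured views, upsampled by a factor of 4 to five views in total.
measured = [[0.0, 2.0], [4.0, 6.0]]
dense = interpolate_views(measured, 4)
```

A learned interpolator would replace `interpolate_views` and be scored against the fully sampled sinogram with the same `rmse` function.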

5

Carvalho, Micael. "Deep representation spaces." Electronic Thesis or Diss., Sorbonne université, 2018. http://www.theses.fr/2018SORUS292.

Abstract:

In recent years, Deep Learning techniques have swept the state of the art in many Machine Learning applications, becoming the new standard approach for them. The architectures arising from these techniques have been used for transfer learning, which has extended the power of deep models to tasks that did not have enough data to train them from scratch. This thesis studies the representation spaces created by deep architectures. First, we study their inherent properties, with particular interest in the dimensionality redundancy and numerical precision of their features. Our findings reveal a strong degree of robustness, pointing the way to simple and powerful compression schemes. Then, we focus on refining these representations. We choose to adopt a cross-modal multi-task problem and design a loss function capable of taking advantage of data coming from multiple modalities, while also taking into account different tasks associated with the same dataset. In order to correctly balance these losses, we also develop a new sampling scheme that only takes into account examples contributing to the learning phase, i.e., those having a positive loss. Finally, we test our approach on a large-scale dataset of cooking recipes and associated pictures. Our method achieves a 5-fold improvement over the state of the art, and we show that the multi-task aspect of our approach promotes a semantically meaningful organization of the representation space, allowing it to perform subtasks never seen during training, such as ingredient exclusion and selection. The results we present in this thesis open many possibilities, including feature compression for remote applications, robust multi-modal and multi-task learning, and feature space refinement. For the cooking application in particular, many of our findings are directly applicable in a real-world context, especially for the detection of allergens, finding alternative recipes due to dietary restrictions, and menu planning.

6

Pawlowski, Filip igor. "High-performance dense tensor and sparse matrix kernels for machine learning." Thesis, Lyon, 2020. http://www.theses.fr/2020LYSEN081.

Abstract:

In this thesis, we develop high-performance algorithms for certain computations involving dense tensors and sparse matrices. We address kernel operations that are useful for machine learning tasks, such as inference with deep neural networks (DNNs). We develop data structures and techniques to reduce memory use, to improve data locality and hence to improve cache reuse of the kernel operations. We design both sequential and shared-memory parallel algorithms. In the first part of the thesis we focus on dense tensor kernels. Tensor kernels include the tensor-vector multiplication (TVM), tensor-matrix multiplication (TMM), and tensor-tensor multiplication (TTM). Among these, TVM is the most bandwidth-bound and constitutes a building block for many algorithms. We focus on this operation and develop a data structure and sequential and parallel algorithms for it. We propose a novel data structure which stores the tensor as blocks, which are ordered using the space-filling curve known as the Morton curve (or Z-curve). The key idea consists of dividing the tensor into blocks small enough to fit in cache, and storing them according to the Morton order, while keeping a simple, multi-dimensional order on the individual elements within them. Thus, high-performance BLAS routines can be used as microkernels for each block. We evaluate our techniques on a set of experiments. The results not only demonstrate superior performance of the proposed approach over the state-of-the-art variants by up to 18%, but also show that the proposed approach induces 71% less sample standard deviation for the TVM across the d possible modes. We also show that our data structure naturally extends to other tensor kernels by demonstrating that it yields up to 38% higher performance for the higher-order power method. Finally, we investigate shared-memory parallel TVM algorithms which use the proposed data structure. Several alternative parallel algorithms were characterized theoretically and implemented using OpenMP to compare them experimentally. Our results on up to 8-socket systems show near-peak performance for the proposed algorithm for 2-, 3-, 4-, and 5-dimensional tensors.

In the second part of the thesis, we explore sparse computations in neural networks, focusing on the high-performance sparse deep inference problem. Sparse DNN inference is the task of using sparse DNNs to classify a batch of data elements forming, in our case, a sparse feature matrix. The performance of sparse inference hinges on efficient parallelization of the sparse matrix-sparse matrix multiplication (SpGEMM) repeated for each layer in the inference function. We first characterize efficient sequential SpGEMM algorithms for our use case. We then introduce model-parallel inference, which uses a two-dimensional partitioning of the weight matrices obtained using hypergraph partitioning software. The model-parallel variant uses barriers to synchronize at layers. Finally, we introduce tiling model-parallel and tiling hybrid algorithms, which increase cache reuse between the layers, and use a weak synchronization module to hide load imbalance and synchronization costs. We evaluate our techniques on the large network data from the IEEE HPEC 2019 Graph Challenge on shared-memory systems and report up to a 2x speed-up versus the baseline.
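The Morton (Z-curve) ordering of tensor blocks mentioned in the abstract can be illustrated with a short sketch. It only shows how block coordinates are turned into a Morton index by interleaving their bits and how blocks are then ordered; it is not the thesis's data structure, and the function names and the `bits` parameter are assumptions made for this example. Elements inside each block would keep an ordinary multi-dimensional layout.

```python
from itertools import product

def morton_index(coords, bits=16):
    """Interleave the bits of the block coordinates into one integer.

    Bit `bit` of coordinate `axis` lands at position bit * d + axis,
    where d is the number of dimensions.
    """
    d = len(coords)
    code = 0
    for bit in range(bits):
        for axis, c in enumerate(coords):
            code |= ((c >> bit) & 1) << (bit * d + axis)
    return code

def morton_block_order(blocks_per_dim):
    """All block coordinates of a blocked tensor, sorted along the Z-curve."""
    blocks = product(*(range(n) for n in blocks_per_dim))
    return sorted(blocks, key=morton_index)

# Storage order of the blocks of a tensor cut into 2 x 2 blocks.
order_2x2 = morton_block_order((2, 2))
```

Blocks that are close in every dimension tend to receive nearby Morton indices, so traversing the tensor along any mode touches memory in a similar pattern, which is consistent with the reduced variation across modes that the abstract reports.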

7

Thom, Markus [Verfasser]. "Sparse neural networks / Markus Thom." Ulm : Universität Ulm. Fakultät für Ingenieurwissenschaften und Informatik, 2015. http://d-nb.info/1067496319/34.

8

Liu, Qian. "Deep spiking neural networks." Thesis, University of Manchester, 2018. https://www.research.manchester.ac.uk/portal/en/theses/deep-spiking-neural-networks(336e6a37-2a0b-41ff-9ffb-cca897220d6c).html.

Abstract:

Neuromorphic Engineering (NE) has led to the development of biologically-inspired computer architectures whose long-term goal is to approach the performance of the human brain in terms of energy efficiency and cognitive capabilities. Although there are a number of neuromorphic platforms available for large-scale Spiking Neural Network (SNN) simulations, the problem of programming these brain-like machines to be competent in cognitive applications remains unsolved. On the other hand, Deep Learning has emerged in Artificial Neural Network (ANN) research to dominate state-of-the-art solutions for cognitive tasks. Thus the main research problem is to understand how to operate and train biologically-plausible SNNs to close the gap in cognitive capabilities between SNNs and ANNs. SNNs can be trained by first training an equivalent ANN and then transferring the tuned weights to the SNN. This method is called "off-line" training, since it does not take place on an SNN directly, but rather on an ANN. However, previous work on such off-line training methods has suffered from poor modelling accuracy of the spiking neurons and high computational complexity. In this thesis we propose a simple and novel activation function, Noisy Softplus (NSP), to closely model the response firing activity of biologically-plausible spiking neurons, and introduce a generalised off-line training method using the Parametric Activation Function (PAF) to map the abstract numerical values of the ANN to concrete physical units, such as current and firing rate in the SNN. Based on this generalised training method and its fine tuning, we achieve state-of-the-art accuracy on the MNIST classification task using spiking neurons, 99.07%, on a deep spiking convolutional neural network (ConvNet). We then take a step forward to "on-line" training methods, where Deep Learning modules are trained purely on SNNs in an event-driven manner.
Existing work has failed to provide SNNs with recognition accuracy equivalent to ANNs due to the lack of mathematical analysis. Thus we propose a formalised Spike-based Rate Multiplication (SRM) method which transforms the product of firing rates into the number of coincident spikes of a pair of rate-coded spike trains. Moreover, these coincident spikes can be captured by the Spike-Time-Dependent Plasticity (STDP) rule to update the weights between the neurons in an on-line, event-based, and biologically-plausible manner. Furthermore, we put forward solutions to reduce correlations between spike trains, thereby addressing the resulting performance drop in on-line SNN training. The promising results of spiking Autoencoders (AEs) and spiking Restricted Boltzmann Machines (SRBMs) exhibit equivalent, sometimes even superior, classification and reconstruction capabilities compared to their non-spiking counterparts. To provide meaningful comparisons between these proposed SNN models and other existing methods within this rapidly advancing field of NE, we propose a large dataset of spike-based visual stimuli and a corresponding evaluation methodology to estimate the overall performance of SNN models and their hardware implementations.
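The link between a product of firing rates and coincident spikes can be illustrated with a toy simulation. This is not the thesis's SRM formalism; it is only a sketch of the underlying probabilistic fact, under the assumption that the two spike trains are independent Bernoulli processes in discrete time. The function names, rates, step count, and seed are all chosen for illustration.

```python
import random

def bernoulli_spike_train(rate, n_steps, rng):
    """One spike (1) per time step with probability `rate`, else 0."""
    return [1 if rng.random() < rate else 0 for _ in range(n_steps)]

def coincidence_rate(train_a, train_b):
    """Fraction of time steps in which both trains spike."""
    hits = sum(a & b for a, b in zip(train_a, train_b))
    return hits / len(train_a)

rng = random.Random(0)  # fixed seed so the run is repeatable
a = bernoulli_spike_train(0.3, 200_000, rng)
b = bernoulli_spike_train(0.5, 200_000, rng)
estimate = coincidence_rate(a, b)  # close to 0.3 * 0.5 for independent trains
```

The estimate approaches the product of the two rates only when the trains are uncorrelated, which suggests why the abstract treats correlations between spike trains as a source of performance loss.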

9

Squadrani, Lorenzo. "Deep neural networks and thermodynamics." Bachelor's thesis, Alma Mater Studiorum - Università di Bologna, 2020.

Abstract:

Deep learning is the most effective and widely used approach to artificial intelligence, and yet it is far from being properly understood. Understanding it is the way to further improve its effectiveness and, in the best case, to gain some understanding of "natural" intelligence. We attempt a step in this direction with the aid of physics. We describe a convolutional neural network for image classification (trained on CIFAR-10) within the descriptive framework of thermodynamics. In particular, we define and study the temperature of each component of the network. Our results provide a new point of view on deep learning models, which may be a starting point towards a better understanding of artificial intelligence.

10

Mancevo, del Castillo Ayala Diego. "Compressing Deep Convolutional Neural Networks." Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-217316.

Abstract:

Deep Convolutional Neural Networks and "deep learning" in general stand at the cutting edge on a range of applications, from image based recognition and classification to natural language processing, speech and speaker recognition and reinforcement learning. Very deep models however are often large, complex and computationally expensive to train and evaluate. Deep learning models are thus seldom deployed natively in environments where computational resources are scarce or expensive. To address this problem we turn our attention towards a range of techniques that we collectively refer to as "model compression" where a lighter student model is trained to approximate the output produced by the model we wish to compress. To this end, the output from the original model is used to craft the training labels of the smaller student model. This work contains some experiments on CIFAR-10 and demonstrates how to use the aforementioned techniques to compress a people counting model whose precision, recall and F1-score are improved by as much as 14% against our baseline.
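The idea of crafting a student's training labels from the original model's output can be sketched as follows. The abstract does not say how the labels are built, so the temperature-scaled softmax and the cross-entropy loss used here are assumptions borrowed from common knowledge-distillation practice, and the function names are hypothetical.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax; a higher temperature gives softer labels."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def soft_label_loss(student_logits, teacher_logits, temperature=2.0):
    """Cross-entropy between the teacher's soft labels and the student's
    predictions, both computed at the same temperature."""
    targets = softmax(teacher_logits, temperature)
    preds = softmax(student_logits, temperature)
    return -sum(t * math.log(p) for t, p in zip(targets, preds))

teacher = [1.0, 2.0, 3.0]
matching_student = [1.0, 2.0, 3.0]
mismatched_student = [3.0, 2.0, 1.0]
```

The loss is lowest when the student reproduces the teacher's output distribution, so minimizing it over a dataset pushes the lighter student model toward the behaviour of the model being compressed.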

11

Abbasi, Mahdieh. "Toward robust deep neural networks." Doctoral thesis, Université Laval, 2020. http://hdl.handle.net/20.500.11794/67766.

Abstract:

In this thesis, our goal is to develop robust and reliable yet accurate learning models, in particular Convolutional Neural Networks (CNNs), in the presence of anomalous examples such as adversarial examples and Out-of-Distribution (OOD) samples. As the first contribution, we propose to estimate calibrated confidence for adversarial examples by encouraging diversity in an ensemble of CNNs. To this end, we design an ensemble of diverse specialists with a simple and computationally efficient voting mechanism that predicts adversarial examples with low confidence while keeping the predictive confidence of clean samples high. In the presence of disagreement in our ensemble, we prove that an upper bound of 0.5 plus an additional term can be established for the confidence, leading to a fixed global detection threshold of τ = 0.5. We analytically justify the role of diversity in our ensemble in mitigating the risk of both black-box and white-box adversarial examples. Finally, we empirically evaluate the robustness of our ensemble to black-box and white-box attacks on several standard datasets. The second contribution addresses the detection of OOD samples through an end-to-end model trained on an appropriate OOD set. To this end, we address the following central question: how can the various available OOD datasets be differentiated with respect to a given in-distribution task in order to select the most appropriate one, which in turn induces a calibrated model with a high detection rate on unseen OOD sets? To answer this question, we propose to differentiate OOD sets by their level of "protection" of the sub-manifolds.
To measure the protection level, we then design three new, computationally efficient metrics using a pre-trained vanilla CNN. In an extensive series of experiments on image and audio classification tasks, we empirically demonstrate the ability of an augmented CNN (A-CNN) and of an explicitly calibrated CNN to detect a significantly larger portion of OOD examples. Interestingly, we also observe that such an A-CNN can detect black-box FGS adversarial examples with significant perturbations. As the third contribution, we take a closer look at the A-CNN's capacity to detect broader types of black-box adversaries (not only FGS-type ones). To increase the A-CNN's ability to detect a larger number of adversaries, we augment the OOD training set with inter-class interpolated samples. We then demonstrate that the A-CNN trained on all of these data has a consistent detection rate across all types of unseen adversarial examples, whereas training an A-CNN on PGD adversaries does not lead to a stable detection rate across all types of adversaries, especially unseen ones. We also visually assess the feature space and the decision boundaries in the input space of a vanilla CNN and its augmented counterpart in the presence of adversarial and clean samples. With a properly trained A-CNN, we aim to take a step toward a unified and reliable end-to-end learning model with low risk rates on both clean samples and unusual samples, e.g., adversarial and OOD samples. The last contribution presents an application of the A-CNN to training a robust object detector on a partially labeled dataset, in particular a merged dataset.
Merging various datasets from similar contexts but with different sets of Objects of Interest (OoI) is an inexpensive way to build a large-scale dataset that covers a wider spectrum of OoIs. Moreover, merging datasets makes it possible to obtain a single unified object detector instead of several separate ones, reducing computational and time costs. However, merging datasets, especially from a similar context, causes many missing-label instances. With the goal of training an integrated robust object detector on a partially labeled but large-scale dataset, we propose a self-supervised training framework to overcome the issue of missing-label instances in merged datasets. Our framework is evaluated on a merged dataset with a high missing-label rate. The empirical results confirm the viability of our generated pseudo-labels for improving the performance of YOLO, a state-of-the-art object detector.
In this thesis, our goal is to develop robust and reliable yet accurate learning models, particularly Convolutional Neural Networks (CNNs), in the presence of adversarial examples and Out-of-Distribution (OOD) samples. As the first contribution, we propose to predict adversarial instances with high uncertainty by encouraging diversity in an ensemble of CNNs. To this end, we devise an ensemble of diverse specialists along with a simple and computationally efficient voting mechanism to predict the adversarial examples with low confidence while keeping the predictive confidence of the clean samples high. In the presence of high entropy in our ensemble, we prove that the predictive confidence can be upper-bounded, which yields a globally fixed threshold over the predictive confidence for identifying adversaries. We analytically justify the role of diversity in our ensemble in mitigating the risk of both black-box and white-box adversarial examples. Finally, we empirically assess the robustness of our ensemble to black-box and white-box attacks on several benchmark datasets. The second contribution aims to address the detection of OOD samples through an end-to-end model trained on an appropriate OOD set. To this end, we address the following central question: how to differentiate the many available OOD sets w.r.t. a given in-distribution task to select the most appropriate one, which in turn induces a model with a high detection rate of unseen OOD sets? To answer this question, we hypothesize that the “protection” level of in-distribution sub-manifolds by each OOD set can be a good property for differentiating OOD sets. To measure the protection level, we then design three novel, simple, and cost-effective metrics using a pre-trained vanilla CNN. 
In an extensive series of experiments on image and audio classification tasks, we empirically demonstrate the ability of an Augmented-CNN (A-CNN) and an explicitly-calibrated CNN to detect a significantly larger portion of unseen OOD samples if they are trained on the most protective OOD set. Interestingly, we also observe that the A-CNN trained on the most protective OOD set (called A-CNN) can also detect the black-box Fast Gradient Sign (FGS) adversarial examples. As the third contribution, we investigate more closely the capacity of the A-CNN to detect wider types of black-box adversaries. To increase the capability of A-CNN to detect a larger number of adversaries, we augment its OOD training set with some inter-class interpolated samples. Then, we demonstrate that the A-CNN trained on the most protective OOD set along with the interpolated samples has a consistent detection rate on all types of unseen adversarial examples. In contrast, training an A-CNN on Projected Gradient Descent (PGD) adversaries does not lead to a stable detection rate on all types of adversaries, particularly the unseen types. We also visually assess the feature space and the decision boundaries in the input space of a vanilla CNN and its augmented counterpart in the presence of adversaries and the clean ones. By a properly trained A-CNN, we aim to take a step toward a unified and reliable end-to-end learning model with small risk rates on both clean samples and the unusual ones, e.g. adversarial and OOD samples. The last contribution is to show a use-case of A-CNN for training a robust object detector on a partially-labeled dataset, particularly a merged dataset. Merging various datasets from similar contexts but with different sets of Objects of Interest (OoI) is an inexpensive way to craft a large-scale dataset which covers a larger spectrum of OoIs. 
Moreover, merging datasets allows achieving a unified object detector, instead of having several separate ones, resulting in the reduction of computational and time costs. However, merging datasets, especially from a similar context, causes many missing-label instances. With the goal of training an integrated robust object detector on a partially-labeled but large-scale dataset, we propose a self-supervised training framework to overcome the issue of missing-label instances in the merged datasets. Our framework is evaluated on a merged dataset with a high missing-label rate. The empirical results confirm the viability of our generated pseudo-labels to enhance the performance of YOLO, as the current (to date) state-of-the-art object detector.
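
The idea of rejecting inputs whose ensemble confidence falls below a fixed threshold can be illustrated with a minimal Python sketch. It is not the thesis's voting mechanism: plain averaging of member probabilities is swapped in as an assumption, the function names are hypothetical, and the threshold value 0.5 is taken from the French version of the abstract.

```python
def ensemble_confidence(member_probs):
    """Average the per-class probabilities over all ensemble members.

    member_probs: list of probability vectors, one per member.
    Plain averaging is an assumption made for illustration only.
    """
    n_members = len(member_probs)
    n_classes = len(member_probs[0])
    return [sum(p[c] for p in member_probs) / n_members for c in range(n_classes)]


def predict_or_reject(member_probs, tau=0.5):
    """Return (label, confidence); label is None when confidence < tau."""
    avg = ensemble_confidence(member_probs)
    conf = max(avg)
    label = avg.index(conf)
    return (label, conf) if conf >= tau else (None, conf)


# Members agree: the averaged confidence stays high and the label is kept.
agree = [[0.9, 0.1], [0.8, 0.2]]
# Members disagree: averaging flattens the distribution, so the input is rejected.
disagree = [[0.9, 0.05, 0.05], [0.05, 0.9, 0.05], [0.05, 0.05, 0.9]]

print(predict_or_reject(agree))
print(predict_or_reject(disagree))
```

When the members disagree, no class can keep a high averaged probability, which is why a single global threshold is enough to flag such inputs in this toy setting.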

12

Shao, Yuanlong. "Learning Sparse Recurrent Neural Networks in Language Modeling." The Ohio State University, 2014. http://rave.ohiolink.edu/etdc/view?acc_num=osu1398942373.

13

Lu, Yifei. "Deep neural networks and fraud detection." Thesis, Uppsala universitet, Tillämpad matematik och statistik, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-331833.

14

Kalogiras, Vasileios. "Sentiment Classification with Deep Neural Networks." Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-217858.

Abstract:

Attitydanalys är ett delfält av språkteknologi (NLP) som försöker analysera känslan av skriven text. Detta är ett komplext problem som medför många utmaningar. Av denna anledning har det studerats i stor utsträckning. Under de senaste åren har traditionella maskininlärningsalgoritmer eller handgjord metodik använts och givit utmärkta resultat. Men den senaste renässansen för djupinlärning har växlat om intresse till end to end deep learning-modeller.Å ena sidan resulterar detta i mer kraftfulla modeller men å andra sidansaknas klart matematiskt resonemang eller intuition för dessa modeller. På grund av detta görs ett försök i denna avhandling med att kasta ljus på nyligen föreslagna deep learning-arkitekturer för attitydklassificering. En studie av deras olika skillnader utförs och ger empiriska resultat för hur ändringar i strukturen eller kapacitet hos modellen kan påverka exaktheten och sättet den representerar och ''förstår'' meningarna.
Sentiment analysis is a subfield of natural language processing (NLP) that attempts to analyze the sentiment of written text. It is a complex problem that entails different challenges. For this reason, it has been studied extensively. In past years, traditional machine learning algorithms or handcrafted methodologies provided state-of-the-art results. However, the recent deep learning renaissance has shifted interest towards end-to-end deep learning models. On the one hand, this has resulted in more powerful models, but on the other hand, clear mathematical reasoning or intuition behind distinct models is still lacking. As a result, this thesis attempts to shed some light on recently proposed deep learning architectures for sentiment classification. A study of their differences is performed, and empirical results are provided on how changes in the structure or capacity of a model can affect its accuracy and the way it represents and ''comprehends'' sentences.

15

Choi, Keunwoo. "Deep neural networks for music tagging." Thesis, Queen Mary, University of London, 2018. http://qmro.qmul.ac.uk/xmlui/handle/123456789/46029.

Abstract:

In this thesis, I present my hypothesis, experiment results, and discussion related to various aspects of deep neural networks for music tagging. Music tagging is the task of automatically predicting suitable semantic labels when music is provided. Generally speaking, the input of music tagging systems can be any entity that constitutes music, e.g., audio content, lyrics, or metadata, but only the audio content is considered in this thesis. My hypothesis is that we can find effective deep learning practices for the task of music tagging that improve the classification performance. As a computational model to realise a music tagging system, I use deep neural networks. Combined with the research problem, the scope of this thesis is the understanding, interpretation, optimisation, and application of deep neural networks in the context of music tagging systems. The ultimate goal of this thesis is to provide insight that can help to improve deep learning-based music tagging systems. There are many smaller goals in this regard. Since using deep neural networks is a data-driven approach, it is crucial to understand the dataset. Selecting and designing a better architecture is the next topic to discuss. Since the tagging is done with audio input, preprocessing the audio signal becomes one of the important research topics. After building (or training) a music tagging system, finding a suitable way to re-use it for other music information retrieval tasks is a compelling topic, in addition to interpreting the trained system. The evidence presented in the thesis supports that deep neural networks are powerful and credible methods for building a music tagging system.

16

Yin, Yonghua. "Random neural networks for deep learning." Thesis, Imperial College London, 2018. http://hdl.handle.net/10044/1/64917.

Abstract:

The random neural network (RNN) is a mathematical model for an 'integrate and fire' spiking network that closely resembles the stochastic behaviour of neurons in mammalian brains. Since its proposal in 1989, there have been numerous investigations into the RNN's applications and learning algorithms. Deep learning (DL) has achieved great success in machine learning, but there has been no research into the properties of the RNN for DL to combine their power. This thesis intends to bridge the gap between RNNs and DL, in order to provide powerful DL tools that are faster, and that can potentially be used with less energy expenditure than existing methods. Based on the RNN function approximator proposed by Gelenbe in 1999, the approximation capability of the RNN is investigated and an efficient classifier is developed. By combining the RNN, DL and non-negative matrix factorisation, new shallow and multi-layer non-negative autoencoders are developed. The autoencoders are tested on typical image datasets and real-world datasets from different domains, and the test results yield the desired high learning accuracy. The concept of dense nuclei/clusters is examined, using RNN theory as a basis. In dense nuclei, neurons may interconnect via soma-to-soma interactions and conventional synaptic connections. A mathematical model of the dense nuclei is proposed and the transfer function can be deduced. A multi-layer architecture of the dense nuclei is constructed for DL, whose value is demonstrated by experiments on multi-channel datasets and server-state classification in cloud servers. A theoretical study into the multi-layer architecture of the standard RNN (MLRNN) for DL is presented. Based on the layer-output analyses, the MLRNN is shown to be a universal function approximator. The effects of the layer number on the learning capability and high-level representation extraction are analysed. 
A hypothesis for transforming the DL problem into a moment-learning problem is also presented. The power of the standard RNN for DL is investigated. The ability of the RNN with only positive parameters to conduct image convolution operations is demonstrated. The MLRNN equipped with the developed training algorithm achieves comparable or better classification at a lower computation cost than conventional DL methods.

17

Zagoruyko, Sergey. "Weight parameterizations in deep neural networks." Thesis, Paris Est, 2018. http://www.theses.fr/2018PESC1129/document.

Abstract:

Les réseaux de neurones multicouches ont été proposés pour la première fois il y a plus de trois décennies, et diverses architectures et paramétrages ont été explorés depuis. Récemment, les unités de traitement graphique ont permis une formation très efficace sur les réseaux neuronaux et ont permis de former des réseaux beaucoup plus grands sur des ensembles de données plus importants, ce qui a considérablement amélioré le rendement dans diverses tâches d'apprentissage supervisé. Cependant, la généralisation est encore loin du niveau humain, et il est difficile de comprendre sur quoi sont basées les décisions prises. Pour améliorer la généralisation et la compréhension, nous réexaminons les problèmes de paramétrage du poids dans les réseaux neuronaux profonds. Nous identifions les problèmes les plus importants, à notre avis, dans les architectures modernes : la profondeur du réseau, l'efficacité des paramètres et l'apprentissage de tâches multiples en même temps, et nous essayons de les aborder dans cette thèse. Nous commençons par l'un des problèmes fondamentaux de la vision par ordinateur, le patch matching, et proposons d'utiliser des réseaux neuronaux convolutifs de différentes architectures pour le résoudre, au lieu de descripteurs manuels. Ensuite, nous abordons la tâche de détection d'objets, où un réseau devrait apprendre simultanément à prédire à la fois la classe de l'objet et l'emplacement. Dans les deux tâches, nous constatons que le nombre de paramètres dans le réseau est le principal facteur déterminant sa performance, et nous explorons ce phénomène dans les réseaux résiduels. Nos résultats montrent que leur motivation initiale, la formation de réseaux plus profonds pour de meilleures représentations, ne tient pas entièrement, et des réseaux plus larges avec moins de couches peuvent être aussi efficaces que des réseaux plus profonds avec le même nombre de paramètres. 
Dans l'ensemble, nous présentons une étude approfondie sur les architectures et les paramétrages de poids, ainsi que sur les moyens de transférer les connaissances entre elles
Multilayer neural networks were first proposed more than three decades ago, and various architectures and parameterizations have been explored since. Recently, graphics processing units enabled very efficient neural network training and allowed training much larger networks on larger datasets, dramatically improving performance on various supervised learning tasks. However, generalization is still far from human level, and it is difficult to understand what the decisions made are based on. To improve generalization and understanding, we revisit the problems of weight parameterizations in deep neural networks. We identify what are, to our mind, the most important problems in modern architectures: network depth, parameter efficiency, and learning multiple tasks at the same time, and try to address them in this thesis. We start with one of the core problems of computer vision, patch matching, and propose to use convolutional neural networks of various architectures to solve it, instead of hand-crafted descriptors. Then, we address the task of object detection, where a network should simultaneously learn to predict both the class of the object and its location. In both tasks we find that the number of parameters in the network is the major factor determining its performance, and explore this phenomenon in residual networks. Our findings show that their original motivation, training deeper networks for better representations, does not fully hold, and wider networks with fewer layers can be as effective as deeper ones with the same number of parameters. Overall, we present an extensive study on architectures and weight parameterizations, and ways of transferring knowledge between them.

18

Ioannou, Yani Andrew. "Structural priors in deep neural networks." Thesis, University of Cambridge, 2018. https://www.repository.cam.ac.uk/handle/1810/278976.

Abstract:

Deep learning has in recent years come to dominate the previously separate fields of research in machine learning, computer vision, natural language understanding and speech recognition. Despite breakthroughs in training deep networks, there remains a lack of understanding of both the optimization and structure of deep networks. The approach advocated by many researchers in the field has been to train monolithic networks with excess complexity, and strong regularization --- an approach that leaves much to be desired in efficiency. Instead we propose that carefully designing networks in consideration of our prior knowledge of the task and learned representation can improve the memory and compute efficiency of state-of-the-art networks, and even improve generalization --- what we propose to denote as structural priors. We present two such novel structural priors for convolutional neural networks, and evaluate them in state-of-the-art image classification CNN architectures. The first of these methods proposes to exploit our knowledge of the low-rank nature of most filters learned for natural images by structuring a deep network to learn a collection of mostly small, low-rank filters. The second addresses the filter/channel extents of convolutional filters, by learning filters with limited channel extents. The size of these channel-wise basis filters increases with the depth of the model, giving a novel sparse connection structure that resembles a tree root. Both methods are found to improve the generalization of these architectures while also decreasing the size and increasing the efficiency of their training and test-time computation. Finally, we present work towards conditional computation in deep neural networks, moving towards a method of automatically learning structural priors in deep networks. 
We propose a new discriminative learning model, conditional networks, that jointly exploit the accurate representation learning capabilities of deep neural networks with the efficient conditional computation of decision trees. Conditional networks yield smaller models, and offer test-time flexibility in the trade-off of computation vs. accuracy.
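
To give a rough sense of why low-rank filters and limited channel extents shrink a model, the following Python sketch counts convolution weights. It is an illustration under stated assumptions, not the architecture from the thesis: the 64-channel layer, the 3x1/1x3 factorization, and the group count of 4 are all hypothetical choices, and `conv_params` is a helper introduced only for this example.

```python
def conv_params(in_ch, out_ch, kh, kw, groups=1):
    """Number of weights in a 2-D convolution layer (bias terms ignored).

    With groups > 1, each filter only sees in_ch // groups input channels,
    which is one way to limit the channel extent of a filter.
    """
    assert in_ch % groups == 0 and out_ch % groups == 0
    return (in_ch // groups) * out_ch * kh * kw


# Hypothetical layer sizes chosen for illustration.
full = conv_params(64, 64, 3, 3)                                   # dense 3x3 filters
low_rank = conv_params(64, 64, 3, 1) + conv_params(64, 64, 1, 3)   # 3x1 followed by 1x3
grouped = conv_params(64, 64, 3, 3, groups=4)                      # limited channel extent

print(full, low_rank, grouped)  # 36864 24576 9216
```

In this toy setting the factorized pair uses two thirds of the weights of the dense layer, and the grouped layer uses one quarter.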

19

Billman, Linnar, and Johan Hullberg. "Speech Reading with Deep Neural Networks." Thesis, Uppsala universitet, Institutionen för informationsteknologi, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-360022.

Abstract:

Recent growth in computational power and available data has increased the popularity and progress of machine learning techniques. Methods of machine learning are used for automatic speech recognition in order to allow humans to transfer information to computers simply by speech. In the present work, we are interested in doing this for general contexts, e.g. speakers talking on TV or newsreaders recorded in a studio. Automatic speech recognition systems are often solely based on acoustic data. By introducing visual data such as lip movements, the robustness of such systems can be increased. This thesis instead investigates how well machine learning techniques can learn the art of lip reading as a sole source for automatic speech recognition. The key idea is to use a sequence of 24 lip coordinates to feed to the system, rather than learning directly from the raw video frames. This thesis designs a solution around this principle empowered by state-of-the-art machine learning techniques such as recurrent neural networks, making use of GPUs. We find that this design reduces computational requirements by more than a factor of 25 compared to a state-of-the-art machine learning solution called LipNet. This however also scales down performance to an accuracy of 80% of what LipNet achieves, while still outperforming human recognition by a factor of 150%. The accuracies are based on processing of yet unseen speakers. This text presents this architecture. It details its design, reports its results, and compares its performance to an existing solution. Based on this, it is indicated how the result can be further refined.

20

Wang, Shenhao. "Deep neural networks for choice analysis." Thesis, Massachusetts Institute of Technology, 2020. https://hdl.handle.net/1721.1/129894.

Abstract:

Thesis: Ph. D. in Computer and Urban Science, Massachusetts Institute of Technology, Department of Urban Studies and Planning, September, 2020
Cataloged from student-submitted PDF of thesis.
Includes bibliographical references (pages 117-128).
As deep neural networks (DNNs) outperform classical discrete choice models (DCMs) in many empirical studies, one pressing question is how to reconcile them in the context of choice analysis. So far researchers mainly compare their prediction accuracy, treating them as completely different modeling methods. However, DNNs and classical choice models are closely related and even complementary. This dissertation seeks to lay out a new foundation of using DNNs for choice analysis. It consists of three essays, which respectively tackle the issues of economic interpretation, architectural design, and robustness of DNNs by using classical utility theories. Essay 1 demonstrates that DNNs can provide economic information as complete as the classical DCMs.
The economic information includes choice predictions, choice probabilities, market shares, substitution patterns of alternatives, social welfare, probability derivatives, elasticities, marginal rates of substitution (MRS), and heterogeneous values of time (VOT). Unlike DCMs, DNNs can automatically learn the utility function and reveal behavioral patterns that are not prespecified by modelers. However, the economic information from DNNs can be unreliable because the automatic learning capacity is associated with three challenges: high sensitivity to hyperparameters, model non-identification, and local irregularity. To demonstrate the strength of DNNs as well as the three issues, I conduct an empirical experiment by applying the DNNs to a stated preference survey and discuss successively the full list of economic information extracted from the DNNs. Essay 2 designs a particular DNN architecture with alternative-specific utility functions (ASU-DNN) by using prior behavioral knowledge.
Theoretically, ASU-DNN reduces the estimation error of fully connected DNN (F-DNN) because of its lighter architecture and sparser connectivity, although the constraint of alternative-specific utility could cause ASU-DNN to exhibit a larger approximation error. Both ASU-DNN and F-DNN can be treated as special cases of DNN architecture design guided by utility connectivity graph (UCG). Empirically, ASU-DNN has 2-3% higher prediction accuracy than F-DNN. The alternative-specific connectivity constraint, as a domain-knowledge-based regularization method, is more effective than other regularization methods. This essay demonstrates that prior behavioral knowledge can be used to guide the architecture design of DNN, to function as an effective domain-knowledge-based regularization method, and to improve both the interpretability and predictive power of DNNs in choice analysis.
Essay 3 designs a theory-based residual neural network (TB-ResNet) with a two-stage training procedure, which synthesizes decision-making theories and DNNs in a linear manner. Three instances of TB-ResNets based on choice modeling (CM-ResNets), prospect theory (PT-ResNets), and hyperbolic discounting (HD-ResNets) are designed. Empirically, compared to the decision-making theories, the three instances of TB-ResNets predict significantly better in the out-of-sample test and become more interpretable owing to the rich utility function augmented by DNNs. Compared to the DNNs, the TB-ResNets predict better because the decision-making theories aid in localizing and regularizing the DNN models. TB-ResNets also become more robust than DNNs because the decision-making theories stabilize the local utility function and the input gradients.
This essay demonstrates that it is both feasible and desirable to combine the handcrafted utility theory and automatic utility specification, with joint improvement in prediction, interpretation, and robustness.
by Shenhao Wang.

21

Sunnegårdh, Christina. "Scar detection using deep neural networks." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-299576.

Abstract:

Object detection is a computer vision method that deals with the tasks of localizing and classifying objects within an image. The number of uses for the method is constantly growing, and this thesis investigates the unexplored area of using deep neural networks for scar detection. Furthermore, the thesis investigates using the scar detector as a basis for the binary classification task of deciding whether in-the-wild images contain a scar or not. Two pre-trained object detection models, Faster R-CNN and RetinaNet, were trained on 1830 manually labeled images using different hyperparameters. Faster R-CNN Inception ResNet V2 achieved the highest results in terms of Average Precision (AP), particularly at higher IoU thresholds, closely followed by Faster R-CNN ResNet50, and finally RetinaNet. The results indicate both the superiority of Faster R-CNN over RetinaNet and the benefit of using Inception ResNet V2 as feature extractor for a large variety of object sizes. This is most likely due to multiple convolutional filters of different sizes operating at the same levels in the Inception ResNet network. As for inference time, RetinaNet was the fastest, followed by Faster R-CNN ResNet50 and finally Faster R-CNN Inception ResNet V2. For the binary classification task, the models were tested on a set of 200 images, where half of the images contained clearly visible scars. Faster R-CNN ResNet50 achieved the highest accuracy, followed by Faster R-CNN Inception ResNet V2 and finally RetinaNet. While the accuracy of RetinaNet suffered mainly from a low recall, Faster R-CNN Inception ResNet V2 detected some actual scars in images that had not been labeled due to low image quality; this could be a matter of subjective labeling, with the model being punished for something that at other times might be considered correct. In conclusion, this thesis shows promising results of using object detection to detect scars in images. 
While two-stage Faster R-CNN holds the advantage in AP for scar detection, one-stage RetinaNet holds the advantage in speed. Suggestions for future work include eliminating biases by putting more effort into labeling data as well as including training data that contain objects for which the models produced false positives. Examples of this are wounds, knuckles, and possible background objects that are visually similar to scars.
Objektdetektion är en metod inom datorseende som inkluderar både lokalisering och klassificering av objekt i bilder. Antalet användningsområden för metoden växer ständigt och denna studie undersöker det outforskade området av att använda djupa neurala nätverk för detektering av ärr. Studien utforskar även att använda detektering av ärr som grund för den binära klassificeringsuppgiften att bestämma om bilder innehåller ett synligt ärr eller inte. Två förtränade objektdetekteringsmodeller, Faster R-CNN och RetinaNet, tränades med olika hyperparametrar på 1830 manuellt märkta bilder. Faster RCNN Inception ResNet V2 uppnådde bäst resultat med avseende på average precision (AP), tätt följd av Faster R-CNN ResNet50 och slutligen RetinaNet. Resultatet indikerar både överlägsenhet av Faster R-CNN gentemot RetinaNet, såväl som att använda Inception ResNet V2 för särdragsextrahering. Detta beror med stor sannolikhet på dess användning av faltningsfilter i flera storlekar på samma nivåer i nätverket. Gällande detekteringstid per bild var RetinaNet snabbast, följd av Faster R-CNN ResNet50 och slutligen Faster R-CNN Inception ResNet V2. För den binära klassificeringsuppgiften testades modellerna på 200 bilder, där hälften av bilderna innehöll tydligt synliga ärr. Faster RCNN ResNet50 uppnådde högst träffsäkerhet, följt av Faster R-CNN Inception ResNet V2 och till sist RetinaNet. Medan träffsäkerheten för RetinaNet huvudsakligen bestraffades på grund av att ha förbisett ärr i bilder, så detekterade Faster R-CNN Inception ResNet V2 ett flertal faktiska ärr som inte datamärkts på grund av bristande bildkvalitet. Detta kan dock vara en fråga om subjektiv datamärkning och att modellen bestraffas för något som andra gånger skulle kunna anses korrekt. Sammanfattningsvis visar denna studie lovande resultat av att använda objektdetektion för att detektera ärr i bilder. 
Medan tvåstegsmodellen Faster R-CNN har övertaget sett till AP, har enstegsmodellen RetinaNet övertaget sett till detekteringstid. Förslag för framtida arbete inkluderar att lägga större vikt vid märkning av data för att eliminera potentiell subjektivitet, samt inkludera träningsdata innehållande objekt som modellerna misstog för ärr. Exempel på detta är öppna sår, knogar och bakgrundsobjekt som visuellt liknar ärr.
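
The abstract reports Average Precision at different IoU thresholds. As a reminder of what that threshold measures, here is a minimal Python sketch of intersection over union for two axis-aligned boxes. The `(x_min, y_min, x_max, y_max)` box format and the function name are assumptions made for this example; the thesis's own evaluation code is not shown in the source.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    # Corners of the overlapping rectangle, if there is one.
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])

    # Clamp at zero so disjoint boxes yield an empty intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# A predicted box that overlaps a ground-truth box by a quarter of each box's area.
print(iou((0, 0, 2, 2), (1, 1, 3, 3)))
```

A detection typically counts as correct only when its IoU with a ground-truth box reaches the chosen threshold, so a higher threshold demands tighter localization.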

22

Landeen, Trevor J. "Association Learning Via Deep Neural Networks." DigitalCommons@USU, 2018. https://digitalcommons.usu.edu/etd/7028.

Abstract:

Deep learning has been making headlines in recent years and is often portrayed as an emerging technology on a meteoric rise towards fully sentient artificial intelligence. In reality, deep learning is the most recent renaissance of a 70-year-old technology and is far from possessing true intelligence. The renewed interest is motivated by recent successes in challenging problems, the accessibility made possible by hardware developments, and dataset availability. The predecessor to deep learning, commonly known as the artificial neural network, is a computational network set up to mimic the biological neural structure found in brains. However, unlike human brains, artificial neural networks in most cases cannot make inferences from one problem to another. As a result, developing an artificial neural network requires a large number of examples of desired behavior for a specific problem. Furthermore, developing an artificial neural network capable of solving the problem can take days, or even weeks, of computations. Two specific problems addressed in this dissertation are both input association problems. One problem challenges a neural network to identify overlapping regions in images and is used to evaluate the ability of a neural network to learn associations between inputs of similar types. The other problem asks a neural network to identify which observed wireless signals originated from observed potential sources and is used to assess the ability of a neural network to learn associations between inputs of different types. The neural network solutions to both problems introduced, discussed, and evaluated in this dissertation demonstrate deep learning’s applicability to problems which have previously attracted little attention.

23

Srivastava, Sanjana. "On foveation of deep neural networks." Thesis, Massachusetts Institute of Technology, 2019. https://hdl.handle.net/1721.1/123134.

Abstract:

This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.
Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2019
Cataloged from student-submitted PDF version of thesis.
Includes bibliographical references (pages 61-63).
The human ability to recognize objects is impaired when the object is not shown in full. "Minimal images" are the smallest regions of an image that remain recognizable for humans. [26] show that a slight modification of the location and size of the visible region of the minimal image produces a sharp drop in human recognition accuracy. In this paper, we demonstrate that such drops in accuracy due to changes of the visible region are a phenomenon common to humans and existing state-of-the-art convolutional neural networks (CNNs), and are much more prominent in CNNs. We found many cases where CNNs classified one region correctly and the other incorrectly, though they only differed by one row or column of pixels, and were often bigger than the average human minimal image size. We show that this phenomenon is independent from previous works that have reported lack of invariance to minor modifications in object location in CNNs. Our results thus reveal a new failure mode of CNNs that also affects humans to a lesser degree. They expose how fragile CNN recognition ability is for natural images even without synthetic adversarial patterns being introduced. This opens potential for CNN robustness in natural images to be brought to the human level by taking inspiration from human robustness methods. One of these is eccentricity dependence, a model of human focus in which attention to the visual input degrades in proportion to distance from the focal point [7]. We demonstrate that applying the "inverted pyramid" eccentricity method, a multi-scale input transformation, makes CNNs more robust to useless background features than a standard raw-image input. Our results also find that using the inverted pyramid method generally reduces useless background pixels, therefore reducing required training data.
by Sanjana Srivastava.
M. Eng.
M.Eng. Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science
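
The abstract describes the "inverted pyramid" only as a multi-scale input transformation. The sketch below is one plausible reading, not the thesis code: it assumes that centred crops of doubling size are each downsampled to the same resolution, so that the centre is kept sharp and the periphery coarse. The function names, the average-pooling choice, and the toy 8 x 8 image are all hypothetical.

```python
def center_crop(image, size):
    """Return the size x size window centred in a square 2-D list."""
    start = (len(image) - size) // 2
    return [row[start:start + size] for row in image[start:start + size]]


def average_pool(image, factor):
    """Downsample a square 2-D list by averaging factor x factor blocks."""
    n = len(image) // factor
    return [
        [
            sum(image[r * factor + i][c * factor + j]
                for i in range(factor) for j in range(factor)) / factor ** 2
            for c in range(n)
        ]
        for r in range(n)
    ]


def inverted_pyramid(image, base_size, num_scales):
    """Centred crops of doubling extent, all resampled to base_size x base_size.

    Assumes the image side is at least base_size * 2 ** (num_scales - 1).
    """
    channels = []
    for k in range(num_scales):
        factor = 2 ** k
        crop = center_crop(image, base_size * factor)
        channels.append(average_pool(crop, factor))
    return channels


# Toy 8 x 8 "image" whose pixel value encodes its position.
image = [[float(r * 8 + c) for c in range(8)] for r in range(8)]
pyramid = inverted_pyramid(image, base_size=2, num_scales=3)
```

Each returned channel has the same shape, so the channels could be stacked as input to a CNN; the widest channel covers the whole image at the coarsest resolution.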

24

Grechka, Asya. "Image editing with deep neural networks." Electronic Thesis or Diss., Sorbonne université, 2023. https://accesdistant.sorbonne-universite.fr/login?url=https://theses-intra.sorbonne-universite.fr/2023SORUS683.pdf.

Abstract:

Image editing has a rich history dating back more than two centuries. That said, "classic" image editing requires strong artistic skills as well as considerable time, often several hours, to modify each image. In recent years, considerable progress in generative modeling has enabled realistic, high-quality image synthesis. However, editing a real image remains a challenge: it requires synthesizing new features while faithfully preserving part of the original image. In this thesis, we explore different approaches to image editing, leveraging three families of generative models: GANs, variational autoencoders (VAEs), and diffusion models. First, we study how to use a pre-trained GAN to edit a real image. While methods for editing GAN-generated images are well known, they do not generalize easily to real images. We analyze the reasons for this limitation and propose a solution to better project a real image into the GAN's latent space so as to make it editable. Then, we use variational autoencoders with vector quantization to directly obtain a compact image representation (which was lacking with GANs) and optimize the latent vector so as to match a desired text input. We aim to constrain this problem, which could otherwise be vulnerable to adversarial examples. We propose a method to choose the hyperparameters based on both the fidelity and the editing quality of the modified images. We present a robust evaluation protocol and demonstrate the value of our approach. Finally, we approach image editing from the perspective of inpainting. Our goal is to synthesize a part of an image while keeping the rest intact. For this, we leverage pre-trained diffusion models and build on the classic inpainting method, which replaces, at each step of the denoising process, the part that we do not wish to modify with the noised real image. However, this method can lead to a mismatch (desynchronization) between the generated and real parts. We propose an approach based on computing the gradient of a function that evaluates the harmonization between the two parts, and we guide the denoising process with this gradient.
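
The classic replacement step that this abstract builds on can be sketched as follows. This is a minimal illustration, not the thesis implementation: it assumes the standard DDPM forward-noising formula, flattens the image to a short list of pixel values, and uses hypothetical names (`noise_to_step`, `composite`, `keep_mask`). The gradient-based guidance proposed in the thesis is not shown.

```python
import math
import random


def noise_to_step(x0, alpha_bar_t, rng):
    """Forward-diffuse clean pixels x0 to the noise level of step t.

    Uses x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps.
    """
    return [
        math.sqrt(alpha_bar_t) * v
        + math.sqrt(1.0 - alpha_bar_t) * rng.gauss(0.0, 1.0)
        for v in x0
    ]


def composite(x_generated, x_real_noised, keep_mask):
    """Keep noised real pixels where keep_mask is 1, generated pixels elsewhere."""
    return [
        m * real + (1 - m) * gen
        for gen, real, m in zip(x_generated, x_real_noised, keep_mask)
    ]


rng = random.Random(0)
x_real = [0.2, -0.5, 0.9, 0.1]       # original image, flattened (toy values)
keep_mask = [1, 1, 0, 0]             # 1 = preserve, 0 = region to inpaint
x_generated = [0.0, 0.0, 0.3, -0.4]  # stand-in for the model's current sample

# One replacement step at a noise level with alpha_bar_t = 0.5.
x_real_noised = noise_to_step(x_real, alpha_bar_t=0.5, rng=rng)
x_t = composite(x_generated, x_real_noised, keep_mask)
```

In a full sampler this compositing would be repeated after every denoising step, with `alpha_bar_t` taken from the model's noise schedule.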

25

Andersson, Viktor. "Semantic Segmentation : Using Convolutional Neural Networks and Sparse dictionaries." Thesis, Linköpings universitet, Datorseende, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-139367.

Abstract:

The two main bottlenecks in using deep neural networks are data dependency and training time. This thesis proposes a novel method for weight initialization of the convolutional layers in a convolutional neural network, based on sparse dictionaries. A sparse dictionary optimized on domain-specific data can be seen as a set of intelligent feature-extracting filters. This thesis investigates the effect of using such filters as kernels in the convolutional layers of the neural network. How do they affect the training time and final performance? The dataset used here is the Cityscapes dataset, which is a library of 25000 labeled road scene images. The sparse dictionary was acquired using the K-SVD method. The filters were added to two different networks whose performance was tested individually. One of the architectures is much deeper than the other. The results are presented for both networks. They show that filter initialization is an important aspect which should be taken into consideration when training deep networks for semantic segmentation.

26

Chen, Zhe. "Augmented Context Modelling Neural Networks." Thesis, The University of Sydney, 2019. http://hdl.handle.net/2123/20654.

Abstract:

Contexts provide beneficial information for machine-based image understanding tasks. However, existing context modelling methods still cannot fully exploit contexts, especially for object recognition and detection. In this thesis, we develop augmented context modelling neural networks to better utilize contexts for different object recognition and detection tasks. Our contributions are two-fold: 1) we introduce neural networks to better model instance-level visual relationships; 2) we introduce neural network-based algorithms to better utilize contexts from 3D information and synthesized data. In particular, to augment the modelling of instance-level visual relationships, we propose a context refinement network and an encapsulated context modelling network for object detection. In the context refinement study, we propose to improve the modeling of visual relationships by introducing overlap scores and confidence scores of different regions. In addition, in the encapsulated context modelling study, we boost the context modelling performance by exploiting the more powerful capsule-based neural networks. To augment the modeling of contexts from different sources, we propose novel neural networks to better utilize 3D information and synthesis-based contexts. For the modelling of 3D information, we mainly investigate the modelling of LiDAR data for road detection and the depth data for instance segmentation, respectively. In road detection, we develop a progressive LiDAR adaptation algorithm to improve the fusion of 3D LiDAR data and 2D image data. Regarding instance segmentation, we model depth data as context to help tackle the low-resolution annotation-based training problem. Moreover, to improve the modelling of synthesis-based contexts, we devise a shape translation-based pedestrian generation framework to help improve the pedestrian detection performance.

27

Habibi, Aghdam Hamed. "Understanding Road Scenes using Deep Neural Networks." Doctoral thesis, Universitat Rovira i Virgili, 2018. http://hdl.handle.net/10803/461607.

Abstract:

Understanding road scenes is crucial for autonomous cars. This requires segmenting road scenes into semantically meaningful regions and recognizing objects in a scene. While objects such as cars and pedestrians have to be segmented accurately, it might not be necessary to detect and locate these objects in a scene. However, detecting and classifying objects such as traffic signs is essential for conforming to road rules. In this thesis, we first propose a method for classifying traffic signs using visual attributes and Bayesian networks. Then, we propose two neural networks for this purpose and develop a new method for creating an ensemble of models. Next, we study the sensitivity of neural networks to adversarial samples and propose two denoising networks that are attached to the classification networks to increase their stability against noise. In the second part of the thesis, we first propose a network to detect traffic signs in high-resolution images in real time and show how to implement the scanning window technique within our network using dilated convolutions. Then, we formulate the detection problem as a segmentation problem and propose a fully convolutional network for detecting traffic signs. Finally, in the last part of the thesis, we propose a new fully convolutional network composed of fire modules, bypass connections, and consecutive dilated convolutions for segmenting road scenes into semantically meaningful regions, and show that it is more accurate and computationally more efficient than similar networks.
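
For readers unfamiliar with the dilated convolutions mentioned in this abstract, a minimal one-dimensional sketch is given here. It is a generic illustration, not the thesis network: `dilated_conv1d` is a hypothetical helper that computes a "valid" cross-correlation (the convention used by most deep learning frameworks) with gaps of `dilation` samples between kernel taps.

```python
def dilated_conv1d(signal, kernel, dilation=1):
    """Valid 1-D cross-correlation with `dilation` samples between kernel taps."""
    span = (len(kernel) - 1) * dilation  # distance covered by the kernel
    return [
        sum(w * signal[i + j * dilation] for j, w in enumerate(kernel))
        for i in range(len(signal) - span)
    ]


signal = [1, 2, 3, 4, 5]
kernel = [1, 1, 1]

dense = dilated_conv1d(signal, kernel, dilation=1)   # receptive field of 3 samples
spread = dilated_conv1d(signal, kernel, dilation=2)  # receptive field of 5 samples
```

With the same three weights, the dilated version covers five input samples instead of three, which is why dilation can enlarge the receptive field without adding parameters.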

28

Antoniades, Andreas. "Interpreting biomedical data via deep neural networks." Thesis, University of Surrey, 2018. http://epubs.surrey.ac.uk/845765/.

Abstract:

Machine learning technology has taken quantum leaps in the past few years, from the rise of voice recognition as an interface to interact with our computers, to self-organising photo albums and self-driving cars. Neural networks and deep learning contributed significantly to drive this revolution. Yet, biomedicine is one of the research areas that has yet to fully embrace the possibilities of deep learning. Engaged in a cross-disciplinary subject, researchers and clinical experts are focused on machine learning and statistical signal processing techniques. The ability to learn hierarchical features makes deep learning models highly applicable to biomedicine and researchers have started to notice this. The first works of deep learning in biomedicine are emerging with applications in diagnostics and genomics analysis. These models offer excellent accuracy, even comparable to that of human doctors. Despite the exceptional classification performance of these models, they are still used to provide quantitative results. Diagnosing cancer proficiently and faster than a human doctor is beneficial, but automatically finding which biomarkers indicate the existence of cancerous cells would be invaluable. This type of qualitative insight can be enabled by the hierarchical features and learning coefficients that manifest in deep models. It is this qualitative approach that enables the interpretability of data and explainability of neural networks for biomedicine, which is the overarching aim of this thesis. As such, the aim of this thesis is to investigate the use of neural networks and deep learning models for the qualitative assessment of biomedical datasets. The first contribution is the proposition of a non-iterative, data-agnostic feature selection algorithm to retain original features and provide qualitative analysis on their importance. This algorithm is employed in numerous areas including Pima Indian diabetes and tumour detection in children.
Next, the thesis focuses on the topic of epilepsy studied through scalp and intracranial electroencephalogram recordings of human brain activity. The second contribution promotes the use of deep learning models for the automatic generation of clinically meaningful features, as opposed to traditional handcrafted features. Convolutional neural networks are adapted to accommodate the intricacies of electroencephalogram data and trained to detect epileptiform discharges. The learning coefficients of these models are examined and found to contain clinically significant features. When combined in a hierarchical way, these features reveal useful insights for the evaluation of treatment effectiveness. The final contribution addresses the difficulty in acquiring intracranial data due to the invasive nature of the recording procedure. A non-linear brain mapping algorithm is proposed to link the electrical activities recorded on the scalp to those inside the cranium. This process improves the generalisation of models and alleviates the need for surgical procedures. This is accomplished via an asymmetric autoencoder that accounts for differences in the dimensionality of the electroencephalogram data and improves the quality of the data.

29

Avramova, Vanya. "Curriculum Learning with Deep Convolutional Neural Networks." Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-178453.

Abstract:

Curriculum learning is a machine learning technique inspired by the way humans acquire knowledge and skills: by mastering simple concepts first, and progressing through information with increasing difficulty to grasp more complex topics. Curriculum Learning, and its derivatives Self Paced Learning (SPL) and Self Paced Learning with Diversity (SPLD), have been previously applied within various machine learning contexts: Support Vector Machines (SVMs), perceptrons, and multi-layer neural networks, where they have been shown to improve both training speed and model accuracy. This project ventured to apply the techniques within the previously unexplored context of deep learning, by investigating how they affect the performance of a deep convolutional neural network (ConvNet) trained on a large labeled image dataset. The curriculum was formed by presenting the training samples to the network in order of increasing difficulty, measured by the sample's loss value based on the network's objective function. The project evaluated SPL and SPLD, and proposed two new curriculum learning sub-variants, p-SPL and p-SPLD, which allow for a smooth progression of sample inclusion during training. The project also explored the "inversed" versions of the SPL, SPLD, p-SPL and p-SPLD techniques, where the samples were selected for the curriculum in order of decreasing difficulty. The experiments demonstrated that all learning variants perform fairly similarly, within ≈1% average test accuracy margin, based on five trained models per variant. Surprisingly, models trained with the inversed version of the algorithms performed slightly better than the standard curriculum training variants. The SPLD-Inversed, SPL-Inversed and SPLD networks also registered marginally higher accuracy results than the network trained with the usual random sample presentation.
The results suggest that while sample ordering does affect the training process, the optimal order in which samples are presented may vary based on the data set and algorithm used. The project also investigated whether some samples were more beneficial for the training process than others. Based on sample difficulty, subsets of samples were removed from the training data set. The models trained on the remaining samples were compared to a default model trained on all samples. On the data set used, removing the “easiest” 10% of samples had no effect on the achieved test accuracy compared to the default model, and removing the “easiest” 40% of samples reduced model accuracy by only ≈1% (compared to ≈6% loss when 40% of the "most difficult" samples were removed, and ≈3% loss when 40% of samples were randomly removed). Taking away the "easiest" samples first (up to a certain percentage of the data set) affected the learning process less negatively than removing random samples, while removing the "most difficult" samples first had the most detrimental effect. The results suggest that the networks derived most learning value from the "difficult" samples, and that a large subset of the "easiest" samples can be excluded from training with minimal impact on the attained model accuracy. Moreover, it is possible to identify these samples early during training, which can greatly reduce the training time for these models.
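
The loss-based ordering and the sample-removal experiment described in this abstract can be illustrated with a small sketch. It is not the project's code: the function names and the per-sample loss values are hypothetical, and the SPL/SPLD selection rules themselves are not reproduced.

```python
def curriculum_order(losses, inverse=False):
    """Sample indices sorted by loss: easiest first, or hardest first if inverse."""
    return sorted(range(len(losses)), key=lambda i: losses[i], reverse=inverse)


def drop_easiest(losses, fraction):
    """Indices that remain after removing the lowest-loss fraction of samples."""
    n_drop = int(len(losses) * fraction)
    return sorted(curriculum_order(losses)[n_drop:])


# Hypothetical per-sample losses from a partially trained network.
losses = [0.9, 0.1, 0.5, 2.3, 0.05, 1.2, 0.3, 0.7, 1.8, 0.2]

easy_first = curriculum_order(losses)                # standard curriculum
hard_first = curriculum_order(losses, inverse=True)  # "inversed" variant
kept = drop_easiest(losses, fraction=0.4)            # remove the "easiest" 40%
```

Here a low loss is taken as a proxy for an "easy" sample, matching the difficulty measure stated in the abstract.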

30

Karlsson, Daniel. "Classifying sport videos with deep neural networks." Thesis, Umeå universitet, Institutionen för datavetenskap, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-130654.

Abstract:

This project aims to apply deep neural networks to classify video clips in applications used to streamline advertisements on the web. The system focuses on sport clips but can be expanded into other advertisement fields, with lower accuracy and longer training times as a consequence. The main task was to find the neural network model best suited for classifying videos. To achieve this, the field was researched and three network models were introduced to see how they could handle the videos. It was proposed that applying a recurrent LSTM structure at the end of an image classification network could make it well adapted to work with videos. The most popular image classification architectures are mostly convolutional neural networks, and these structures are also the foundation of all three models. The results from the evaluation of the models, as well as the research, suggest that using a convolutional LSTM can be an efficient and powerful way of classifying videos. Further, this project shows that by reducing the size of the input data by 25%, the training and evaluation time can be cut by around 50%. This comes at the cost of lower accuracy. However, it is demonstrated that the performance loss can be compensated by considering more frames from the same videos during evaluation.

31

Peng, Zeng. "Pedestrian Tracking by using Deep Neural Networks." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-302107.

Abstract:

This project aims to use deep learning to solve the pedestrian tracking problem for autonomous driving. The research area lies within computer vision and deep learning. Multi-Object Tracking (MOT) aims at tracking multiple targets simultaneously in video data. The main application scenarios of MOT are security monitoring and autonomous driving. In these scenarios, we often need to track many targets at the same time, which is not possible with object detection or single-object tracking algorithms alone because of their lack of stability and usability. Therefore, we need to explore the area of multiple object tracking. The proposed method breaks MOT into different stages and uses the motion and appearance information of targets to track them in video data. We used three different object detectors to detect the pedestrians in frames, a person re-identification model as the appearance feature extractor, and a Kalman filter as the motion predictor. Our proposed model achieves 47.6% MOT accuracy and a 53.2% IDF1 score, while the model without the person re-identification module obtains only 44.8% and 45.8%, respectively. Our experimental results indicate that a robust multiple object tracking algorithm can be achieved by splitting the task into stages and improved by representative DNN-based appearance features.
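
The two scores quoted in this abstract follow standard definitions, sketched here for reference. The helper names and all counts are hypothetical illustrations and are not taken from the thesis: MOTA is computed as one minus the ratio of errors (misses, false positives, identity switches) to ground-truth objects, and IDF1 as the F1 score over identity-matched detections.

```python
def mota(false_negatives, false_positives, id_switches, num_gt):
    """Multi-Object Tracking Accuracy: 1 - (FN + FP + IDSW) / GT."""
    return 1.0 - (false_negatives + false_positives + id_switches) / num_gt


def idf1(idtp, idfp, idfn):
    """ID F1 score: 2 * IDTP / (2 * IDTP + IDFP + IDFN)."""
    return 2 * idtp / (2 * idtp + idfp + idfn)


# Made-up counts, summed over all frames of a sequence.
example_mota = mota(false_negatives=30, false_positives=15, id_switches=5, num_gt=100)
example_idf1 = idf1(idtp=8, idfp=2, idfn=2)
```

Because IDF1 rewards keeping the same identity on a target over time, an appearance model such as person re-identification would be expected to affect it more than MOTA, which is consistent with the larger IDF1 gap reported above.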

32

Milner, Rosanna Margaret. "Using deep neural networks for speaker diarisation." Thesis, University of Sheffield, 2016. http://etheses.whiterose.ac.uk/16567/.

Abstract:

Speaker diarisation answers the question “who spoke when?” in an audio recording. The input may vary, but a system is required to output speaker labelled segments in time. Typical stages are Speech Activity Detection (SAD), speaker segmentation and speaker clustering. Early research focussed on Conversational Telephone Speech (CTS) and Broadcast News (BN) domains before the direction shifted to meetings and, more recently, broadcast media. The British Broadcasting Corporation (BBC) supplied data through the Multi-Genre Broadcast (MGB) Challenge in 2015 which showed the difficulties speaker diarisation systems have on broadcast media data. Diarisation is typically an unsupervised task which does not use auxiliary data or information to enhance a system. However, methods which do involve supplementary data have shown promise. Five semi-supervised methods are investigated which use a combination of inputs: different channel types and transcripts. The methods involve Deep Neural Networks (DNNs) for SAD, DNNs trained for channel detection, transcript alignment, and combinations of these approaches. However, the methods are only applicable when datasets contain the required inputs. Therefore, a method involving a pretrained Speaker Separation Deep Neural Network (ssDNN) is investigated which is applicable to every dataset. This technique performs speaker clustering and speaker segmentation using DNNs successfully for meeting data and with mixed results for broadcast media. The task of diarisation focuses on two aspects: accurate segments and speaker labels. The Diarisation Error Rate (DER) does not evaluate the segmentation quality as it does not measure the number of correctly detected segments. Other metrics exist, such as boundary and purity measures, but these also mask the segmentation quality. An alternative metric is presented based on the F-measure which considers the number of hypothesis segments correctly matched to reference segments. 
A deeper insight into the segment quality is shown through this metric.
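
A segment-level F-measure of the kind mentioned in this abstract might look like the sketch below. The abstract does not give the matching criterion, so this is an assumption throughout: a hypothesis segment counts as correct when its speaker label agrees with a reference segment and both boundaries fall within a tolerance, each reference segment can be matched once, and speaker labels are assumed to be already mapped between reference and hypothesis. The function name and the 0.25 s tolerance are hypothetical.

```python
def segment_f_measure(reference, hypothesis, tolerance=0.25):
    """F-measure over (start, end, speaker) segments with one-to-one matching."""
    unmatched = list(reference)
    matched = 0
    for h_start, h_end, h_spk in hypothesis:
        for ref in unmatched:
            r_start, r_end, r_spk = ref
            if (h_spk == r_spk
                    and abs(h_start - r_start) <= tolerance
                    and abs(h_end - r_end) <= tolerance):
                unmatched.remove(ref)  # each reference segment is used once
                matched += 1
                break
    precision = matched / len(hypothesis) if hypothesis else 0.0
    recall = matched / len(reference) if reference else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


reference = [(0.0, 2.0, "A"), (2.0, 5.0, "B"), (5.0, 6.0, "A")]
hypothesis = [(0.1, 2.1, "A"), (2.1, 4.0, "B"), (5.0, 6.0, "A"), (6.0, 6.5, "B")]
score = segment_f_measure(reference, hypothesis)
```

Unlike a time-weighted error rate, this score penalises the truncated second segment and the spurious fourth one as whole segments, whatever their duration.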

33

Karlsson, Jonas. "Auditory Classification of Cars by Deep Neural Networks." Thesis, Uppsala universitet, Institutionen för informationsteknologi, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-355673.

Abstract:

This thesis explores the challenge of using deep neural networks to classify traits in cars through sound recognition. These traits could include the type of engine, model, or manufacturer of the car. The problem was approached by creating three different neural networks and evaluating their performance in classifying sounds of three different cars. The top-scoring neural network achieved an accuracy of 61 percent, which is far from reaching the standard accuracy of modern speech recognition systems. The results do, however, show that there are some tendencies in the data that neural networks can learn. If the methods and networks presented in this report are further built upon, a greater classification performance may be achieved.

34

Wang, Yuxuan. "Supervised Speech Separation Using Deep Neural Networks." The Ohio State University, 2015. http://rave.ohiolink.edu/etdc/view?acc_num=osu1426366690.

35

Wu, Chunyang. "Structured deep neural networks for speech recognition." Thesis, University of Cambridge, 2018. https://www.repository.cam.ac.uk/handle/1810/276084.

Abstract:

Deep neural networks (DNNs) and deep learning approaches yield state-of-the-art performance in a range of machine learning tasks, including automatic speech recognition. The multi-layer transformations and activation functions in DNNs, or related network variations, allow complex and difficult data to be well modelled. However, the highly distributed representations associated with these models make it hard to interpret the parameters. The whole neural network is commonly treated as a "black box". The behaviours of activation functions and the meanings of network parameters are rarely controlled in standard DNN training. Though sensible performance can be achieved, the lack of interpretation of network structures and parameters makes better regularisation and adaptation of DNN models challenging. In regularisation, parameters have to be regularised universally and indiscriminately. For instance, the widely used L2 regularisation encourages all parameters to be zero. In adaptation, a large number of independent parameters must be re-estimated. Adaptation schemes in this framework cannot be performed effectively when adaptation data are limited. This thesis investigates structured deep neural networks. Special structures are explicitly designed and imposed with desired interpretations to improve DNN regularisation and adaptation. For regularisation, parameters can be separately regularised based on their functions. For adaptation, parameters can be adapted in groups or partially adapted according to their roles in the network topology. Three forms of structured DNNs are proposed in this thesis. The contributions of these models are presented as follows. The first contribution of this thesis is the multi-basis adaptive neural network. This form of structured DNN introduces a set of parallel sub-networks with restricted connections. The design of restricted connectivity allows different aspects of data to be explicitly learned.
Sub-network outputs are then combined, and this combination module is used as the speaker-dependent structure that can be robustly estimated for adaptation. The second contribution of this thesis is the stimulated deep neural network. This form of structured DNN relates and smooths activation functions in regions of the network. It aids the visualisation and interpretation of DNN models and also has the potential to reduce over-fitting. Novel adaptation schemes can be performed on it, taking advantage of the smoothness property that the stimulated DNN offers. The third contribution of this thesis is the deep activation mixture model. This form of structured DNN also encourages the outputs of activation functions to form a smooth surface. The output of one hidden layer is explicitly modelled as the sum of a mixture model and a residual model. The mixture model forms an activation contour, and the residual model depicts fluctuations around this contour. The smoothness yielded by a mixture model helps to regularise the overall model and allows novel adaptation schemes.

APA, Harvard, Vancouver, ISO, and other styles

36

Zhang, Jeffrey. "Enhancing adversarial robustness of deep neural networks." Thesis, Massachusetts Institute of Technology, 2019. https://hdl.handle.net/1721.1/122994.

Full text

Abstract:

This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.
Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2019
Cataloged from student-submitted PDF version of thesis.
Includes bibliographical references (pages 57-58).
Logit-based regularization and pretrain-then-tune are two approaches that have recently been shown to enhance the adversarial robustness of machine learning models. In the realm of regularization, Zhang et al. (2019) proposed TRADES, a logit-based regularization objective that has been shown to improve upon the robust optimization framework developed by Madry et al. (2018) [14, 9], achieving state-of-the-art adversarial accuracy on CIFAR10. In the realm of pretrain-then-tune models, Hendrycks et al. (2019) demonstrated that adversarially pretraining a model on ImageNet and then adversarially tuning it on CIFAR10 greatly improves adversarial robustness. In this work, we propose Adversarial Regularization, another logit-based regularization framework that surpasses TRADES in adversarial generalization. Furthermore, we explore the impact of different types of adversarial training on the pretrain-then-tune paradigm.
by Jeffry Zhang.

APA, Harvard, Vancouver, ISO, and other styles

37

Miglani, Vivek N. "Comparing learned representations of deep neural networks." Thesis, Massachusetts Institute of Technology, 2019. https://hdl.handle.net/1721.1/123048.

Full text

Abstract:

This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.
Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2019
Cataloged from student-submitted PDF version of thesis.
Includes bibliographical references (pages 63-64).
In recent years, a variety of deep neural network architectures have obtained substantial accuracy improvements in tasks such as image classification, speech recognition, and machine translation, yet little is known about how different neural networks learn. To further understand this, we interpret the function of a deep neural network used for classification as converting inputs to a hidden representation in a high dimensional space and applying a linear classifier in this space. This work focuses on comparing these representations as well as the learned input features for different state-of-the-art convolutional neural network architectures. By focusing on the geometry of this representation, we find that different network architectures trained on the same task have hidden representations which are related by linear transformations. We find that retraining the same network architecture with a different initialization does not necessarily lead to more similar representation geometry for most architectures, but the ResNeXt architecture consistently learns similar features and hidden representation geometry. We also study connections to adversarial examples and observe that networks with more similar hidden representation geometries also exhibit higher rates of adversarial example transferability.
by Vivek N. Miglani.

APA, Harvard, Vancouver, ISO, and other styles

38

Bayer, Ali Orkan. "Semantic Language models with deep neural Networks." Doctoral thesis, Università degli studi di Trento, 2015. https://hdl.handle.net/11572/367784.

Full text

Abstract:

Spoken language systems (SLS) communicate with users in natural language through speech. There are two main problems related to processing the spoken input in SLS. The first is automatic speech recognition (ASR), which recognizes what the user says. The second is spoken language understanding (SLU), which understands what the user means. We focus on the language model (LM) component of SLS. LMs constrain the search space that is used in the search for the best hypothesis. Therefore, they play a crucial role in the performance of SLS. It has long been discussed that an improvement in the recognition performance does not necessarily yield a better understanding performance. Therefore, optimization of LMs for the understanding performance is crucial. In addition, long-range dependencies in languages are hard to handle with statistical language models. These two problems are addressed in this thesis. We investigate two different LM structures. The first LM structure that we investigate enables SLS to understand better what they recognize by searching the ASR hypotheses for the best understanding performance. We refer to these models as joint LMs. They use lexical and semantic units jointly in the LM. The second LM structure uses the semantic context of an utterance, which can also be described as “what the system understands”, to search for a better hypothesis that improves the recognition and the understanding performance. We refer to these models as semantic LMs (SELMs). SELMs use features that are based on a well-established theory of lexical semantics, namely the theory of frame semantics. They incorporate the semantic features which are extracted from the ASR hypothesis into the LM and handle long-range dependencies by using the semantic relationships between words and semantic context. ASR noise is propagated to the semantic features; to suppress this noise, we introduce the use of deep semantic encodings for semantic feature extraction. In this way, SELMs optimize both the recognition and the understanding performance.

APA, Harvard, Vancouver, ISO, and other styles

39

Bayer, Ali Orkan. "Semantic Language models with deep neural Networks." Doctoral thesis, University of Trento, 2015. http://eprints-phd.biblio.unitn.it/1578/1/bayer_thesis.pdf.

Full text

Abstract:

Spoken language systems (SLS) communicate with users in natural language through speech. There are two main problems related to processing the spoken input in SLS. The first is automatic speech recognition (ASR), which recognizes what the user says. The second is spoken language understanding (SLU), which understands what the user means. We focus on the language model (LM) component of SLS. LMs constrain the search space that is used in the search for the best hypothesis. Therefore, they play a crucial role in the performance of SLS. It has long been discussed that an improvement in the recognition performance does not necessarily yield a better understanding performance. Therefore, optimization of LMs for the understanding performance is crucial. In addition, long-range dependencies in languages are hard to handle with statistical language models. These two problems are addressed in this thesis. We investigate two different LM structures. The first LM structure that we investigate enables SLS to understand better what they recognize by searching the ASR hypotheses for the best understanding performance. We refer to these models as joint LMs. They use lexical and semantic units jointly in the LM. The second LM structure uses the semantic context of an utterance, which can also be described as “what the system understands”, to search for a better hypothesis that improves the recognition and the understanding performance. We refer to these models as semantic LMs (SELMs). SELMs use features that are based on a well-established theory of lexical semantics, namely the theory of frame semantics. They incorporate the semantic features which are extracted from the ASR hypothesis into the LM and handle long-range dependencies by using the semantic relationships between words and semantic context. ASR noise is propagated to the semantic features; to suppress this noise, we introduce the use of deep semantic encodings for semantic feature extraction. In this way, SELMs optimize both the recognition and the understanding performance.

APA, Harvard, Vancouver, ISO, and other styles

40

Elezi, Ismail. "Exploiting contextual information with deep neural networks." Doctoral thesis, Università Ca' Foscari Venezia, 2020. http://hdl.handle.net/10579/18453.

Full text

Abstract:

Context matters! Nevertheless, there has not been much research on exploiting contextual information in deep neural networks. For the most part, the usage of contextual information has been limited to recurrent neural networks. Attention models and capsule networks are two recent ways of introducing contextual information in non-recurrent models; however, both were developed after this work had started. In this thesis, we show that contextual information can be exploited in two fundamentally different ways: implicitly and explicitly. In the DeepScores project, where the usage of context is very important for the recognition of many tiny objects, we show that by carefully crafting convolutional architectures, we can achieve state-of-the-art results while also correctly distinguishing between objects which are virtually identical but have different meanings based on their surroundings. In parallel, we show that by implicitly designing algorithms (motivated by graph and game theory) which take into consideration the entire structure of the dataset, we can achieve state-of-the-art results in different topics such as semi-supervised learning and similarity learning. To the best of our knowledge, we are the first to integrate graph-theoretical modules that are carefully crafted for the problem of similarity learning and designed to consider contextual information, not only outperforming the other models but also gaining a speed improvement while using a smaller number of parameters.

APA, Harvard, Vancouver, ISO, and other styles

41

Ragonesi, Ruggero. "Addressing Dataset Bias in Deep Neural Networks." Doctoral thesis, Università degli studi di Genova, 2022. http://hdl.handle.net/11567/1069001.

Full text

Abstract:

Deep Learning has achieved tremendous success in recent years in several areas, such as image classification, text translation, and autonomous agents, to name a few. Deep Neural Networks are able to learn non-linear features in a data-driven fashion from complex, large-scale datasets to solve tasks. However, some fundamental issues remain to be fixed: the kind of data that is provided to the neural network directly influences its capability to generalize. This is especially true when training and test data come from different distributions (the so-called domain gap or domain shift problem): in this case, the neural network may learn a data representation that is representative of the training data but not of the test data, thus performing poorly when deployed in actual scenarios. The domain gap problem is addressed by so-called Domain Adaptation, on which a large literature has recently developed. In this thesis, we first present a novel method to perform Unsupervised Domain Adaptation. Starting from the typical scenario in which we have labeled source distributions and an unlabeled target distribution, we pursue a pseudo-labeling approach to assign labels to the target data and then iteratively refine them using Generative Adversarial Networks. Subsequently, we address the debiasing problem. Simply speaking, bias occurs when there are factors in the data which are spuriously correlated with the task label, e.g., the background, which might be a strong clue to guess what class is depicted in an image. When this happens, neural networks may erroneously learn such spurious correlations as predictive factors, and may therefore fail when deployed in different scenarios. Learning a debiased model can be done using supervision regarding the type of bias affecting the data, or without any annotation of what the spurious correlations are.
We tackle the problem of supervised debiasing (where a ground-truth annotation for the bias is given) through the lens of information theory. We design a neural network architecture that learns to solve the task while at the same time achieving statistical independence of the data embedding with respect to the bias label. We finally address the unsupervised debiasing problem, in which no bias annotation is available. We address this challenging problem with a two-stage approach: first, we coarsely split the training dataset into two subsets, samples that exhibit spurious correlations and those that do not. Second, we learn a feature representation that can accommodate both subsets and an augmented version of them.

APA, Harvard, Vancouver, ISO, and other styles

42

Zheng, Xuebin. "Wavelet-based Graph Neural Networks." Thesis, The University of Sydney, 2022. https://hdl.handle.net/2123/27989.

Full text

Abstract:

This thesis focuses on spectral-based graph neural networks (GNNs). In Chapter 2, we use multiresolution Haar-like wavelets to design a GNN framework equipped with graph convolution and pooling strategies. The resulting model is called MathNet, whose wavelet transform matrix is constructed with a coarse-grained chain. Our proposed MathNet therefore not only enjoys the multiresolution analysis of the Haar-like wavelets but also leverages the clustering information of the graph data. Furthermore, in Chapter 3, we develop a novel multiscale representation system for graph data, called decimated framelets, which form a localized tight frame on the graph. Based on this, we establish decimated G-framelet transforms for the decomposition and reconstruction of the graph data at multiple resolutions via a constructive data-driven filter bank. The graph framelets are built on a chain-based orthonormal basis that supports fast graph Fourier transforms. From this, we give a fast algorithm for the decimated G-framelet transforms, or FGT, that has linear computational complexity O(N) for a graph of size N. Finally, in Chapter 4, we present a new approach for assembling graph neural networks based on the undecimated framelet transforms, which provide a multiscale representation for graph-structured data. With the framelet system, we can decompose the graph feature into low-pass and high-pass frequencies as extracted features for network training, which then defines an undecimated-framelet-based graph convolution, UFGConv. The framelet decomposition naturally induces a graph pooling strategy, UFGPool, by aggregating the graph feature into low-pass and high-pass spectra, which considers both the feature values and the geometry of the graph data and conserves the total information. Moreover, we propose shrinkage as a new activation for UFGConv, which thresholds the high-frequency information at different scales.
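The shrinkage activation mentioned in this abstract thresholds high-frequency coefficients. The following is only a minimal sketch of the idea, assuming the standard soft-thresholding operator; the function name `soft_shrink` and the threshold `lam` are hypothetical, and the thesis may define its shrinkage differently.

```python
import numpy as np

def soft_shrink(x, lam):
    """Soft-thresholding (assumed form): move each entry toward zero by lam,
    and set entries with magnitude below lam to zero."""
    x = np.asarray(x, dtype=float)
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

# Hypothetical high-pass coefficients; small ones are suppressed entirely.
coeffs = np.array([-2.0, -0.3, 0.0, 0.4, 1.5])
shrunk = soft_shrink(coeffs, lam=0.5)
print(shrunk)
```

Large coefficients keep their sign and lose `lam` in magnitude, while coefficients smaller than `lam` are set to zero, which is what makes the operator act as a denoiser on the high-frequency part.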

APA, Harvard, Vancouver, ISO, and other styles

43

Bonfiglioli, Luca. "Identificazione efficiente di reti neurali sparse basata sulla Lottery Ticket Hypothesis." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2020.

Find full text

Abstract:

Given a randomly initialized dense network, Frankle and Carbin (2018) show that there exist sparse subnetworks of it that can reach higher accuracy than the dense network and require fewer training iterations to reach early stopping. These subnetworks are called winning tickets. Identifying them, however, requires at least one complete training run of the dense model, which limits their practical use to that of a compression technique. This thesis seeks a more efficient variant of the magnitude-based pruning methods proposed in the literature, evaluating several heuristic and data-driven methods for obtaining winning tickets without completing the training of the dense network. In comparison with the results of Zhou et al. (2019), we show that the accuracy of a winning ticket at initialization is not predictive of the final accuracy reached after training and that, consequently, optimizing accuracy at initialization does not guarantee equally high accuracy after retraining. We also show the existence of good tickets, that is, a whole spectrum of sparse networks whose performance is comparable, along at least one dimension, with that of winning tickets, and that subnetworks in this category can be identified after only a few training iterations of the initial dense network. These sparse networks are identified in a manner similar to that proposed by You et al. (2020), by predicting the winning ticket before the dense network has finished training. We show that using heuristics other than magnitude-based pruning for these predictions yields, at marginally higher computational cost, significantly better predictions on the architectures examined.
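As a rough illustration of the magnitude-based pruning baseline this abstract refers to, the sketch below builds a pruning mask from trained weights and applies it to the initial weights, in the spirit of the lottery ticket procedure. It is a sketch under stated assumptions, not the thesis's method: the function name `magnitude_prune_mask`, the 75% sparsity level, and the random arrays standing in for initial and trained weights are all hypothetical.

```python
import numpy as np

def magnitude_prune_mask(weights, sparsity):
    """Return a 0/1 mask that removes the `sparsity` fraction of
    smallest-magnitude entries of `weights`."""
    flat = np.abs(weights).ravel()
    k = int(round(sparsity * flat.size))        # number of weights to remove
    mask = np.ones(flat.size, dtype=float)
    if k > 0:
        drop = np.argsort(flat, kind="stable")[:k]  # indices of the k smallest magnitudes
        mask[drop] = 0.0
    return mask.reshape(weights.shape)

rng = np.random.default_rng(0)
w_init = rng.normal(size=(4, 4))                          # stand-in for the random initialization
w_trained = w_init + rng.normal(scale=0.5, size=(4, 4))   # stand-in for weights after training

mask = magnitude_prune_mask(w_trained, sparsity=0.75)
ticket = mask * w_init   # surviving weights are reset to their initial values
```

The mask is computed from the trained weights but applied to the initial ones, which is why the baseline needs a full training run of the dense network before the sparse subnetwork can be retrained.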

APA, Harvard, Vancouver, ISO, and other styles

44

Pons, Puig Jordi. "Deep neural networks for music and audio tagging." Doctoral thesis, Universitat Pompeu Fabra, 2019. http://hdl.handle.net/10803/668036.

Full text

Abstract:

Automatic music and audio tagging can help increase the retrieval and re-use possibilities of many audio databases that remain poorly labeled. In this dissertation, we tackle the task of music and audio tagging from the deep learning perspective and, within that context, we address the following research questions: (i) Which deep learning architectures are most appropriate for (music) audio signals? (ii) In which scenarios is waveform-based end-to-end learning feasible? (iii) How much data is required for carrying out competitive deep learning research? In pursuit of answering research question (i), we propose to use musically motivated convolutional neural networks as an alternative to designing deep learning models that is based on domain knowledge, and we evaluate several deep learning architectures for audio at a low computational cost with a novel methodology based on non-trained (randomly weighted) convolutional neural networks. Throughout our work, we find that employing music and audio domain knowledge during the model’s design can help improve the efficiency, interpretability, and performance of spectrogram-based deep learning models. For research questions (ii) and (iii), we perform a study with the SampleCNN, a recently proposed end-to-end learning model, to assess its viability for music audio tagging when variable amounts of training data —ranging from 25k to 1.2M songs— are available. We compare the SampleCNN against a spectrogram-based architecture that is musically motivated and conclude that, given enough data, end-to-end learning models can achieve better results. Finally, throughout our quest for answering research question (iii), we also investigate whether a naive regularization of the solution space, prototypical networks, transfer learning, or their combination, can foster deep learning models to better leverage a small number of training examples. 
Results indicate that transfer learning and prototypical networks are powerful strategies in such low-data regimes.
Automatic audio and music tagging can increase the re-use possibilities of many audio databases that remain practically unlabeled. In this thesis, we approach the task of automatic audio and music tagging from the deep learning perspective and, in this context, address the following research questions: (i) Which deep learning architectures are most suitable for (music) audio signals? (ii) In which scenarios is it feasible for deep learning models to process waveforms directly? (iii) How much data is needed to carry out deep learning research? To answer the first question (i), we propose using musically motivated convolutional neural networks and evaluate several deep learning architectures for audio at a low computational cost. Throughout our research, we find that prior knowledge about music and audio can help improve the efficiency, interpretability, and performance of spectrogram-based learning models. For questions (ii) and (iii), we study how the SampleCNN, a deep learning model that processes waveforms, performs when variable amounts of training data are available, from 25k songs to 1.2M songs. In this study, we compare the SampleCNN with a musically motivated spectrogram-based architecture. Our experimental results indicate that, in scenarios where enough data is available, deep learning models that process waveforms (such as the SampleCNN) can achieve better results than those that process spectrograms.
Finally, to try to answer question (iii), we also investigate whether a severe regularization of the solution space, prototypical networks, transfer learning, or their combination can allow deep learning models to obtain better results in scenarios where training data is scarce. The results of our experiments indicate that transfer learning and prototypical networks are useful strategies when training data is not abundant.
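Prototypical networks, named in this abstract as a strategy for low-data regimes, classify a query by its distance to per-class mean embeddings. The sketch below shows only that classification step, assuming embeddings have already been computed and that squared Euclidean distance is used; the function names and the toy two-dimensional vectors are hypothetical and are not taken from the thesis.

```python
import numpy as np

def class_prototypes(embeddings, labels):
    """Compute one prototype per class as the mean of its support embeddings."""
    classes = np.unique(labels)
    protos = np.stack([embeddings[labels == c].mean(axis=0) for c in classes])
    return classes, protos

def predict(queries, classes, protos):
    """Assign each query to the class of its nearest prototype."""
    # Squared Euclidean distance between every query and every prototype.
    dists = ((queries[:, None, :] - protos[None, :, :]) ** 2).sum(axis=-1)
    return classes[dists.argmin(axis=1)]

# Toy support set: two examples per class (hypothetical embeddings).
support = np.array([[0.0, 0.0], [0.2, 0.0], [1.0, 1.0], [1.0, 0.8]])
labels = np.array([0, 0, 1, 1])
queries = np.array([[0.1, 0.1], [0.9, 1.0]])

classes, protos = class_prototypes(support, labels)
pred = predict(queries, classes, protos)
```

Because each class is summarised by a single mean vector, only a handful of labeled examples per class is needed at inference time, which is the property that makes the approach attractive when training data is scarce.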

APA, Harvard, Vancouver, ISO, and other styles

45

Purmonen, Sami. "Predicting Game Level Difficulty Using Deep Neural Networks." Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-217140.

Full text

Abstract:

We explored the usage of Monte Carlo tree search (MCTS) and deep learning in order to predict game level difficulty in Candy Crush Saga (Candy) measured as number of attempts per success. A deep neural network (DNN) was trained to predict moves from game states from large amounts of game play data. The DNN played a diverse set of levels in Candy and a regression model was fitted to predict human difficulty from bot difficulty. We compared our results to an MCTS bot. Our results show that the DNN can make estimations of game level difficulty comparable to MCTS in substantially shorter time.
We explored the use of Monte Carlo tree search (MCTS) and deep learning to estimate the difficulty of levels in Candy Crush Saga (Candy). A deep neural network (DNN) was trained to predict moves from game states using large amounts of game data. The DNN played a varied set of levels in Candy, and a model was built to predict human difficulty from the DNN's difficulty. The result was compared with MCTS. Our results indicate that the DNN can make estimates comparable to MCTS but in substantially shorter time.

APA, Harvard, Vancouver, ISO, and other styles

46

Winsnes, Casper. "Automatic Subcellular Protein Localization Using Deep Neural Networks." Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-189991.

Full text

Abstract:

Protein localization is an important part of understanding the functionality of a protein. The current method of localizing proteins is to manually annotate microscopy images. This thesis investigates the feasibility of using deep artificial neural networks to automatically classify subcellular protein locations based on immunofluorescent images. We investigate the applicability in both single-label and multi-label classification, as well as cross-cell-line classification. We show that deep single-label neural networks can be used for protein localization with up to 73% accuracy. We also show the potential of deep multi-label neural networks for protein localization and cross-cell-line classification but conclude that more research is needed before we can say for certain that the method is applicable.

APA, Harvard, Vancouver, ISO, and other styles

47

Pitkänen, P. (Perttu). "Automatic image quality enhancement using deep neural networks." Master's thesis, University of Oulu, 2019. http://jultika.oulu.fi/Record/nbnfioulu-201904101454.

Full text

Abstract:

Photo retouching can significantly improve image quality and is considered an essential part of photography. Traditionally, this task has been completed manually with special image enhancement software. However, recent research has shown that neural networks perform better in the automated image enhancement task than traditional methods. During the literature review of this thesis, multiple automatic neural-network-based image enhancement methods were studied, and one of these methods was chosen for closer examination and evaluation. The chosen network design has several appealing qualities, such as the ability to learn both local and global enhancements and a simple architecture constructed for efficient computational speed. This research proposes a novel dataset generation method for automated image enhancement research and tests its usefulness with the chosen network design. This dataset generation method simulates commonly occurring photographic errors, and the original high-quality images can be used as the target data. This dataset design allows studying fixes for individual and combined aberrations. The underlying idea of this design choice is that the network would learn to fix these aberrations while producing aesthetically pleasing and consistent results. The quantitative evaluation showed that the network can learn to counter these errors and, with greater effort, could also learn to enhance all of these aspects simultaneously. Additionally, the network's capability of learning local and portrait-specific enhancement tasks was evaluated. The models can apply the effect successfully, but the results did not reach the same level of accuracy as with global enhancement tasks. According to the completed qualitative survey, the proposed general enhancement model can successfully enhance image quality, and it can perform better than some of the state-of-the-art image enhancement methods.
Manual photo editing can improve image quality considerably, and it is regarded as an essential part of the photography process. Traditionally, special manually operated image editing programs have been used for this task. However, recent research has shown neural networks to outperform traditional methods in automatic image enhancement applications. The literature review of this thesis studied several neural-network-based image enhancement methods, and one of them was chosen for closer examination and evaluation. The chosen network model has several appealing properties, such as learning both local and global image enhancements and a simplified architecture built for efficient execution speed. This research presents a new training data generation method for automatic image enhancement methods and tests its suitability using the chosen neural network structure. The training data generation method simulates commonly occurring photographic errors, and the original high-quality images can be used as the training target data. This generation approach made it possible to study the correction of individual photographic errors as well as their combination. The purpose of the method was to teach the network to correct different errors and to produce aesthetically pleasing and consistent results. The qualitative evaluation showed that the network is able to learn separate corrections for these errors. The network can also learn a model that corrects all of the predefined errors simultaneously, but with lower accuracy. In addition, the network's ability to learn local, portrait-specific image enhancements was evaluated. The trained models can also carry out local image enhancement successfully, but they did not reach the level of the global enhancements. According to the survey conducted, the presented general image enhancement model can successfully improve image quality and produces better results than some of the image enhancement techniques compared.

APA, Harvard, Vancouver, ISO, and other styles

48

Wu, Jimmy. "Robotic object pose estimation with deep neural networks." Thesis, Massachusetts Institute of Technology, 2018. http://hdl.handle.net/1721.1/119699.

Full text

Abstract:

Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2018.
This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.
Cataloged from student-submitted PDF version of thesis.
Includes bibliographical references (pages 39-45).
In this work, we introduce pose interpreter networks for 6-DoF object pose estimation. In contrast to other CNN-based approaches to pose estimation that require expensively-annotated object pose data, our pose interpreter network is trained entirely on synthetic data. We use object masks as an intermediate representation to bridge real and synthetic. We show that when combined with a segmentation model trained on RGB images, our synthetically-trained pose interpreter network is able to generalize to real data. Our end-to-end system for object pose estimation runs in real-time (20 Hz) on live RGB data, without using depth information or ICP refinement.
by Jimmy Wu.

APA, Harvard, Vancouver, ISO, and other styles

49

Paula, Thomas da Silva. "Contributions in face detection with deep neural networks." Pontifícia Universidade Católica do Rio Grande do Sul, 2017. http://tede2.pucrs.br/tede2/handle/tede/7563.

Full text

Abstract:

Made available in DSpace on 2017-07-04. Previous issue date: 2017-03-28.
Face Detection is one of the most studied subjects in the Computer Vision field. Given an arbitrary image or video frame, the goal of face detection is to determine whether there are any faces in the image and, if present, return the image location and the extent of each face. Such detection is easily done by humans, but it is still a challenge within Computer Vision. The high degree of variability and the dynamicity of the human face make it very difficult to detect, mainly in complex environments. Recently, Deep Learning approaches have started to be applied to Computer Vision tasks with great results. They have opened new research possibilities in different applications, including Face Detection. Even though Deep Learning has been successfully applied to this task, most state-of-the-art implementations make use of off-the-shelf face detectors and do not evaluate the differences among them. In other cases, the face detectors are trained in a multitask manner that includes face landmark detection, age detection, and so on. Hence, our goal is threefold. First, we summarize and explain many advances in deep learning, detailing how each architecture and implementation works. Second, we focus on the face detection problem itself, performing a rigorous analysis of some of the existing face detectors as well as implementations of our own. We experiment with and evaluate variations of hyper-parameters for each of the detectors and their impact on different datasets. We explore both traditional and more recent approaches, as well as implementing our own face detectors. Finally, we implement, test, and compare a meta-learning approach for face detection, which aims to learn the best face detector for a given image. Our experiments contribute to understanding the role of deep learning in face detection as well as the subtleties of changing hyper-parameters of the face detectors and their impact on face detection. We also show how well features obtained with deep neural networks trained on a general-purpose dataset perform in a meta-learning approach for face detection. Our experiments and conclusions show that deep learning indeed plays a notable role in face detection.


50

D'Amicantonio, Giacomo. "Improvements to knowledge distillation of deep neural networks." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2021. http://amslaurea.unibo.it/24178/.


Abstract:

One of the main problems in the field of Artificial Intelligence is the efficiency of neural network models. In the past few years, it seemed that most tasks involving such models could simply be solved by designing larger, deeper models and training them on larger datasets for longer. This approach requires better-performing, and therefore more expensive and energy-consuming, hardware and will have an increasingly significant environmental impact when those models are deployed at scale. In 2015, G. Hinton, J. Dean and O. Vinyals presented Knowledge Distillation (KD), a technique that leverages the logits produced by a big, cumbersome model to guide the training of a smaller model. The two networks were called “Teacher” and “Student”, given the analogy between the big model with large knowledge and the small model which has yet to learn everything. They showed that it is possible to extract useful knowledge from the teacher logits and use it to obtain a better-performing student than the same model trained on its own. This thesis provides an overview of the current state of the art in the field of Knowledge Distillation, analyses some of the most interesting approaches, and builds on them to exploit very confident logits more effectively. Furthermore, it provides experimental evidence of the importance of also using smaller logit entries and of correcting mistaken predictions from the teacher in the distillation process.
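
For readers unfamiliar with the teacher/student setup summarised in this abstract, the following is a minimal sketch of the classic 2015 distillation loss, not the improved method proposed in the thesis. The function names, the temperature of 2.0 and the mixing weight `alpha` are illustrative assumptions that do not come from the abstract.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; a higher temperature gives a softer distribution."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=2.0, alpha=0.5):
    """Weighted sum of a soft-target term and a hard-label term.

    temperature and alpha are hypothetical defaults chosen for illustration.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # KL(teacher || student) on the softened distributions
    kl = sum(pt * math.log(pt / ps)
             for pt, ps in zip(p_teacher, p_student) if pt > 0)
    # Ordinary cross-entropy of the student against the true class index
    hard = -math.log(softmax(student_logits)[label])
    # The T^2 factor keeps the soft-term gradients on a comparable scale
    return alpha * temperature ** 2 * kl + (1 - alpha) * hard
```

With `alpha=1.0` the loss depends only on how closely the student's softened distribution matches the teacher's, so it vanishes when both produce identical logits. The smaller logit entries mentioned in the abstract enter through the softened teacher distribution: raising the temperature increases their weight in the KL term.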

