Dissertations / Theses on the topic 'Sparse deep neural networks'
Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles
Consult the top 50 dissertations / theses for your research on the topic 'Sparse deep neural networks.'
Next to every source in the list of references, there is an 'Add to bibliography' button. Press on it, and we will generate automatically the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.
You can also download the full text of the academic publication as pdf and read online its abstract whenever available in the metadata.
Browse dissertations / theses on a wide variety of disciplines and organise your bibliography correctly.
Tavanaei, Amirhossein. "Spiking Neural Networks and Sparse Deep Learning." Thesis, University of Louisiana at Lafayette, 2019. http://pqdtopen.proquest.com/#viewpdf?dispub=10807940.
Full textThis document proposes new methods for training multi-layer and deep spiking neural networks (SNNs), specifically, spiking convolutional neural networks (CNNs). Training a multi-layer spiking network poses difficulties because the output spikes do not have derivatives and the commonly used backpropagation method for non-spiking networks is not easily applied. Our methods use novel versions of the brain-like, local learning rule named spike-timing-dependent plasticity (STDP) that incorporates supervised and unsupervised components. Our method starts with conventional learning methods and converts them to spatio-temporally local rules suited for SNNs.
The training uses two components for unsupervised feature extraction and supervised classification. The first component refers to new STDP rules for spike-based representation learning that trains convolutional filters and initial representations. The second introduces new STDP-based supervised learning rules for spike pattern classification via an approximation to gradient descent by combining the STDP and anti-STDP rules. Specifically, the STDP-based supervised learning model approximates gradient descent by using temporally local STDP rules. Stacking these components implements a novel sparse, spiking deep learning model. Our spiking deep learning model is categorized as a variation of spiking CNNs of integrate-and-fire (IF) neurons with performance comparable with the state-of-the-art deep SNNs. The experimental results show the success of the proposed model for image classification. Our network architecture is the only spiking CNN which provides bio-inspired STDP rules in a hierarchy of feature extraction and classification in an entirely spike-based framework.
Le, Quoc Tung. "Algorithmic and theoretical aspects of sparse deep neural networks." Electronic Thesis or Diss., Lyon, École normale supérieure, 2023. http://www.theses.fr/2023ENSL0105.
Full textSparse deep neural networks offer a compelling practical opportunity to reduce the cost of training, inference and storage, which are growing exponentially in the state of the art of deep learning. In this presentation, we will introduce an approach to study sparse deep neural networks through the lens of another related problem: sparse matrix factorization, i.e., the problem of approximating a (dense) matrix by the product of (multiple) sparse factors. In particular, we identify and investigate in detail some theoretical and algorithmic aspects of a variant of sparse matrix factorization named fixed support matrix factorization (FSMF) in which the set of non-zero entries of sparse factors are known. Several fundamental questions of sparse deep neural networks such as the existence of optimal solutions of the training problem or topological properties of its function space can be addressed using the results of (FSMF). In addition, by applying the results of (FSMF), we also study the butterfly parametrization, an approach that consists of replacing (large) weight matrices by the products of extremely sparse and structured ones in sparse deep neural networks
Hoori, Ammar O. "MULTI-COLUMN NEURAL NETWORKS AND SPARSE CODING NOVEL TECHNIQUES IN MACHINE LEARNING." VCU Scholars Compass, 2019. https://scholarscompass.vcu.edu/etd/5743.
Full textVekhande, Swapnil Sudhir. "Deep Learning Neural Network-based Sinogram Interpolation for Sparse-View CT Reconstruction." Thesis, Virginia Tech, 2019. http://hdl.handle.net/10919/90182.
Full textMaster of Science
Computed Tomography is a commonly used imaging technique due to the remarkable ability to visualize internal organs, bones, soft tissues, and blood vessels. It involves exposing the subject to X-ray radiation, which could lead to cancer. On the other hand, the radiation dose is critical for the image quality and subsequent diagnosis. Thus, image reconstruction using only a small number of projection data is an open research problem. Deep learning techniques have already revolutionized various Computer Vision applications. Here, we have used a method which fills missing highly sparse CT data. The results show that the deep learning-based method outperforms standard linear interpolation-based methods while improving the image quality.
Carvalho, Micael. "Deep representation spaces." Electronic Thesis or Diss., Sorbonne université, 2018. http://www.theses.fr/2018SORUS292.
Full textIn recent years, Deep Learning techniques have swept the state-of-the-art of many applications of Machine Learning, becoming the new standard approach for them. The architectures issued from these techniques have been used for transfer learning, which extended the power of deep models to tasks that did not have enough data to fully train them from scratch. This thesis' subject of study is the representation spaces created by deep architectures. First, we study properties inherent to them, with particular interest in dimensionality redundancy and precision of their features. Our findings reveal a strong degree of robustness, pointing the path to simple and powerful compression schemes. Then, we focus on refining these representations. We choose to adopt a cross-modal multi-task problem, and design a loss function capable of taking advantage of data coming from multiple modalities, while also taking into account different tasks associated to the same dataset. In order to correctly balance these losses, we also we develop a new sampling scheme that only takes into account examples contributing to the learning phase, i.e. those having a positive loss. Finally, we test our approach in a large-scale dataset of cooking recipes and associated pictures. Our method achieves a 5-fold improvement over the state-of-the-art, and we show that the multi-task aspect of our approach promotes a semantically meaningful organization of the representation space, allowing it to perform subtasks never seen during training, like ingredient exclusion and selection. The results we present in this thesis open many possibilities, including feature compression for remote applications, robust multi-modal and multi-task learning, and feature space refinement. For the cooking application, in particular, many of our findings are directly applicable in a real-world context, especially for the detection of allergens, finding alternative recipes due to dietary restrictions, and menu planning
Pawlowski, Filip igor. "High-performance dense tensor and sparse matrix kernels for machine learning." Thesis, Lyon, 2020. http://www.theses.fr/2020LYSEN081.
Full textIn this thesis, we develop high performance algorithms for certain computations involving dense tensors and sparse matrices. We address kernel operations that are useful for machine learning tasks, such as inference with deep neural networks (DNNs). We develop data structures and techniques to reduce memory use, to improve data locality and hence to improve cache reuse of the kernel operations. We design both sequential and shared-memory parallel algorithms. In the first part of the thesis we focus on dense tensors kernels. Tensor kernels include the tensor--vector multiplication (TVM), tensor--matrix multiplication (TMM), and tensor--tensor multiplication (TTM). Among these, TVM is the most bandwidth-bound and constitutes a building block for many algorithms. We focus on this operation and develop a data structure and sequential and parallel algorithms for it. We propose a novel data structure which stores the tensor as blocks, which are ordered using the space-filling curve known as the Morton curve (or Z-curve). The key idea consists of dividing the tensor into blocks small enough to fit cache, and storing them according to the Morton order, while keeping a simple, multi-dimensional order on the individual elements within them. Thus, high performance BLAS routines can be used as microkernels for each block. We evaluate our techniques on a set of experiments. The results not only demonstrate superior performance of the proposed approach over the state-of-the-art variants by up to 18%, but also show that the proposed approach induces 71% less sample standard deviation for the TVM across the d possible modes. Finally, we show that our data structure naturally expands to other tensor kernels by demonstrating that it yields up to 38% higher performance for the higher-order power method. Finally, we investigate shared-memory parallel TVM algorithms which use the proposed data structure. Several alternative parallel algorithms were characterized theoretically and implemented using OpenMP to compare them experimentally. Our results on up to 8 socket systems show near peak performance for the proposed algorithm for 2, 3, 4, and 5-dimensional tensors. In the second part of the thesis, we explore the sparse computations in neural networks focusing on the high-performance sparse deep inference problem. The sparse DNN inference is the task of using sparse DNN networks to classify a batch of data elements forming, in our case, a sparse feature matrix. The performance of sparse inference hinges on efficient parallelization of the sparse matrix--sparse matrix multiplication (SpGEMM) repeated for each layer in the inference function. We first characterize efficient sequential SpGEMM algorithms for our use case. We then introduce the model-parallel inference, which uses a two-dimensional partitioning of the weight matrices obtained using the hypergraph partitioning software. The model-parallel variant uses barriers to synchronize at layers. Finally, we introduce tiling model-parallel and tiling hybrid algorithms, which increase cache reuse between the layers, and use a weak synchronization module to hide load imbalance and synchronization costs. We evaluate our techniques on the large network data from the IEEE HPEC 2019 Graph Challenge on shared-memory systems and report up to 2x times speed-up versus the baseline
Thom, Markus [Verfasser]. "Sparse neural networks / Markus Thom." Ulm : Universität Ulm. Fakultät für Ingenieurwissenschaften und Informatik, 2015. http://d-nb.info/1067496319/34.
Full textLiu, Qian. "Deep spiking neural networks." Thesis, University of Manchester, 2018. https://www.research.manchester.ac.uk/portal/en/theses/deep-spiking-neural-networks(336e6a37-2a0b-41ff-9ffb-cca897220d6c).html.
Full textSquadrani, Lorenzo. "Deep neural networks and thermodynamics." Bachelor's thesis, Alma Mater Studiorum - Università di Bologna, 2020.
Find full textMancevo, del Castillo Ayala Diego. "Compressing Deep Convolutional Neural Networks." Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-217316.
Full textAbbasi, Mahdieh. "Toward robust deep neural networks." Doctoral thesis, Université Laval, 2020. http://hdl.handle.net/20.500.11794/67766.
Full textIn this thesis, our goal is to develop robust and reliable yet accurate learning models, particularly Convolutional Neural Networks (CNNs), in the presence of adversarial examples and Out-of-Distribution (OOD) samples. As the first contribution, we propose to predict adversarial instances with high uncertainty through encouraging diversity in an ensemble of CNNs. To this end, we devise an ensemble of diverse specialists along with a simple and computationally efficient voting mechanism to predict the adversarial examples with low confidence while keeping the predictive confidence of the clean samples high. In the presence of high entropy in our ensemble, we prove that the predictive confidence can be upper-bounded, leading to have a globally fixed threshold over the predictive confidence for identifying adversaries. We analytically justify the role of diversity in our ensemble on mitigating the risk of both black-box and white-box adversarial examples. Finally, we empirically assess the robustness of our ensemble to the black-box and the white-box attacks on several benchmark datasets.The second contribution aims to address the detection of OOD samples through an end-to-end model trained on an appropriate OOD set. To this end, we address the following central question: how to differentiate many available OOD sets w.r.t. a given in distribution task to select the most appropriate one, which in turn induces a model with a high detection rate of unseen OOD sets? To answer this question, we hypothesize that the “protection” level of in-distribution sub-manifolds by each OOD set can be a good possible property to differentiate OOD sets. To measure the protection level, we then design three novel, simple, and cost-effective metrics using a pre-trained vanilla CNN. In an extensive series of experiments on image and audio classification tasks, we empirically demonstrate the abilityof an Augmented-CNN (A-CNN) and an explicitly-calibrated CNN for detecting a significantly larger portion of unseen OOD samples, if they are trained on the most protective OOD set. Interestingly, we also observe that the A-CNN trained on the most protective OOD set (calledA-CNN) can also detect the black-box Fast Gradient Sign (FGS) adversarial examples. As the third contribution, we investigate more closely the capacity of the A-CNN on the detection of wider types of black-box adversaries. To increase the capability of A-CNN to detect a larger number of adversaries, we augment its OOD training set with some inter-class interpolated samples. Then, we demonstrate that the A-CNN trained on the most protective OOD set along with the interpolated samples has a consistent detection rate on all types of unseen adversarial examples. Where as training an A-CNN on Projected Gradient Descent (PGD) adversaries does not lead to a stable detection rate on all types of adversaries, particularly the unseen types. We also visually assess the feature space and the decision boundaries in the input space of a vanilla CNN and its augmented counterpart in the presence of adversaries and the clean ones. By a properly trained A-CNN, we aim to take a step toward a unified and reliable end-to-end learning model with small risk rates on both clean samples and the unusual ones, e.g. adversarial and OOD samples.The last contribution is to show a use-case of A-CNN for training a robust object detector on a partially-labeled dataset, particularly a merged dataset. Merging various datasets from similar contexts but with different sets of Object of Interest (OoI) is an inexpensive way to craft a large-scale dataset which covers a larger spectrum of OoIs. Moreover, merging datasets allows achieving a unified object detector, instead of having several separate ones, resultingin the reduction of computational and time costs. However, merging datasets, especially from a similar context, causes many missing-label instances. With the goal of training an integrated robust object detector on a partially-labeled but large-scale dataset, we propose a self-supervised training framework to overcome the issue of missing-label instances in the merged datasets. Our framework is evaluated on a merged dataset with a high missing-label rate. The empirical results confirm the viability of our generated pseudo-labels to enhance the performance of YOLO, as the current (to date) state-of-the-art object detector.
Shao, Yuanlong. "Learning Sparse Recurrent Neural Networks in Language Modeling." The Ohio State University, 2014. http://rave.ohiolink.edu/etdc/view?acc_num=osu1398942373.
Full textLu, Yifei. "Deep neural networks and fraud detection." Thesis, Uppsala universitet, Tillämpad matematik och statistik, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-331833.
Full textKalogiras, Vasileios. "Sentiment Classification with Deep Neural Networks." Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-217858.
Full textSentiment analysis is a subfield of natural language processing (NLP) that attempts to analyze the sentiment of written text.It is is a complex problem that entails different challenges. For this reason, it has been studied extensively. In the past years traditional machine learning algorithms or handcrafted methodologies used to provide state of the art results. However, the recent deep learning renaissance shifted interest towards end to end deep learning models. On the one hand this resulted into more powerful models but on the other hand clear mathematical reasoning or intuition behind distinct models is still lacking. As a result, in this thesis, an attempt to shed some light on recently proposed deep learning architectures for sentiment classification is made.A study of their differences is performed as well as provide empirical results on how changes in the structure or capacity of a model can affect its accuracy and the way it represents and ''comprehends'' sentences.
Choi, Keunwoo. "Deep neural networks for music tagging." Thesis, Queen Mary, University of London, 2018. http://qmro.qmul.ac.uk/xmlui/handle/123456789/46029.
Full textYin, Yonghua. "Random neural networks for deep learning." Thesis, Imperial College London, 2018. http://hdl.handle.net/10044/1/64917.
Full textZagoruyko, Sergey. "Weight parameterizations in deep neural networks." Thesis, Paris Est, 2018. http://www.theses.fr/2018PESC1129/document.
Full textMultilayer neural networks were first proposed more than three decades ago, and various architectures and parameterizations were explored since. Recently, graphics processing units enabled very efficient neural network training, and allowed training much larger networks on larger datasets, dramatically improving performance on various supervised learning tasks. However, the generalization is still far from human level, and it is difficult to understand on what the decisions made are based. To improve on generalization and understanding we revisit the problems of weight parameterizations in deep neural networks. We identify the most important, to our mind, problems in modern architectures: network depth, parameter efficiency, and learning multiple tasks at the same time, and try to address them in this thesis. We start with one of the core problems of computer vision, patch matching, and propose to use convolutional neural networks of various architectures to solve it, instead of manual hand-crafting descriptors. Then, we address the task of object detection, where a network should simultaneously learn to both predict class of the object and the location. In both tasks we find that the number of parameters in the network is the major factor determining it's performance, and explore this phenomena in residual networks. Our findings show that their original motivation, training deeper networks for better representations, does not fully hold, and wider networks with less layers can be as effective as deeper with the same number of parameters. Overall, we present an extensive study on architectures and weight parameterizations, and ways of transferring knowledge between them
Ioannou, Yani Andrew. "Structural priors in deep neural networks." Thesis, University of Cambridge, 2018. https://www.repository.cam.ac.uk/handle/1810/278976.
Full textBillman, Linnar, and Johan Hullberg. "Speech Reading with Deep Neural Networks." Thesis, Uppsala universitet, Institutionen för informationsteknologi, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-360022.
Full textWang, Shenhao. "Deep neural networks for choice analysis." Thesis, Massachusetts Institute of Technology, 2020. https://hdl.handle.net/1721.1/129894.
Full textCataloged from student-submitted PDF of thesis.
Includes bibliographical references (pages 117-128).
As deep neural networks (DNNs) outperform classical discrete choice models (DCMs) in many empirical studies, one pressing question is how to reconcile them in the context of choice analysis. So far researchers mainly compare their prediction accuracy, treating them as completely different modeling methods. However, DNNs and classical choice models are closely related and even complementary. This dissertation seeks to lay out a new foundation of using DNNs for choice analysis. It consists of three essays, which respectively tackle the issues of economic interpretation, architectural design, and robustness of DNNs by using classical utility theories. Essay 1 demonstrates that DNNs can provide economic information as complete as the classical DCMs.
The economic information includes choice predictions, choice probabilities, market shares, substitution patterns of alternatives, social welfare, probability derivatives, elasticities, marginal rates of substitution (MRS), and heterogeneous values of time (VOT). Unlike DCMs, DNNs can automatically learn the utility function and reveal behavioral patterns that are not prespecified by modelers. However, the economic information from DNNs can be unreliable because the automatic learning capacity is associated with three challenges: high sensitivity to hyperparameters, model non-identification, and local irregularity. To demonstrate the strength of DNNs as well as the three issues, I conduct an empirical experiment by applying the DNNs to a stated preference survey and discuss successively the full list of economic information extracted from the DNNs. Essay 2 designs a particular DNN architecture with alternative-specific utility functions (ASU-DNN) by using prior behavioral knowledge.
Theoretically, ASU-DNN reduces the estimation error of fully connected DNN (F-DNN) because of its lighter architecture and sparser connectivity, although the constraint of alternative-specific utility could cause ASU-DNN to exhibit a larger approximation error. Both ASU-DNN and F-DNN can be treated as special cases of DNN architecture design guided by utility connectivity graph (UCG). Empirically, ASU-DNN has 2-3% higher prediction accuracy than F-DNN. The alternative-specific connectivity constraint, as a domain-knowledge- based regularization method, is more effective than other regularization methods. This essay demonstrates that prior behavioral knowledge can be used to guide the architecture design of DNN, to function as an effective domain-knowledge-based regularization method, and to improve both the interpretability and predictive power of DNNs in choice analysis.
Essay 3 designs a theory-based residual neural network (TB-ResNet) with a two-stage training procedure, which synthesizes decision-making theories and DNNs in a linear manner. Three instances of TB-ResNets based on choice modeling (CM-ResNets), prospect theory (PT-ResNets), and hyperbolic discounting (HD-ResNets) are designed. Empirically, compared to the decision-making theories, the three instances of TB-ResNets predict significantly better in the out-of-sample test and become more interpretable owing to the rich utility function augmented by DNNs. Compared to the DNNs, the TB-ResNets predict better because the decision-making theories aid in localizing and regularizing the DNN models. TB-ResNets also become more robust than DNNs because the decision-making theories stablize the local utility function and the input gradients.
This essay demonstrates that it is both feasible and desirable to combine the handcrafted utility theory and automatic utility specification, with joint improvement in prediction, interpretation, and robustness.
by Shenhao Wang.
Ph. D. in Computer and Urban Science
Ph.D.inComputerandUrbanScience Massachusetts Institute of Technology, Department of Urban Studies and Planning
Sunnegårdh, Christina. "Scar detection using deep neural networks." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-299576.
Full textObjektdetektion är en metod inom datorseende som inkluderar både lokalisering och klassificering av objekt i bilder. Antalet användningsområden för metoden växer ständigt och denna studie undersöker det outforskade området av att använda djupa neurala nätverk för detektering av ärr. Studien utforskar även att använda detektering av ärr som grund för den binära klassificeringsuppgiften att bestämma om bilder innehåller ett synligt ärr eller inte. Två förtränade objektdetekteringsmodeller, Faster R-CNN och RetinaNet, tränades med olika hyperparametrar på 1830 manuellt märkta bilder. Faster RCNN Inception ResNet V2 uppnådde bäst resultat med avseende på average precision (AP), tätt följd av Faster R-CNN ResNet50 och slutligen RetinaNet. Resultatet indikerar både överlägsenhet av Faster R-CNN gentemot RetinaNet, såväl som att använda Inception ResNet V2 för särdragsextrahering. Detta beror med stor sannolikhet på dess användning av faltningsfilter i flera storlekar på samma nivåer i nätverket. Gällande detekteringstid per bild var RetinaNet snabbast, följd av Faster R-CNN ResNet50 och slutligen Faster R-CNN Inception ResNet V2. För den binära klassificeringsuppgiften testades modellerna på 200 bilder, där hälften av bilderna innehöll tydligt synliga ärr. Faster RCNN ResNet50 uppnådde högst träffsäkerhet, följt av Faster R-CNN Inception ResNet V2 och till sist RetinaNet. Medan träffsäkerheten för RetinaNet huvudsakligen bestraffades på grund av att ha förbisett ärr i bilder, så detekterade Faster R-CNN Inception ResNet V2 ett flertal faktiska ärr som inte datamärkts på grund av bristande bildkvalitet. Detta kan dock vara en fråga om subjektiv datamärkning och att modellen bestraffas för något som andra gånger skulle kunna anses korrekt. Sammanfattningsvis visar denna studie lovande resultat av att använda objektdetektion för att detektera ärr i bilder. Medan tvåstegsmodellen Faster R-CNN har övertaget sett till AP, har enstegsmodellen RetinaNet övertaget sett till detekteringstid. Förslag för framtida arbete inkluderar att lägga större vikt vid märkning av data för att eliminera potentiell subjektivitet, samt inkludera träningsdata innehållande objekt som modellerna misstog för ärr. Exempel på detta är öppna sår, knogar och bakgrundsobjekt som visuellt liknar ärr.
Landeen, Trevor J. "Association Learning Via Deep Neural Networks." DigitalCommons@USU, 2018. https://digitalcommons.usu.edu/etd/7028.
Full textSrivastava, Sanjana. "On foveation of deep neural networks." Thesis, Massachusetts Institute of Technology, 2019. https://hdl.handle.net/1721.1/123134.
Full textThesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2019
Cataloged from student-submitted PDF version of thesis.
Includes bibliographical references (pages 61-63).
The human ability to recognize objects is impaired when the object is not shown in full. "Minimal images" are the smallest regions of an image that remain recognizable for humans. [26] show that a slight modification of the location and size of the visible region of the minimal image produces a sharp drop in human recognition accuracy. In this paper, we demonstrate that such drops in accuracy due to changes of the visible region are a common phenomenon between humans and existing state-of- the-art convolutional neural networks (CNNs), and are much more prominent in CNNs. We found many cases where CNNs classified one region correctly and the other incorrectly, though they only differed by one row or column of pixels, and were often bigger than the average human minimal image size. We show that this phenomenon is independent from previous works that have reported lack of invariance to minor modifications in object location in CNNs. Our results thus reveal a new failure mode of CNNs that also affects humans to a lesser degree. They expose how fragile CNN recognition ability is for natural images even without synthetic adversarial patterns being introduced. This opens potential for CNN robustness in natural images to be brought to the human level by taking inspiration from human robustness methods. One of these is eccentricity dependence, a model of human focus in which attention to the visual input degrades proportional to distance from the focal point [7]. We demonstrate that applying the "inverted pyramid" eccentricity method, a multi-scale input transformation, makes CNNs more robust to useless background features than a standard raw-image input. Our results also find that using the inverted pyramid method generally reduces useless background pixels, therefore reducing required training data.
by Sanjana Srivastava.
M. Eng.
M.Eng. Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science
Grechka, Asya. "Image editing with deep neural networks." Electronic Thesis or Diss., Sorbonne université, 2023. https://accesdistant.sorbonne-universite.fr/login?url=https://theses-intra.sorbonne-universite.fr/2023SORUS683.pdf.
Full textImage editing has a rich history which dates back two centuries. That said, "classic" image editing requires strong artistic skills as well as considerable time, often in the scale of hours, to modify an image. In recent years, considerable progress has been made in generative modeling which has allowed realistic and high-quality image synthesis. However, real image editing is still a challenge which requires a balance between novel generation all while faithfully preserving parts of the original image. In this thesis, we will explore different approaches to edit images, leveraging three families of generative networks: GANs, VAEs and diffusion models. First, we study how to use a GAN to edit a real image. While methods exist to modify generated images, they do not generalize easily to real images. We analyze the reasons for this and propose a solution to better project a real image into the GAN's latent space so as to make it editable. Then, we use variational autoencoders with vector quantification to directly obtain a compact image representation (which we could not obtain with GANs) and optimize the latent vector so as to match a desired text input. We aim to constrain this problem, which on the face could be vulnerable to adversarial attacks. We propose a method to chose the hyperparameters while optimizing simultaneously the image quality and the fidelity to the original image. We present a robust evaluation protocol and show the interest of our method. Finally, we abord the problem of image editing from the view of inpainting. Our goal is to synthesize a part of an image while preserving the rest unmodified. For this, we leverage pre-trained diffusion models and build off on their classic inpainting method while replacing, at each denoising step, the part which we do not wish to modify with the noisy real image. However, this method leads to a disharmonization between the real and generated parts. We propose an approach based on calculating a gradient of a loss which evaluates the harmonization of the two parts. We guide the denoising process with this gradient
Andersson, Viktor. "Semantic Segmentation : Using Convolutional Neural Networks and Sparse dictionaries." Thesis, Linköpings universitet, Datorseende, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-139367.
Full textChen, Zhe. "Augmented Context Modelling Neural Networks." Thesis, The University of Sydney, 2019. http://hdl.handle.net/2123/20654.
Full textHabibi, Aghdam Hamed. "Understanding Road Scenes using Deep Neural Networks." Doctoral thesis, Universitat Rovira i Virgili, 2018. http://hdl.handle.net/10803/461607.
Full textComprender las escenas de la carretera es crucial para los automóviles autónomos. Esto requiere segmentar escenas de carretera en regiones semánticamente significativas y reconocer objetos en una escena. Mientras que los objetos tales como coches y peatones tienen que segmentarse con precisión, puede que no sea necesario detectar y localizar estos objetos en una escena. Sin embargo, la detección y clasificación de objetos tales como señales de tráfico es esencial para ajustarse a las reglas de la carretera. En esta tesis, proponemos un método para la clasificación de señales de tráfico utilizando atributos visuales y redes bayesianas. A continuación, proponemos dos redes neuronales para este fin y desarrollar un nuevo método para crear un conjunto de modelos. A continuación, se estudia la sensibilidad de las redes neuronales frente a las muestras adversarias y se proponen dos redes destructoras que se unen a las redes de clasificación para aumentar su estabilidad frente al ruido. En la segunda parte de la tesis, proponemos una red para detectar señales de tráfico en imágenes de alta resolución en tiempo real y mostrar cómo implementar la técnica de ventana de escaneo dentro de nuestra red usando circunvoluciones dilatadas. A continuación, formulamos el problema de detección como un problema de segmentación y proponemos una red completamente convolucional para detectar señales de tráfico. Finalmente, proponemos una nueva red totalmente convolucional compuesta de módulos de fuego, conexiones de bypass y circunvoluciones consecutivas dilatadas en la última parte de la tesis para escenarios de carretera segmentinc en regiones semánticamente significativas y muestran que es más accuarate y computacionalmente más eficiente en comparación con redes similares
Understanding road scenes is crucial for autonomous cars. This requires segmenting road scenes into semantically meaningful regions and recognizing objects in a scene. While objects such as cars and pedestrians has to be segmented accurately, it might not be necessary to detect and locate these objects in a scene. However, detecting and classifying objects such as traffic signs is essential for conforming to road rules. In this thesis, we first propose a method for classifying traffic signs using visual attributes and Bayesian networks. Then, we propose two neural network for this purpose and develop a new method for creating an ensemble of models. Next, we study sensitivity of neural networks against adversarial samples and propose two denoising networks that are attached to the classification networks to increase their stability against noise. In the second part of the thesis, we first propose a network to detect traffic signs in high-resolution images in real-time and show how to implement the scanning window technique within our network using dilated convolutions. Then, we formulate the detection problem as a segmentation problem and propose a fully convolutional network for detecting traffic signs. Finally, we propose a new fully convolutional network composed of fire modules, bypass connections and consecutive dilated convolutions in the last part of the thesis for segmenting road scenes into semantically meaningful regions and show that it is more accurate and computationally more efficient compared to similar networks.
Antoniades, Andreas. "Interpreting biomedical data via deep neural networks." Thesis, University of Surrey, 2018. http://epubs.surrey.ac.uk/845765/.
Full textAvramova, Vanya. "Curriculum Learning with Deep Convolutional Neural Networks." Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-178453.
Full textKarlsson, Daniel. "Classifying sport videos with deep neural networks." Thesis, Umeå universitet, Institutionen för datavetenskap, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-130654.
Full textPeng, Zeng. "Pedestrian Tracking by using Deep Neural Networks." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-302107.
Full textDetta projekt syftar till att använda djupinlärning för att lösa problemet med att följa fotgängare för autonom körning. For ligger inom datorseende och djupinlärning. Multi-Objekt-följning (MOT) syftar till att följa flera mål samtidigt i videodata. de viktigaste applikationsscenarierna för MOT är säkerhetsövervakning och autonom körning. I dessa scenarier behöver vi ofta följa många mål samtidigt, vilket inte är möjligt med endast objektdetektering eller algoritmer för enkel följning av objekt för deras bristande stabilitet och användbarhet, därför måste utforska området för multipel objektspårning. Vår metod bryter MOT i olika steg och använder rörelse- och utseendinformation för mål för att spåra dem i videodata, vi använde tre olika objektdetektorer för att upptäcka fotgängare i ramar en personidentifieringsmodell som utseendefunktionsavskiljare och Kalmanfilter som rörelsesprediktor. Vår föreslagna modell uppnår 47,6 % MOT-noggrannhet och 53,2 % i IDF1 medan resultaten som erhållits av modellen utan personåteridentifieringsmodul är endast 44,8%respektive 45,8 %. Våra experimentresultat visade att den robusta algoritmen för multipel objektspårning kan uppnås genom delade uppgifter och förbättras av de representativa DNN-baserade utseendefunktionerna.
Milner, Rosanna Margaret. "Using deep neural networks for speaker diarisation." Thesis, University of Sheffield, 2016. http://etheses.whiterose.ac.uk/16567/.
Full textKarlsson, Jonas. "Auditory Classification of Carsby Deep Neural Networks." Thesis, Uppsala universitet, Institutionen för informationsteknologi, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-355673.
Full textWang, Yuxuan. "Supervised Speech Separation Using Deep Neural Networks." The Ohio State University, 2015. http://rave.ohiolink.edu/etdc/view?acc_num=osu1426366690.
Full textWu, Chunyang. "Structured deep neural networks for speech recognition." Thesis, University of Cambridge, 2018. https://www.repository.cam.ac.uk/handle/1810/276084.
Full textZhang, Jeffrey M. Eng Massachusetts Institute of Technology. "Enhancing adversarial robustness of deep neural networks." Thesis, Massachusetts Institute of Technology, 2019. https://hdl.handle.net/1721.1/122994.
Full textThesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2019
Cataloged from student-submitted PDF version of thesis.
Includes bibliographical references (pages 57-58).
Logit-based regularization and pretrain-then-tune are two approaches that have recently been shown to enhance adversarial robustness of machine learning models. In the realm of regularization, Zhang et al. (2019) proposed TRADES, a logit-based regularization optimization function that has been shown to improve upon the robust optimization framework developed by Madry et al. (2018) [14, 9]. They were able to achieve state-of-the-art adversarial accuracy on CIFAR10. In the realm of pretrain- then-tune models, Hendrycks el al. (2019) demonstrated that adversarially pretraining a model on ImageNet then adversarially tuning on CIFAR10 greatly improves the adversarial robustness of machine learning models. In this work, we propose Adversarial Regularization, another logit-based regularization optimization framework that surpasses TRADES in adversarial generalization. Furthermore, we explore the impact of trying different types of adversarial training on the pretrain-then-tune paradigm.
by Jeffry Zhang.
M. Eng.
M.Eng. Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science
Miglani, Vivek N. "Comparing learned representations of deep neural networks." Thesis, Massachusetts Institute of Technology, 2019. https://hdl.handle.net/1721.1/123048.
Full textThesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2019
Cataloged from student-submitted PDF version of thesis.
Includes bibliographical references (pages 63-64).
In recent years, a variety of deep neural network architectures have obtained substantial accuracy improvements in tasks such as image classification, speech recognition, and machine translation, yet little is known about how different neural networks learn. To further understand this, we interpret the function of a deep neural network used for classification as converting inputs to a hidden representation in a high dimensional space and applying a linear classifier in this space. This work focuses on comparing these representations as well as the learned input features for different state-of-the-art convolutional neural network architectures. By focusing on the geometry of this representation, we find that different network architectures trained on the same task have hidden representations which are related by linear transformations. We find that retraining the same network architecture with a different initialization does not necessarily lead to more similar representation geometry for most architectures, but the ResNeXt architecture consistently learns similar features and hidden representation geometry. We also study connections to adversarial examples and observe that networks with more similar hidden representation geometries also exhibit higher rates of adversarial example transferability.
by Vivek N. Miglani.
M. Eng.
M.Eng. Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science
Bayer, Ali Orkan. "Semantic Language models with deep neural Networks." Doctoral thesis, Università degli studi di Trento, 2015. https://hdl.handle.net/11572/367784.
Full textBayer, Ali Orkan. "Semantic Language models with deep neural Networks." Doctoral thesis, University of Trento, 2015. http://eprints-phd.biblio.unitn.it/1578/1/bayer_thesis.pdf.
Full textElezi, Ismail <1991>. "Exploiting contextual information with deep neural networks." Doctoral thesis, Università Ca' Foscari Venezia, 2020. http://hdl.handle.net/10579/18453.
Full textRAGONESI, RUGGERO. "Addressing Dataset Bias in Deep Neural Networks." Doctoral thesis, Università degli studi di Genova, 2022. http://hdl.handle.net/11567/1069001.
Full textZheng, Xuebin. "Wavelet-based Graph Neural Networks." Thesis, The University of Sydney, 2022. https://hdl.handle.net/2123/27989.
Full textBonfiglioli, Luca. "Identificazione efficiente di reti neurali sparse basata sulla Lottery Ticket Hypothesis." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2020.
Find full textPons, Puig Jordi. "Deep neural networks for music and audio tagging." Doctoral thesis, Universitat Pompeu Fabra, 2019. http://hdl.handle.net/10803/668036.
Full textL’etiquetatge automàtic d’àudio i de música pot augmentar les possibilitats de reutilització de moltes de les bases de dades d’àudio que romanen pràcticament sense etiquetar. En aquesta tesi, abordem la tasca de l’etiquetatge automàtic d’àudio i de música des de la perspectiva de l’aprenentatge profund i, en aquest context, abordem les següents qüestions cientı́fiques: (i) Quines arquitectures d’aprenentatge profund són les més adients per a senyals d’àudio (musicals)? (ii) En quins escenaris és viable que els models d’aprenentatge profund processin directament formes d’ona? (iii) Quantes dades es necessiten per dur a terme estudis d’investigació en aprenentatge profund? Per tal de respondre a la primera pregunta (i), proposem utilitzar xarxes neuronals convolucionals motivades musicalment i avaluem diverses arquitectures d’aprenentatge profund per a àudio a un baix cost computacional. Al llarg de les nostres investigacions, trobem que els coneixements previs que tenim sobre la música i l’àudio ens poden ajudar a millorar l’eficiència, la interpretabilitat i el rendiment dels models d’aprenentatge basats en espectrogrames. Per a les preguntes (ii – iii) estudiem com el SampleCNN, un model d’aprenentatge profund que processa formes d’ona, funciona quan disposem de quantitats variables de dades d’entrenament — des de 25k cançons fins a 1’2M cançons. En aquest estudi, comparem el SampleCNN amb una arquitectura basada en espectrogrames que està motivada musicalment. Els resultats experimentals que obtenim indiquen que, en escenaris on disposem de suficients dades, els models d’aprenentatge profund que processen formes d’ona (com el SampleCNN) poden aconseguir millors resultats que els que processen espectrogrames. Finalment, per tal d’intentar respondre a la pregunta (iii), també investiguem si una regularització severa de l’espai de solucions, les xarxes prototipades, l’aprenentatge per transferència de coneixement, o la seva combinació, poden permetre als models d’aprenentatge profund obtenir més bons resultats en escenaris on no hi ha gaires dades d’entrenament. Els resultats dels nostres experiments indiquen que l’aprenentatge per transferència de coneixement i les xarxes prototipades són estratègies útils quan les dades d’entrenament no són abundants.
Purmonen, Sami. "Predicting Game Level Difficulty Using Deep Neural Networks." Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-217140.
Full textVi utforskade användning av Monte Carlo tree search (MCTS) och deep learning för attuppskatta banors svårighetsgrad i Candy Crush Saga (Candy). Ett deep neural network(DNN) tränades för att förutse speldrag från spelbanor från stora mängder speldata. DNN:en spelade en varierad mängd banor i Candy och en modell byggdes för att förutsemänsklig svårighetsgrad från DNN:ens svårighetsgrad. Resultatet jämfördes medMCTS. Våra resultat indikerar att DNN:ens kan göra uppskattningar jämförbara medMCTS men på substantiellt kortare tid.
Winsnes, Casper. "Automatic Subcellular Protein Localization Using Deep Neural Networks." Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-189991.
Full textPitkänen, P. (Perttu). "Automatic image quality enhancement using deep neural networks." Master's thesis, University of Oulu, 2019. http://jultika.oulu.fi/Record/nbnfioulu-201904101454.
Full textWu, Jimmy M. Eng Massachusetts Institute of Technology. "Robotic object pose estimation with deep neural networks." Thesis, Massachusetts Institute of Technology, 2018. http://hdl.handle.net/1721.1/119699.
Full textThis electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.
Cataloged from student-submitted PDF version of thesis.
Includes bibliographical references (pages 39-45).
In this work, we introduce pose interpreter networks for 6-DoF object pose estimation. In contrast to other CNN-based approaches to pose estimation that require expensively-annotated object pose data, our pose interpreter network is trained entirely on synthetic data. We use object masks as an intermediate representation to bridge real and synthetic. We show that when combined with a segmentation model trained on RGB images, our synthetically-trained pose interpreter network is able to generalize to real data. Our end-to-end system for object pose estimation runs in real-time (20 Hz) on live RGB data, without using depth information or ICP refinement.
by Jimmy Wu.
M. Eng.
Paula, Thomas da Silva. "Contributions in face detection with deep neural networks." Pontif?cia Universidade Cat?lica do Rio Grande do Sul, 2017. http://tede2.pucrs.br/tede2/handle/tede/7563.
Full textMade available in DSpace on 2017-07-04T12:23:44Z (GMT). No. of bitstreams: 1 DIS_THOMAS_DA_SILVA_PAULA_COMPLETO.pdf: 10601063 bytes, checksum: f63f9b6e33e22c4a2553f784a3a029e1 (MD5) Previous issue date: 2017-03-28
Reconhecimento facial ? um dos assuntos mais estudos no campo de Vis?o Computacional. Dada uma imagem arbitr?ria ou um frame arbitr?rio, o objetivo do reconhecimento facial ? determinar se existem faces na imagem e, se existirem, obter a localiza??o e a extens?o de cada face encontrada. Tal detec??o ? facilmente feita por seres humanos, por?m continua sendo um desafio em Vis?o Computacional. O alto grau de variabilidade e a dinamicidade da face humana tornam-a dif?cil de detectar, principalmente em ambientes complexos. Recentementemente, abordagens de Aprendizado Profundo come?aram a ser utilizadas em tarefas de Vis?o Computacional com bons resultados. Tais resultados abriram novas possibilidades de pesquisa em diferentes aplica??es, incluindo Reconhecimento Facial. Embora abordagens de Aprendizado Profundo tenham sido aplicadas com sucesso para tal tarefa, a maior parte das implementa??es estado da arte utilizam detectores faciais off-the-shelf e n?o avaliam as diferen?as entre eles. Em outros casos, os detectores faciais s?o treinados para m?ltiplas tarefas, como detec??o de pontos fiduciais, detec??o de idade, entre outros. Portanto, n?s temos tr?s principais objetivos. Primeiramente, n?s resumimos e explicamos alguns avan?os do Aprendizado Profundo, detalhando como cada arquitetura e implementa??o funcionam. Depois, focamos no problema de detec??o facial em si, realizando uma rigorosa an?lise de alguns dos detectores existentes assim como algumas implementa??es nossas. N?s experimentamos e avaliamos varia??es de alguns hiper-par?metros para cada um dos detectores e seu impacto em diferentes bases de dados. N?s exploramos tanto implementa??es tradicionais quanto mais recentes, al?m de implementarmos nosso pr?prio detector facial. Por fim, n?s implementamos, testamos e comparamos uma abordagem de meta-aprendizado para detec??o facial, que visa aprender qual o melhor detector facial para uma determinada imagem. Nossos experimentos contribuem para o entendimento do papel do Aprendizado Profundo em detec??o facial, assim como os detalhes relacionados a mudan?a de hiper-par?metros dos detectores faciais e seu impacto no resultado da detec??o facial. N?s tamb?m mostramos o qu?o bem features obtidas com redes neurais profundas ? treinadas em bases de dados de prop?sito geral ? combinadas com uma abordagem de meta-aprendizado, se aplicam a detec??o facial. Nossos experimentos e conclus?es mostram que o aprendizado profundo possui de fato um papel not?vel em detec??o facial.
Face Detection is one of the most studied subjects in the Computer Vision field. Given an arbitrary image or video frame, the goal of face detection is to determine whether there are any faces in the image and, if present, return the image location and the extent of each face. Such a detection is easily done by humans, but it is still a challenge within Computer Vision. The high degree of variability and the dynamicity of the human face makes it an object very difficult to detect, mainly in complex environments. Recently, Deep Learning approaches started to be applied for Computer Vision tasks with great results. They opened new research possibilities in different applications, including Face Detection. Even though Deep Learning has been successfully applied for such a task, most of the state-of-the-art implementations make use of off-the-shelf face detectors and do not evaluate differences among them. In other cases, the face detectors are trained in a multitask manner that includes face landmark detection, age detection, and so on. Hence, our goal is threefold. First, we summarize and explain many advances of deep learning, detailing how each different architecture and implementation work. Second, we focus on the face detection problem itself, performing a rigorous analysis of some of the existing face detectors as well as implementations of our own. We experiment and evaluate variations of hyper-parameters for each of the detectors and their impact in different datasets. We explore both traditional and more recent approaches, as well as implementing our own face detectors. Finally, we implement, test, and compare a meta learning approach for face detection, which aims to learn the best face detector for a given image. Our experiments contribute in understanding the role of deep learning in face detection as well as the subtleties of changing hyper-parameters of the face detectors and their impact in face detection. We also show how well features obtained with deep neural networks trained on a general-purpose dataset perform on a meta learning approach for face detection. Our experiments and conclusions show that deep learning has indeed a notable role in face detection.
D'Amicantonio, Giacomo. "Improvements to knowledge distillation of deep neural networks." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2021. http://amslaurea.unibo.it/24178/.
Full text