Dissertations on the topic "Deep neural networks (DNNs)"

A selection of the top 50 dissertations for research on the topic "Deep neural networks (DNNs)".

1

Michailoff, John. "Email Classification : An evaluation of Deep Neural Networks with Naive Bayes." Thesis, Mittuniversitetet, Institutionen för informationssystem och –teknologi, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:miun:diva-37590.

Abstract:
Machine learning (ML) is an area of computer science that gives computers the ability to learn data patterns without prior programming for those patterns. Using neural networks in this area is based on simulating the biological functions of neurons in brains to learn patterns in data, giving computers a predictive ability to comprehend how data can be clustered. This research investigates the possibilities of using neural networks for classifying email, i.e. working as an email case manager. A Deep Neural Network (DNN) consists of multiple layers of neurons connected to each other by trainable weights. The main objective of this thesis was to evaluate how three input arguments (data size, training time and neural network structure) affect the accuracy of DNN pattern recognition; to evaluate how the DNN performs compared to the statistical ML method Naïve Bayes (NB) in terms of prediction accuracy and complexity; and finally to assess the viability of the resulting DNN as a case manager. Results show that the accuracy of our networks improves as training time and data size increase. By testing increasingly complex network structures (larger networks of neurons with more layers), it is observed that overfitting becomes a problem with increased training time, i.e. accuracy decreases after a certain threshold of training time. Naïve Bayes classifiers perform worse than the DNN in terms of accuracy but better in terms of complexity, making NB viable on mobile platforms. We conclude that our developed prototype may work well in tandem with existing case management systems, which remains to be tested by future research.
2

Tong, Zheng. "Evidential deep neural network in the framework of Dempster-Shafer theory." Thesis, Compiègne, 2022. http://www.theses.fr/2022COMP2661.

Abstract:
Deep neural networks (DNNs) have achieved remarkable success in many real-world applications (e.g., pattern recognition and semantic segmentation) but still face the problem of managing uncertainty. Dempster-Shafer theory (DST) provides a well-founded and elegant framework to represent and reason with uncertain information. In this thesis, we propose a new framework using DST and DNNs to solve the problems of uncertainty. In the proposed framework, we first hybridize DST and DNNs by plugging a DST-based neural-network layer followed by a utility layer at the output of a convolutional neural network for set-valued classification. We also extend the idea to semantic segmentation by combining fully convolutional networks and DST. The proposed approach enhances the performance of DNN models by assigning ambiguous patterns with high uncertainty, as well as outliers, to multi-class sets. The learning strategy using soft labels further improves the performance of the DNNs by converting imprecise and unreliable label data into belief functions. We also propose a modular fusion strategy using this framework, in which a fusion module aggregates the belief-function outputs of evidential DNNs by Dempster's rule. We use this strategy to combine DNNs trained from heterogeneous datasets with different sets of classes while keeping performance at least as good as that of the individual networks on their respective datasets. Further, we apply the strategy to combine several shallow networks and achieve performance similar to that of an advanced DNN on a complicated task.
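The fusion step mentioned in this abstract relies on Dempster's rule of combination. The following is a minimal sketch of that rule on two mass functions, not the thesis's fusion module: the function name `dempster_combine`, the two-class frame {cat, dog} and the mass values are all hypothetical and chosen only for illustration.

```python
from itertools import product


def dempster_combine(m1, m2):
    """Combine two mass functions (dict: frozenset -> mass) with Dempster's rule."""
    combined = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            # Mass of agreeing evidence goes to the intersection of the focal sets.
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            # Mass assigned to the empty set measures the conflict between sources.
            conflict += wa * wb
    if conflict >= 1.0:
        raise ValueError("total conflict: the sources cannot be combined")
    # Renormalise so that the remaining masses sum to one.
    return {s: w / (1.0 - conflict) for s, w in combined.items()}


# Hypothetical belief-function outputs of two classifiers (illustrative values).
m1 = {frozenset({"cat"}): 0.6, frozenset({"cat", "dog"}): 0.4}
m2 = {frozenset({"dog"}): 0.3, frozenset({"cat", "dog"}): 0.7}
fused = dempster_combine(m1, m2)
```

Mass left on the set {cat, dog} represents ignorance rather than a vote for either class, which is what allows ambiguous inputs to be assigned to multi-class sets.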
3

Buratti, Luca. "Visualisation of Convolutional Neural Networks." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2018.

Abstract:
Neural networks, and Convolutional Neural Networks in particular, have recently demonstrated extraordinary results in various fields. Unfortunately, there is still no clear understanding of why these architectures work so well, and above all it is difficult to explain their behaviour when they fail. This lack of clarity is what keeps these models from being applied in concrete, critical real-life scenarios such as healthcare or self-driving cars. For this reason, several studies in recent years have aimed to create methods capable of explaining what is happening inside a neural network, or where the network is looking when it makes a given prediction. These techniques are the focus of this thesis and the bridge between the two case studies presented in it. The aim of this work is therefore twofold: first, to use these methods to analyse, and thus understand how to improve, applications based on convolutional neural networks; and second, to investigate the generalisation capability of these architectures, again by means of these methods.
4

Liu, Qian. "Deep spiking neural networks." Thesis, University of Manchester, 2018. https://www.research.manchester.ac.uk/portal/en/theses/deep-spiking-neural-networks(336e6a37-2a0b-41ff-9ffb-cca897220d6c).html.

Abstract:
Neuromorphic Engineering (NE) has led to the development of biologically-inspired computer architectures whose long-term goal is to approach the performance of the human brain in terms of energy efficiency and cognitive capabilities. Although there are a number of neuromorphic platforms available for large-scale Spiking Neural Network (SNN) simulations, the problem of programming these brain-like machines to be competent in cognitive applications still remains unsolved. On the other hand, Deep Learning has emerged in Artificial Neural Network (ANN) research to dominate state-of-the-art solutions for cognitive tasks. Thus the main research problem emerges of understanding how to operate and train biologically-plausible SNNs to close the gap in cognitive capabilities between SNNs and ANNs. SNNs can be trained by first training an equivalent ANN and then transferring the tuned weights to the SNN. This method is called ‘off-line’ training, since it does not take place on an SNN directly, but rather on an ANN instead. However, previous work on such off-line training methods has struggled in terms of poor modelling accuracy of the spiking neurons and high computational complexity. In this thesis we propose a simple and novel activation function, Noisy Softplus (NSP), to closely model the response firing activity of biologically-plausible spiking neurons, and introduce a generalised off-line training method using the Parametric Activation Function (PAF) to map the abstract numerical values of the ANN to concrete physical units, such as current and firing rate in the SNN. Based on this generalised training method and its fine tuning, we achieve the state-of-the-art accuracy on the MNIST classification task using spiking neurons, 99.07%, on a deep spiking convolutional neural network (ConvNet). We then take a step forward to ‘on-line’ training methods, where Deep Learning modules are trained purely on SNNs in an event-driven manner. 
Existing work has failed to provide SNNs with recognition accuracy equivalent to ANNs due to the lack of mathematical analysis. Thus we propose a formalised Spike-based Rate Multiplication (SRM) method which transforms the product of firing rates to the number of coincident spikes of a pair of rate-coded spike trains. Moreover, these coincident spikes can be captured by the Spike-Time-Dependent Plasticity (STDP) rule to update the weights between the neurons in an on-line, event-based, and biologically-plausible manner. Furthermore, we put forward solutions to reduce correlations between spike trains; thereby addressing the result of performance drop in on-line SNN training. The promising results of spiking Autoencoders (AEs) and Restricted Boltzmann Machines (SRBMs) exhibit equivalent, sometimes even superior, classification and reconstruction capabilities compared to their non-spiking counterparts. To provide meaningful comparisons between these proposed SNN models and other existing methods within this rapidly advancing field of NE, we propose a large dataset of spike-based visual stimuli and a corresponding evaluation methodology to estimate the overall performance of SNN models and their hardware implementations.
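The idea behind Spike-based Rate Multiplication, that the product of two firing rates shows up as the number of coincident spikes, can be illustrated numerically. The sketch below is not the thesis's formalised method: it assumes independent Bernoulli spike trains with one draw per time step, and the helper names, rates and step count are hypothetical.

```python
import random


def bernoulli_spike_train(rate, steps, rng):
    """Rate-coded spike train: one spike per time step with probability `rate`."""
    return [1 if rng.random() < rate else 0 for _ in range(steps)]


def coincidence_rate(train_a, train_b):
    """Fraction of time steps in which both trains spike."""
    hits = sum(a & b for a, b in zip(train_a, train_b))
    return hits / len(train_a)


rng = random.Random(0)  # fixed seed so the run is repeatable
steps = 200_000
rate_a, rate_b = 0.3, 0.5
train_a = bernoulli_spike_train(rate_a, steps, rng)
train_b = bernoulli_spike_train(rate_b, steps, rng)

# For independent trains the coincidence rate approximates rate_a * rate_b.
estimate = coincidence_rate(train_a, train_b)
```

The approximation holds only when the two trains are uncorrelated, which is consistent with the abstract's emphasis on reducing correlations between spike trains.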
5

Li, Dongfu. "Deep Neural Network Approach for Single Channel Speech Enhancement Processing." Thesis, Université d'Ottawa / University of Ottawa, 2016. http://hdl.handle.net/10393/34472.

Abstract:
Speech intelligibility represents how comprehensible speech is. It is more important than speech quality in some applications. Single channel speech intelligibility enhancement is much more difficult than multi-channel intelligibility enhancement. It has recently been reported that training-based single channel speech intelligibility enhancement algorithms perform better than Signal to Noise Ratio (SNR) based algorithms. In this thesis, a training-based Deep Neural Network (DNN) is used to improve single channel speech intelligibility. To increase the performance of the DNN, the Multi-Resolution Cochlea Gram (MRCG) feature set is used as the input of the DNN. MATLAB objective test results show that the MRCG-DNN approach is more robust than a Gaussian Mixture Model (GMM) approach. The MRCG-DNN also works better than other DNN training algorithms. Various conditions such as different speakers, different noise conditions and reverberation were tested in the thesis.
6

Shuvo, Md Kamruzzaman. "Hardware Efficient Deep Neural Network Implementation on FPGA." OpenSIUC, 2020. https://opensiuc.lib.siu.edu/theses/2792.

Abstract:
In recent years, there has been a significant push to implement Deep Neural Networks (DNNs) on edge devices, which requires power- and hardware-efficient circuits to carry out the intensive matrix-vector multiplication (MVM) operations. This work presents hardware-efficient MVM implementation techniques using bit-serial arithmetic and a novel MSB-first computation circuit. The proposed designs take advantage of the pre-trained network weight parameters, which are already known at the design stage. Thus, the partial computation results can be pre-computed and stored in look-up tables. The MVM results can then be computed in a bit-serial manner without using multipliers. The proposed novel circuit implementation for convolution filters and the rectified linear activation function used in deep neural networks conducts computation in an MSB-first bit-serial manner. It can predict early whether the outcome of a filter computation will be negative and then terminate the remaining computations to save power. The benefits of the proposed MVM implementation techniques are demonstrated by comparing the proposed design with a conventional implementation. The proposed circuit is implemented on an FPGA. It shows significant power and performance improvements compared to conventional designs implemented on the same FPGA.
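A software sketch can make the look-up-table idea in this abstract concrete. The code below is a generic distributed-arithmetic illustration, not the thesis's circuit: it assumes unsigned fixed-width inputs and a single row of fixed weights, the names `build_lut` and `bit_serial_dot` are hypothetical, and the early-termination logic for negative outcomes is not modelled.

```python
def build_lut(weights):
    """Precompute, for every input bit pattern, the sum of the selected weights."""
    n = len(weights)
    lut = [0] * (1 << n)
    for pattern in range(1 << n):
        lut[pattern] = sum(w for i, w in enumerate(weights) if (pattern >> i) & 1)
    return lut


def bit_serial_dot(lut, inputs, bits):
    """Dot product of the fixed weights with unsigned `bits`-bit inputs, MSB first.

    Uses only table look-ups, shifts and additions; no multiplications.
    """
    acc = 0
    for b in reversed(range(bits)):  # most significant bit plane first
        pattern = 0
        for i, x in enumerate(inputs):
            pattern |= ((x >> b) & 1) << i  # gather bit b of every input
        acc = (acc << 1) + lut[pattern]  # Horner-style accumulation
    return acc


weights = [3, -2, 5, 1]   # hypothetical pre-trained weights, known in advance
inputs = [7, 2, 0, 15]    # hypothetical unsigned 4-bit activations
lut = build_lut(weights)
result = bit_serial_dot(lut, inputs, bits=4)
```

The table has 2^n entries for n weights, so a practical design would presumably split long rows into small groups, each with its own table.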
7

Squadrani, Lorenzo. "Deep neural networks and thermodynamics." Bachelor's thesis, Alma Mater Studiorum - Università di Bologna, 2020.

Abstract:
Deep learning is the most effective and widely used approach to artificial intelligence, and yet it is far from being properly understood. Understanding it is the way to further improve its effectiveness and, in the best case, to gain some understanding of "natural" intelligence. We attempt a step in this direction with the aid of physics. We describe a convolutional neural network for image classification (trained on CIFAR-10) within the descriptive framework of thermodynamics. In particular, we define and study the temperature of each component of the network. Our results provide a new point of view on deep learning models, which may be a starting point towards a better understanding of artificial intelligence.
8

Mancevo, del Castillo Ayala Diego. "Compressing Deep Convolutional Neural Networks." Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-217316.

Abstract:
Deep Convolutional Neural Networks and "deep learning" in general stand at the cutting edge of a range of applications, from image-based recognition and classification to natural language processing, speech and speaker recognition, and reinforcement learning. Very deep models, however, are often large, complex and computationally expensive to train and evaluate. Deep learning models are thus seldom deployed natively in environments where computational resources are scarce or expensive. To address this problem, we turn our attention towards a range of techniques that we collectively refer to as "model compression", in which a lighter student model is trained to approximate the output produced by the model we wish to compress. To this end, the output from the original model is used to craft the training labels of the smaller student model. This work contains some experiments on CIFAR-10 and demonstrates how to use the aforementioned techniques to compress a people counting model whose precision, recall and F1-score are improved by as much as 14% against our baseline.
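The student-teacher setup described in this abstract can be sketched as a loss computed against labels crafted from the original model's output. This is an illustrative assumption rather than the thesis's exact procedure: the temperature-softened softmax is the common knowledge-distillation formulation, which the abstract does not confirm, and the function names and logit values are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax; a higher temperature gives softer probabilities."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def soft_label_loss(student_logits, teacher_logits, temperature=2.0):
    """Cross-entropy of the student's prediction against the teacher's soft labels."""
    targets = softmax(teacher_logits, temperature)  # labels crafted from the teacher
    preds = softmax(student_logits, temperature)
    return -sum(t * math.log(p) for t, p in zip(targets, preds))


teacher_logits = [3.0, 1.0, 0.0]   # hypothetical output of the large model
good_student = [2.9, 1.1, 0.1]     # student output close to the teacher's
poor_student = [0.0, 0.0, 3.0]     # student output that disagrees

loss_good = soft_label_loss(good_student, teacher_logits)
loss_poor = soft_label_loss(poor_student, teacher_logits)
```

Minimising this loss over a training set pushes the student's output distribution towards the teacher's, which is the sense in which the smaller model approximates the larger one.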
9

Abbasi, Mahdieh. "Toward robust deep neural networks." Doctoral thesis, Université Laval, 2020. http://hdl.handle.net/20.500.11794/67766.

Abstract:
In this thesis, our goal is to develop robust and reliable yet accurate learning models, particularly Convolutional Neural Networks (CNNs), in the presence of adversarial examples and Out-of-Distribution (OOD) samples. As the first contribution, we propose to predict adversarial instances with high uncertainty by encouraging diversity in an ensemble of CNNs. To this end, we devise an ensemble of diverse specialists along with a simple and computationally efficient voting mechanism to predict adversarial examples with low confidence while keeping the predictive confidence of clean samples high. In the presence of high entropy in our ensemble, we prove that the predictive confidence can be upper-bounded, leading to a globally fixed threshold (τ = 0.5) over the predictive confidence for identifying adversaries. We analytically justify the role of diversity in our ensemble in mitigating the risk of both black-box and white-box adversarial examples. Finally, we empirically assess the robustness of our ensemble to black-box and white-box attacks on several benchmark datasets.
The second contribution aims to address the detection of OOD samples through an end-to-end model trained on an appropriate OOD set. To this end, we address the following central question: how can the many available OOD sets be differentiated with respect to a given in-distribution task in order to select the most appropriate one, which in turn induces a model with a high detection rate on unseen OOD sets? To answer this question, we hypothesize that the "protection" level of in-distribution sub-manifolds by each OOD set is a good property for differentiating OOD sets. To measure the protection level, we then design three novel, simple, and cost-effective metrics using a pre-trained vanilla CNN. In an extensive series of experiments on image and audio classification tasks, we empirically demonstrate the ability of an Augmented-CNN (A-CNN) and an explicitly calibrated CNN to detect a significantly larger portion of unseen OOD samples if they are trained on the most protective OOD set. Interestingly, we also observe that the A-CNN trained on the most protective OOD set (called A-CNN) can also detect black-box Fast Gradient Sign (FGS) adversarial examples.
As the third contribution, we investigate more closely the capacity of the A-CNN to detect wider types of black-box adversaries. To increase the capability of the A-CNN to detect a larger number of adversaries, we augment its OOD training set with some inter-class interpolated samples. We then demonstrate that the A-CNN trained on the most protective OOD set along with the interpolated samples has a consistent detection rate on all types of unseen adversarial examples, whereas training an A-CNN on Projected Gradient Descent (PGD) adversaries does not lead to a stable detection rate on all types of adversaries, particularly the unseen types. We also visually assess the feature space and the decision boundaries in the input space of a vanilla CNN and its augmented counterpart in the presence of adversarial and clean samples. With a properly trained A-CNN, we aim to take a step toward a unified and reliable end-to-end learning model with small risk rates on both clean samples and unusual ones, e.g. adversarial and OOD samples.
The last contribution is to show a use case of the A-CNN for training a robust object detector on a partially labeled dataset, particularly a merged dataset. Merging various datasets from similar contexts but with different sets of Objects of Interest (OoI) is an inexpensive way to craft a large-scale dataset that covers a larger spectrum of OoIs. Moreover, merging datasets makes it possible to obtain a unified object detector instead of several separate ones, resulting in a reduction of computational and time costs. However, merging datasets, especially from a similar context, causes many missing-label instances. With the goal of training an integrated robust object detector on a partially labeled but large-scale dataset, we propose a self-supervised training framework to overcome the issue of missing-label instances in merged datasets. Our framework is evaluated on a merged dataset with a high missing-label rate. The empirical results confirm the viability of our generated pseudo-labels for enhancing the performance of YOLO, the current (to date) state-of-the-art object detector.
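The first contribution's use of a fixed confidence threshold can be sketched with a plain averaging ensemble. This is a simplified assumption, not the thesis's specialist voting mechanism: the helper names and probability vectors are hypothetical, and the threshold of 0.5 is the fixed global value the abstract gives.

```python
def average_confidence(member_probs):
    """Average the class-probability vectors predicted by the ensemble members."""
    n = len(member_probs)
    classes = len(member_probs[0])
    return [sum(p[c] for p in member_probs) / n for c in range(classes)]


def predict_or_reject(member_probs, threshold=0.5):
    """Return (class, confidence), or (None, confidence) if below the threshold."""
    avg = average_confidence(member_probs)
    best = max(range(len(avg)), key=avg.__getitem__)
    if avg[best] >= threshold:
        return best, avg[best]
    return None, avg[best]  # low confidence: flag the input as suspicious


# Hypothetical member outputs: agreement on a clean sample ...
agree = [[0.90, 0.05, 0.05], [0.80, 0.10, 0.10], [0.85, 0.10, 0.05]]
# ... and disagreement, as diversity is meant to produce on adversarial inputs.
disagree = [[0.90, 0.05, 0.05], [0.05, 0.90, 0.05], [0.05, 0.05, 0.90]]

clean_result = predict_or_reject(agree)
suspicious_result = predict_or_reject(disagree)
```

When the members disagree, the averaged confidence falls towards a uniform distribution, which is why a single fixed threshold can separate the two cases in this toy example.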
10

Lu, Yifei. "Deep neural networks and fraud detection." Thesis, Uppsala universitet, Tillämpad matematik och statistik, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-331833.

11

Kalogiras, Vasileios. "Sentiment Classification with Deep Neural Networks." Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-217858.

Abstract:
Sentiment analysis is a subfield of natural language processing (NLP) that attempts to analyze the sentiment of written text. It is a complex problem that entails different challenges, and for this reason it has been studied extensively. In past years, traditional machine learning algorithms or handcrafted methodologies provided state-of-the-art results. However, the recent deep learning renaissance has shifted interest towards end-to-end deep learning models. On the one hand this has resulted in more powerful models, but on the other hand clear mathematical reasoning or intuition behind distinct models is still lacking. In this thesis, therefore, an attempt is made to shed some light on recently proposed deep learning architectures for sentiment classification. A study of their differences is performed, and empirical results are provided on how changes in the structure or capacity of a model can affect its accuracy and the way it represents and "comprehends" sentences.
12

Billman, Linnar, and Johan Hullberg. "Speech Reading with Deep Neural Networks." Thesis, Uppsala universitet, Institutionen för informationsteknologi, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-360022.

Abstract:
Recent growth in computational power and available data has increased the popularity and progress of machine learning techniques. Methods of machine learning are used for automatic speech recognition in order to allow humans to transfer information to computers simply by speech. In the present work, we are interested in doing this for general contexts, e.g. speakers talking on TV or newsreaders recorded in a studio. Automatic speech recognition systems are often based solely on acoustic data. By introducing visual data such as lip movements, the robustness of such systems can be increased. This thesis instead investigates how well machine learning techniques can learn the art of lip reading as a sole source for automatic speech recognition. The key idea is to feed a sequence of 24 lip coordinates to the system, rather than learning directly from the raw video frames. This thesis designs a solution around this principle, empowered by state-of-the-art machine learning techniques such as recurrent neural networks and making use of GPUs. We find that this design reduces computational requirements by more than a factor of 25 compared to a state-of-the-art machine learning solution called LipNet. This, however, also scales down performance to an accuracy of 80% of what LipNet achieves, while still outperforming human recognition by a factor of 150%. The accuracies are based on processing of as yet unseen speakers. This text presents this architecture. It details its design, reports its results, and compares its performance to an existing solution. Based on this, it is indicated how the result can be further refined.
13

Ioannou, Yani Andrew. "Structural priors in deep neural networks." Thesis, University of Cambridge, 2018. https://www.repository.cam.ac.uk/handle/1810/278976.

Abstract:
Deep learning has in recent years come to dominate the previously separate fields of research in machine learning, computer vision, natural language understanding and speech recognition. Despite breakthroughs in training deep networks, there remains a lack of understanding of both the optimization and structure of deep networks. The approach advocated by many researchers in the field has been to train monolithic networks with excess complexity and strong regularization, an approach that leaves much to be desired in efficiency. Instead, we propose that carefully designing networks in consideration of our prior knowledge of the task and learned representation can improve the memory and compute efficiency of state-of-the-art networks, and even improve generalization; we propose to denote such design choices as structural priors. We present two such novel structural priors for convolutional neural networks, and evaluate them in state-of-the-art image classification CNN architectures. The first of these methods proposes to exploit our knowledge of the low-rank nature of most filters learned for natural images by structuring a deep network to learn a collection of mostly small, low-rank filters. The second addresses the filter/channel extents of convolutional filters by learning filters with limited channel extents. The size of these channel-wise basis filters increases with the depth of the model, giving a novel sparse connection structure that resembles a tree root. Both methods are found to improve the generalization of these architectures while also decreasing the size and increasing the efficiency of their training and test-time computation. Finally, we present work towards conditional computation in deep neural networks, moving towards a method of automatically learning structural priors in deep networks.
We propose a new discriminative learning model, conditional networks, that jointly exploits the accurate representation learning capabilities of deep neural networks and the efficient conditional computation of decision trees. Conditional networks yield smaller models and offer test-time flexibility in the trade-off between computation and accuracy.
14

Choi, Keunwoo. "Deep neural networks for music tagging." Thesis, Queen Mary, University of London, 2018. http://qmro.qmul.ac.uk/xmlui/handle/123456789/46029.

Abstract:
In this thesis, I present my hypothesis, experiment results, and discussion related to various aspects of deep neural networks for music tagging. Music tagging is the task of automatically predicting suitable semantic labels for a given piece of music. Generally speaking, the input of music tagging systems can be any entity that constitutes music, e.g., audio content, lyrics, or metadata, but only the audio content is considered in this thesis. My hypothesis is that we can find effective deep learning practices for the task of music tagging that improve the classification performance. As a computational model to realise a music tagging system, I use deep neural networks. Combined with the research problem, the scope of this thesis is the understanding, interpretation, optimisation, and application of deep neural networks in the context of music tagging systems. The ultimate goal of this thesis is to provide insight that can help to improve deep learning-based music tagging systems. There are many smaller goals in this regard. Since using deep neural networks is a data-driven approach, it is crucial to understand the dataset. Selecting and designing a better architecture is the next topic to discuss. Since the tagging is done with audio input, preprocessing the audio signal becomes one of the important research topics. After building (or training) a music tagging system, finding a suitable way to re-use it for other music information retrieval tasks is a compelling topic, in addition to interpreting the trained system. The evidence presented in the thesis supports the view that deep neural networks are powerful and credible methods for building a music tagging system.
15

Yin, Yonghua. "Random neural networks for deep learning." Thesis, Imperial College London, 2018. http://hdl.handle.net/10044/1/64917.

Abstract:
The random neural network (RNN) is a mathematical model for an 'integrate and fire' spiking network that closely resembles the stochastic behaviour of neurons in mammalian brains. Since its proposal in 1989, there have been numerous investigations into the RNN's applications and learning algorithms. Deep learning (DL) has achieved great success in machine learning, but there has been no research into the properties of the RNN for DL to combine their power. This thesis intends to bridge the gap between RNNs and DL, in order to provide powerful DL tools that are faster, and that can potentially be used with less energy expenditure than existing methods. Based on the RNN function approximator proposed by Gelenbe in 1999, the approximation capability of the RNN is investigated and an efficient classifier is developed. By combining the RNN, DL and non-negative matrix factorisation, new shallow and multi-layer non-negative autoencoders are developed. The autoencoders are tested on typical image datasets and real-world datasets from different domains, and the test results yield the desired high learning accuracy. The concept of dense nuclei/clusters is examined, using RNN theory as a basis. In dense nuclei, neurons may interconnect via soma-to-soma interactions and conventional synaptic connections. A mathematical model of the dense nuclei is proposed and the transfer function can be deduced. A multi-layer architecture of the dense nuclei is constructed for DL, whose value is demonstrated by experiments on multi-channel datasets and server-state classification in cloud servers. A theoretical study into the multi-layer architecture of the standard RNN (MLRNN) for DL is presented. Based on the layer-output analyses, the MLRNN is shown to be a universal function approximator. The effects of the layer number on the learning capability and high-level representation extraction are analysed. 
A hypothesis for transforming the DL problem into a moment-learning problem is also presented. The power of the standard RNN for DL is investigated. The ability of the RNN with only positive parameters to conduct image convolution operations is demonstrated. The MLRNN equipped with the developed training algorithm achieves comparable or better classification at a lower computation cost than conventional DL methods.
16

Zagoruyko, Sergey. "Weight parameterizations in deep neural networks." Thesis, Paris Est, 2018. http://www.theses.fr/2018PESC1129/document.

Abstract:
Multilayer neural networks were first proposed more than three decades ago, and various architectures and parameterizations have been explored since. Recently, graphics processing units enabled very efficient neural network training, and allowed training much larger networks on larger datasets, dramatically improving performance on various supervised learning tasks. However, generalization is still far from human level, and it is difficult to understand what the decisions made are based on. To improve generalization and understanding, we revisit the problems of weight parameterizations in deep neural networks. We identify what are, to our mind, the most important problems in modern architectures: network depth, parameter efficiency, and learning multiple tasks at the same time, and try to address them in this thesis. We start with one of the core problems of computer vision, patch matching, and propose to use convolutional neural networks of various architectures to solve it, instead of hand-crafted descriptors. Then, we address the task of object detection, where a network should simultaneously learn to predict both the class of the object and its location. In both tasks we find that the number of parameters in the network is the major factor determining its performance, and explore this phenomenon in residual networks. Our findings show that their original motivation, training deeper networks for better representations, does not fully hold, and wider networks with fewer layers can be as effective as deeper ones with the same number of parameters. Overall, we present an extensive study on architectures and weight parameterizations, and ways of transferring knowledge between them.
17

Landeen, Trevor J. "Association Learning Via Deep Neural Networks." DigitalCommons@USU, 2018. https://digitalcommons.usu.edu/etd/7028.

Abstract:
Deep learning has been making headlines in recent years and is often portrayed as an emerging technology on a meteoric rise towards fully sentient artificial intelligence. In reality, deep learning is the most recent renaissance of a 70-year-old technology and is far from possessing true intelligence. The renewed interest is motivated by recent successes in challenging problems, the accessibility made possible by hardware developments, and dataset availability. The predecessor to deep learning, commonly known as the artificial neural network, is a computational network set up to mimic the biological neural structure found in brains. However, unlike human brains, artificial neural networks in most cases cannot make inferences from one problem to another. As a result, developing an artificial neural network requires a large number of examples of desired behavior for a specific problem. Furthermore, developing an artificial neural network capable of solving the problem can take days, or even weeks, of computations. Two specific problems addressed in this dissertation are both input association problems. One problem challenges a neural network to identify overlapping regions in images and is used to evaluate the ability of a neural network to learn associations between inputs of similar types. The other problem asks a neural network to identify which observed wireless signals originated from observed potential sources and is used to assess the ability of a neural network to learn associations between inputs of different types. The neural network solutions to both problems introduced, discussed, and evaluated in this dissertation demonstrate deep learning's applicability to problems which have previously attracted little attention.
18

Srivastava, Sanjana. "On foveation of deep neural networks." Thesis, Massachusetts Institute of Technology, 2019. https://hdl.handle.net/1721.1/123134.

Abstract:
This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.
Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2019
Cataloged from student-submitted PDF version of thesis.
Includes bibliographical references (pages 61-63).
The human ability to recognize objects is impaired when the object is not shown in full. "Minimal images" are the smallest regions of an image that remain recognizable for humans. [26] show that a slight modification of the location and size of the visible region of the minimal image produces a sharp drop in human recognition accuracy. In this paper, we demonstrate that such drops in accuracy due to changes of the visible region are a phenomenon common to humans and existing state-of-the-art convolutional neural networks (CNNs), and are much more prominent in CNNs. We found many cases where CNNs classified one region correctly and the other incorrectly, though the regions differed by only one row or column of pixels and were often bigger than the average human minimal image size. We show that this phenomenon is independent of previous works that have reported a lack of invariance to minor modifications in object location in CNNs. Our results thus reveal a new failure mode of CNNs that also affects humans to a lesser degree. They expose how fragile CNN recognition ability is for natural images even without synthetic adversarial patterns being introduced. This opens potential for CNN robustness in natural images to be brought to the human level by taking inspiration from human robustness methods. One of these is eccentricity dependence, a model of human focus in which attention to the visual input degrades in proportion to distance from the focal point [7]. We demonstrate that applying the "inverted pyramid" eccentricity method, a multi-scale input transformation, makes CNNs more robust to useless background features than a standard raw-image input. Our results also show that using the inverted pyramid method generally reduces useless background pixels, thereby reducing the required training data.
19

Wang, Shenhao. "Deep neural networks for choice analysis." Thesis, Massachusetts Institute of Technology, 2020. https://hdl.handle.net/1721.1/129894.

Abstract:
Thesis: Ph. D. in Computer and Urban Science, Massachusetts Institute of Technology, Department of Urban Studies and Planning, September, 2020
Cataloged from student-submitted PDF of thesis.
Includes bibliographical references (pages 117-128).
As deep neural networks (DNNs) outperform classical discrete choice models (DCMs) in many empirical studies, one pressing question is how to reconcile them in the context of choice analysis. So far researchers mainly compare their prediction accuracy, treating them as completely different modeling methods. However, DNNs and classical choice models are closely related and even complementary. This dissertation seeks to lay out a new foundation of using DNNs for choice analysis. It consists of three essays, which respectively tackle the issues of economic interpretation, architectural design, and robustness of DNNs by using classical utility theories. Essay 1 demonstrates that DNNs can provide economic information as complete as the classical DCMs.
The economic information includes choice predictions, choice probabilities, market shares, substitution patterns of alternatives, social welfare, probability derivatives, elasticities, marginal rates of substitution (MRS), and heterogeneous values of time (VOT). Unlike DCMs, DNNs can automatically learn the utility function and reveal behavioral patterns that are not prespecified by modelers. However, the economic information from DNNs can be unreliable because the automatic learning capacity is associated with three challenges: high sensitivity to hyperparameters, model non-identification, and local irregularity. To demonstrate the strength of DNNs as well as the three issues, I conduct an empirical experiment by applying the DNNs to a stated preference survey and discuss successively the full list of economic information extracted from the DNNs. Essay 2 designs a particular DNN architecture with alternative-specific utility functions (ASU-DNN) by using prior behavioral knowledge.
Theoretically, ASU-DNN reduces the estimation error of fully connected DNN (F-DNN) because of its lighter architecture and sparser connectivity, although the constraint of alternative-specific utility could cause ASU-DNN to exhibit a larger approximation error. Both ASU-DNN and F-DNN can be treated as special cases of DNN architecture design guided by a utility connectivity graph (UCG). Empirically, ASU-DNN has 2-3% higher prediction accuracy than F-DNN. The alternative-specific connectivity constraint, as a domain-knowledge-based regularization method, is more effective than other regularization methods. This essay demonstrates that prior behavioral knowledge can be used to guide the architecture design of DNN, to function as an effective domain-knowledge-based regularization method, and to improve both the interpretability and predictive power of DNNs in choice analysis.
Essay 3 designs a theory-based residual neural network (TB-ResNet) with a two-stage training procedure, which synthesizes decision-making theories and DNNs in a linear manner. Three instances of TB-ResNets based on choice modeling (CM-ResNets), prospect theory (PT-ResNets), and hyperbolic discounting (HD-ResNets) are designed. Empirically, compared to the decision-making theories, the three instances of TB-ResNets predict significantly better in the out-of-sample test and become more interpretable owing to the rich utility function augmented by DNNs. Compared to the DNNs, the TB-ResNets predict better because the decision-making theories aid in localizing and regularizing the DNN models. TB-ResNets also become more robust than DNNs because the decision-making theories stabilize the local utility function and the input gradients.
This essay demonstrates that it is both feasible and desirable to combine the handcrafted utility theory and automatic utility specification, with joint improvement in prediction, interpretation, and robustness.
20

Sunnegårdh, Christina. "Scar detection using deep neural networks." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-299576.

Abstract:
Object detection is a computer vision method that deals with the tasks of localizing and classifying objects within an image. The number of uses for the method is constantly growing, and this thesis investigates the unexplored area of using deep neural networks for scar detection. Furthermore, the thesis investigates using the scar detector as a basis for the binary classification task of deciding whether in-the-wild images contain a scar or not. Two pre-trained object detection models, Faster R-CNN and RetinaNet, were trained on 1830 manually labeled images using different hyperparameters. Faster R-CNN Inception ResNet V2 achieved the highest results in terms of Average Precision (AP), particularly at higher IoU thresholds, closely followed by Faster R-CNN ResNet50, and finally RetinaNet. The results indicate both the superiority of Faster R-CNN over RetinaNet and the benefit of using Inception ResNet V2 as feature extractor for a large variety of object sizes. This is most likely due to multiple convolutional filters of different sizes operating at the same levels in the Inception ResNet network. As for inference time, RetinaNet was the fastest, followed by Faster R-CNN ResNet50 and finally Faster R-CNN Inception ResNet V2. For the binary classification task, the models were tested on a set of 200 images, where half of the images contained clearly visible scars. Faster R-CNN ResNet50 achieved the highest accuracy, followed by Faster R-CNN Inception ResNet V2 and finally RetinaNet. While the accuracy of RetinaNet suffered mainly from a low recall, Faster R-CNN Inception ResNet V2 detected some actual scars in images that had not been labeled due to low image quality; this could be a matter of subjective labeling, with the model being punished for something that at other times might be considered correct. In conclusion, this thesis shows promising results of using object detection to detect scars in images.
While two-stage Faster R-CNN holds the advantage in AP for scar detection, one-stage RetinaNet holds the advantage in speed. Suggestions for future work include eliminating biases by putting more effort into labeling data as well as including training data that contain objects for which the models produced false positives. Examples of this are wounds, knuckles, and possible background objects that are visually similar to scars.
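The abstract above reports Average Precision at different IoU (intersection over union) thresholds. As a minimal illustrative sketch of that criterion, and not the thesis's own evaluation code, the snippet below computes IoU for two axis-aligned boxes and checks whether a detection would count as a match. The `(x1, y1, x2, y2)` box format, the function names, and the example thresholds are assumptions made for this example.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2).

    The corner-coordinate box format is an assumption for this sketch.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap rectangle (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_match(pred_box, gt_box, threshold=0.5):
    """A detection counts as a match when its IoU reaches the threshold."""
    return iou(pred_box, gt_box) >= threshold


# Hypothetical boxes: the prediction overlaps half of the ground-truth box.
predicted = (0, 0, 10, 10)
labeled = (5, 0, 15, 10)
overlap = iou(predicted, labeled)  # 50 / 150
```

With these hypothetical boxes the same detection matches at a threshold of 0.3 but not at 0.5, which illustrates why AP at higher IoU thresholds rewards tighter localization.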
21

Habibi, Aghdam Hamed. "Understanding Road Scenes using Deep Neural Networks." Doctoral thesis, Universitat Rovira i Virgili, 2018. http://hdl.handle.net/10803/461607.

Abstract:
Understanding road scenes is crucial for autonomous cars. This requires segmenting road scenes into semantically meaningful regions and recognizing objects in a scene. While objects such as cars and pedestrians have to be segmented accurately, it might not be necessary to detect and locate these objects in a scene. However, detecting and classifying objects such as traffic signs is essential for conforming to road rules. In this thesis, we first propose a method for classifying traffic signs using visual attributes and Bayesian networks. Then, we propose two neural networks for this purpose and develop a new method for creating an ensemble of models. Next, we study the sensitivity of neural networks to adversarial samples and propose two denoising networks that are attached to the classification networks to increase their stability against noise. In the second part of the thesis, we first propose a network to detect traffic signs in high-resolution images in real time and show how to implement the scanning window technique within our network using dilated convolutions. Then, we formulate the detection problem as a segmentation problem and propose a fully convolutional network for detecting traffic signs. Finally, in the last part of the thesis, we propose a new fully convolutional network composed of fire modules, bypass connections and consecutive dilated convolutions for segmenting road scenes into semantically meaningful regions, and show that it is more accurate and computationally more efficient than similar networks.
22

Avramova, Vanya. "Curriculum Learning with Deep Convolutional Neural Networks." Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-178453.

Abstract:
Curriculum learning is a machine learning technique inspired by the way humans acquire knowledge and skills: by mastering simple concepts first, and progressing through information with increasing difficulty to grasp more complex topics. Curriculum Learning, and its derivatives Self Paced Learning (SPL) and Self Paced Learning with Diversity (SPLD), have been previously applied within various machine learning contexts: Support Vector Machines (SVMs), perceptrons, and multi-layer neural networks, where they have been shown to improve both training speed and model accuracy. This project ventured to apply the techniques within the previously unexplored context of deep learning, by investigating how they affect the performance of a deep convolutional neural network (ConvNet) trained on a large labeled image dataset. The curriculum was formed by presenting the training samples to the network in order of increasing difficulty, measured by the sample's loss value based on the network's objective function. The project evaluated SPL and SPLD, and proposed two new curriculum learning sub-variants, p-SPL and p-SPLD, which allow for a smooth progression of sample inclusion during training. The project also explored the "inversed" versions of the SPL, SPLD, p-SPL and p-SPLD techniques, where the samples were selected for the curriculum in order of decreasing difficulty. The experiments demonstrated that all learning variants perform fairly similarly, within ≈1% average test accuracy margin, based on five trained models per variant. Surprisingly, models trained with the inversed version of the algorithms performed slightly better than the standard curriculum training variants. The SPLD-Inversed, SPL-Inversed and SPLD networks also registered marginally higher accuracy results than the network trained with the usual random sample presentation.
The results suggest that while sample ordering does affect the training process, the optimal order in which samples are presented may vary based on the data set and algorithm used. The project also investigated whether some samples were more beneficial for the training process than others. Based on sample difficulty, subsets of samples were removed from the training data set. The models trained on the remaining samples were compared to a default model trained on all samples. On the data set used, removing the “easiest” 10% of samples had no effect on the achieved test accuracy compared to the default model, and removing the “easiest” 40% of samples reduced model accuracy by only ≈1% (compared to ≈6% loss when 40% of the "most difficult" samples were removed, and ≈3% loss when 40% of samples were randomly removed). Taking away the "easiest" samples first (up to a certain percentage of the data set) affected the learning process less negatively than removing random samples, while removing the "most difficult" samples first had the most detrimental effect. The results suggest that the networks derived most learning value from the "difficult" samples, and that a large subset of the "easiest" samples can be excluded from training with minimal impact on the attained model accuracy. Moreover, it is possible to identify these samples early during training, which can greatly reduce the training time for these models.
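The abstract above describes ordering training samples by their loss value (increasing for the standard curriculum, decreasing for the "inversed" variants) and removing the "easiest" fraction of samples. The following is a minimal sketch of those two operations only, assuming per-sample losses are already available as a plain list; the function names and the example loss values are hypothetical and do not reproduce the SPL, SPLD, p-SPL or p-SPLD algorithms themselves.

```python
def curriculum_order(losses, inversed=False):
    """Return sample indices ordered by difficulty, measured by loss.

    inversed=False gives easy-to-hard order (lowest loss first);
    inversed=True gives hard-to-easy order (highest loss first).
    """
    return sorted(range(len(losses)), key=lambda i: losses[i], reverse=inversed)


def drop_easiest(losses, fraction):
    """Return indices of the samples kept after removing the lowest-loss fraction."""
    n_drop = int(len(losses) * fraction)
    kept = curriculum_order(losses)[n_drop:]
    return sorted(kept)


# Hypothetical per-sample losses from a partially trained network.
sample_losses = [0.9, 0.1, 0.5, 0.3, 0.7]

easy_first = curriculum_order(sample_losses)
hard_first = curriculum_order(sample_losses, inversed=True)
kept_after_drop = drop_easiest(sample_losses, 0.4)
```

In this sketch difficulty is a fixed snapshot of the losses; in practice the losses change as the network trains, so the ordering would need to be recomputed, and how often to do so is a design choice the abstract does not specify.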
23

Karlsson, Daniel. "Classifying sport videos with deep neural networks." Thesis, Umeå universitet, Institutionen för datavetenskap, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-130654.

Abstract:
This project aims to apply deep neural networks to classify video clips in applications used to streamline advertisements on the web. The system focuses on sport clips but can be expanded into other advertisement fields, with lower accuracy and longer training times as a consequence. The main task was to find the neural network model best suited for classifying videos. To achieve this, the field was researched and three network models were introduced to see how they could handle the videos. It was proposed that applying a recurrent LSTM structure at the end of an image classification network could make it well adapted to work with videos. The most popular image classification architectures are mostly convolutional neural networks, and these structures are also the foundation of all three models. The results from the evaluation of the models, as well as the research, suggest that using a convolutional LSTM can be an efficient and powerful way of classifying videos. Further, this project shows that by reducing the size of the input data by 25%, the training and evaluation time can be cut by around 50%. This comes at the cost of lower accuracy. However, it is demonstrated that the performance loss can be compensated for by considering more frames from the same videos during evaluation.
24

Milner, Rosanna Margaret. "Using deep neural networks for speaker diarisation." Thesis, University of Sheffield, 2016. http://etheses.whiterose.ac.uk/16567/.

Abstract:
Speaker diarisation answers the question “who spoke when?” in an audio recording. The input may vary, but a system is required to output speaker labelled segments in time. Typical stages are Speech Activity Detection (SAD), speaker segmentation and speaker clustering. Early research focussed on Conversational Telephone Speech (CTS) and Broadcast News (BN) domains before the direction shifted to meetings and, more recently, broadcast media. The British Broadcasting Corporation (BBC) supplied data through the Multi-Genre Broadcast (MGB) Challenge in 2015 which showed the difficulties speaker diarisation systems have on broadcast media data. Diarisation is typically an unsupervised task which does not use auxiliary data or information to enhance a system. However, methods which do involve supplementary data have shown promise. Five semi-supervised methods are investigated which use a combination of inputs: different channel types and transcripts. The methods involve Deep Neural Networks (DNNs) for SAD, DNNs trained for channel detection, transcript alignment, and combinations of these approaches. However, the methods are only applicable when datasets contain the required inputs. Therefore, a method involving a pretrained Speaker Separation Deep Neural Network (ssDNN) is investigated which is applicable to every dataset. This technique performs speaker clustering and speaker segmentation using DNNs successfully for meeting data and with mixed results for broadcast media. The task of diarisation focuses on two aspects: accurate segments and speaker labels. The Diarisation Error Rate (DER) does not evaluate the segmentation quality as it does not measure the number of correctly detected segments. Other metrics exist, such as boundary and purity measures, but these also mask the segmentation quality. An alternative metric is presented based on the F-measure which considers the number of hypothesis segments correctly matched to reference segments. 
A deeper insight into the segment quality is shown through this metric.
25

Karlsson, Jonas. "Auditory Classification of Cars by Deep Neural Networks." Thesis, Uppsala universitet, Institutionen för informationsteknologi, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-355673.

Abstract:
This thesis explores the challenge of using deep neural networks to classify traits in cars through sound recognition. These traits could include type of engine, model, or manufacturer of the car. The problem was approached by creating three different neural networks and evaluating their performance in classifying sounds of three different cars. The top-scoring neural network achieved an accuracy of 61 percent, which is far from the standard accuracy of modern speech recognition systems. The results do, however, show that there are some tendencies in the data that neural networks can learn. If the methods and networks presented in this report are built upon further, better classification performance may be achieved.
26

Wu, Chunyang. "Structured deep neural networks for speech recognition." Thesis, University of Cambridge, 2018. https://www.repository.cam.ac.uk/handle/1810/276084.

Abstract:
Deep neural networks (DNNs) and deep learning approaches yield state-of-the-art performance in a range of machine learning tasks, including automatic speech recognition. The multi-layer transformations and activation functions in DNNs, or related network variations, allow complex and difficult data to be well modelled. However, the highly distributed representations associated with these models make it hard to interpret the parameters. The whole neural network is commonly treated as a "black box". The behaviours of activation functions and the meanings of network parameters are rarely controlled in standard DNN training. Though sensible performance can be achieved, the lack of interpretation of network structures and parameters makes better regularisation and adaptation of DNN models challenging. In regularisation, parameters have to be regularised universally and indiscriminately. For instance, the widely used L2 regularisation pushes all parameters towards zero. In adaptation, a large number of independent parameters must be re-estimated. Adaptation schemes in this framework cannot be performed effectively when adaptation data are limited. This thesis investigates structured deep neural networks. Special structures are explicitly designed and imposed with the desired interpretation to improve DNN regularisation and adaptation. For regularisation, parameters can be separately regularised based on their functions. For adaptation, parameters can be adapted in groups or partially adapted according to their roles in the network topology. Three forms of structured DNNs are proposed in this thesis. The contributions of these models are presented as follows. The first contribution of this thesis is the multi-basis adaptive neural network. This form of structured DNN introduces a set of parallel sub-networks with restricted connections. The design of restricted connectivity allows different aspects of data to be explicitly learned. Sub-network outputs are then combined, and this combination module is used as the speaker-dependent structure that can be robustly estimated for adaptation. The second contribution of this thesis is the stimulated deep neural network. This form of structured DNN relates and smooths activation functions in regions of the network. It aids the visualisation and interpretation of DNN models and also has the potential to reduce over-fitting. Novel adaptation schemes can be performed on it, taking advantage of the smoothness property that the stimulated DNN offers. The third contribution of this thesis is the deep activation mixture model. This form of structured DNN also encourages the outputs of activation functions to form a smooth surface. The output of one hidden layer is explicitly modelled as the sum of a mixture model and a residual model. The mixture model forms an activation contour, and the residual model depicts fluctuations around this contour. The smoothness yielded by the mixture model helps to regularise the overall model and allows novel adaptation schemes.
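The contrast between uniform L2 regularisation and regularising parameters "separately based on their functions" can be sketched as follows. This is a minimal illustration under assumptions: the group names and coefficients are hypothetical, and the thesis's actual regularisers are not reproduced here.

```python
def l2_penalty(weights):
    """Sum of squared weights: the standard L2 term, which pulls every weight towards zero."""
    return sum(w * w for w in weights)


def grouped_l2_penalty(groups, coefficients):
    """L2 penalty with one coefficient per parameter group.

    groups:       dict mapping an (assumed) group name to a list of weights
    coefficients: dict mapping the same names to regularisation strengths
    """
    return sum(coefficients[name] * l2_penalty(ws) for name, ws in groups.items())


if __name__ == "__main__":
    params = {"basis": [1.0, 2.0], "combination": [3.0]}
    strengths = {"basis": 0.1, "combination": 0.01}
    print(grouped_l2_penalty(params, strengths))
```

Setting every coefficient to the same value recovers the indiscriminate case described in the abstract.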
27

Antoniades, Andreas. "Interpreting biomedical data via deep neural networks." Thesis, University of Surrey, 2018. http://epubs.surrey.ac.uk/845765/.

Abstract:
Machine learning technology has taken quantum leaps in the past few years, from the rise of voice recognition as an interface to our computers to self-organising photo albums and self-driving cars. Neural networks and deep learning contributed significantly to driving this revolution. Yet biomedicine is one of the research areas that has yet to fully embrace the possibilities of deep learning. Engaged in a cross-disciplinary subject, researchers and clinical experts are focused on machine learning and statistical signal processing techniques. The ability to learn hierarchical features makes deep learning models highly applicable to biomedicine, and researchers have started to notice this. The first works of deep learning in biomedicine are emerging, with applications in diagnostics and genomics analysis. These models offer excellent accuracy, even comparable to that of human doctors. Despite the exceptional classification performance of these models, they are still used to provide quantitative results. Diagnosing cancer proficiently and faster than a human doctor is beneficial, but automatically finding which biomarkers indicate the existence of cancerous cells would be invaluable. This type of qualitative insight can be enabled by the hierarchical features and learning coefficients that manifest in deep models. It is this qualitative approach that enables the interpretability of data and explainability of neural networks for biomedicine, which is the overarching aim of this thesis. As such, the aim of this thesis is to investigate the use of neural networks and deep learning models for the qualitative assessment of biomedical datasets. The first contribution is the proposition of a non-iterative, data-agnostic feature selection algorithm that retains original features and provides qualitative analysis of their importance. This algorithm is employed in numerous areas, including Pima Indian diabetes and tumour detection in children. Next, the thesis focuses on the topic of epilepsy, studied through scalp and intracranial electroencephalogram recordings of human brain activity. The second contribution promotes the use of deep learning models for the automatic generation of clinically meaningful features, as opposed to traditional handcrafted features. Convolutional neural networks are adapted to accommodate the intricacies of electroencephalogram data and trained to detect epileptiform discharges. The learning coefficients of these models are examined and found to contain clinically significant features. When combined in a hierarchical way, these features reveal useful insights for the evaluation of treatment effectiveness. The final contribution addresses the difficulty in acquiring intracranial data due to the invasive nature of the recording procedure. A non-linear brain mapping algorithm is proposed to link the electrical activities recorded on the scalp to those inside the cranium. This process improves the generalisation of models and alleviates the need for surgical procedures. This is accomplished via an asymmetric autoencoder that accounts for differences in the dimensionality of the electroencephalogram data and improves the quality of the data.
28

Tavanaei, Amirhossein. "Spiking Neural Networks and Sparse Deep Learning." Thesis, University of Louisiana at Lafayette, 2019. http://pqdtopen.proquest.com/#viewpdf?dispub=10807940.

Abstract:

This document proposes new methods for training multi-layer and deep spiking neural networks (SNNs), specifically, spiking convolutional neural networks (CNNs). Training a multi-layer spiking network poses difficulties because the output spikes do not have derivatives and the commonly used backpropagation method for non-spiking networks is not easily applied. Our methods use novel versions of the brain-like, local learning rule named spike-timing-dependent plasticity (STDP) that incorporates supervised and unsupervised components. Our method starts with conventional learning methods and converts them to spatio-temporally local rules suited for SNNs.

The training uses two components for unsupervised feature extraction and supervised classification. The first component refers to new STDP rules for spike-based representation learning that trains convolutional filters and initial representations. The second introduces new STDP-based supervised learning rules for spike pattern classification via an approximation to gradient descent by combining the STDP and anti-STDP rules. Specifically, the STDP-based supervised learning model approximates gradient descent by using temporally local STDP rules. Stacking these components implements a novel sparse, spiking deep learning model. Our spiking deep learning model is categorized as a variation of spiking CNNs of integrate-and-fire (IF) neurons with performance comparable with the state-of-the-art deep SNNs. The experimental results show the success of the proposed model for image classification. Our network architecture is the only spiking CNN which provides bio-inspired STDP rules in a hierarchy of feature extraction and classification in an entirely spike-based framework.
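For readers unfamiliar with STDP, the following is a minimal sketch of the textbook pair-based rule, not the novel supervised and unsupervised variants proposed in the thesis. The amplitudes and time constants are assumed illustrative values.

```python
import math


def stdp_delta_w(dt, a_plus=0.01, a_minus=0.012, tau_plus=20.0, tau_minus=20.0):
    """Weight change for one pre/post spike pair under pair-based STDP.

    dt = t_post - t_pre (in ms). All parameter values are assumptions for illustration.
    """
    if dt > 0:
        # Pre-synaptic spike precedes the post-synaptic spike: potentiation.
        return a_plus * math.exp(-dt / tau_plus)
    if dt < 0:
        # Post-synaptic spike precedes the pre-synaptic spike: depression.
        return -a_minus * math.exp(dt / tau_minus)
    return 0.0
```

The update depends only on the timing of the two spikes at one synapse, which is the locality property the abstract contrasts with backpropagation. An anti-STDP rule, as mentioned in the abstract, would flip the sign of these updates.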

29

Zhang, Jeffrey M. Eng Massachusetts Institute of Technology. "Enhancing adversarial robustness of deep neural networks." Thesis, Massachusetts Institute of Technology, 2019. https://hdl.handle.net/1721.1/122994.

Abstract:
This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.
Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2019
Cataloged from student-submitted PDF version of thesis.
Includes bibliographical references (pages 57-58).
Logit-based regularization and pretrain-then-tune are two approaches that have recently been shown to enhance adversarial robustness of machine learning models. In the realm of regularization, Zhang et al. (2019) proposed TRADES, a logit-based regularization optimization function that has been shown to improve upon the robust optimization framework developed by Madry et al. (2018) [14, 9]. They were able to achieve state-of-the-art adversarial accuracy on CIFAR10. In the realm of pretrain-then-tune models, Hendrycks et al. (2019) demonstrated that adversarially pretraining a model on ImageNet and then adversarially tuning on CIFAR10 greatly improves the adversarial robustness of machine learning models. In this work, we propose Adversarial Regularization, another logit-based regularization optimization framework that surpasses TRADES in adversarial generalization. Furthermore, we explore the impact of trying different types of adversarial training on the pretrain-then-tune paradigm.
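As a rough illustration of what a logit-based regularization objective looks like, the sketch below computes a TRADES-style loss for a single example: cross-entropy on the clean logits plus a weighted KL divergence between the clean and adversarial predictive distributions. It assumes the adversarial logits are already given (generating the adversarial example is out of scope), `beta` is an assumed trade-off weight, and this is not the Adversarial Regularization objective proposed in the thesis.

```python
import math


def softmax(logits):
    """Numerically stabilised softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Negative log-probability of the true class."""
    return -math.log(softmax(logits)[label])


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions; assumes q has no zero entries."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def trades_style_loss(clean_logits, adv_logits, label, beta=6.0):
    """Clean cross-entropy plus beta times KL(clean prediction || adversarial prediction)."""
    natural = cross_entropy(clean_logits, label)
    robust = kl_divergence(softmax(clean_logits), softmax(adv_logits))
    return natural + beta * robust
```

When the adversarial logits equal the clean logits the regularizer vanishes and the loss reduces to ordinary cross-entropy; the further the adversarial prediction drifts, the larger the penalty.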
by Jeffry Zhang.
30

Miglani, Vivek N. "Comparing learned representations of deep neural networks." Thesis, Massachusetts Institute of Technology, 2019. https://hdl.handle.net/1721.1/123048.

Abstract:
This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.
Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2019
Cataloged from student-submitted PDF version of thesis.
Includes bibliographical references (pages 63-64).
In recent years, a variety of deep neural network architectures have obtained substantial accuracy improvements in tasks such as image classification, speech recognition, and machine translation, yet little is known about how different neural networks learn. To further understand this, we interpret the function of a deep neural network used for classification as converting inputs to a hidden representation in a high dimensional space and applying a linear classifier in this space. This work focuses on comparing these representations as well as the learned input features for different state-of-the-art convolutional neural network architectures. By focusing on the geometry of this representation, we find that different network architectures trained on the same task have hidden representations which are related by linear transformations. We find that retraining the same network architecture with a different initialization does not necessarily lead to more similar representation geometry for most architectures, but the ResNeXt architecture consistently learns similar features and hidden representation geometry. We also study connections to adversarial examples and observe that networks with more similar hidden representation geometries also exhibit higher rates of adversarial example transferability.
by Vivek N. Miglani.
31

Wang, Yuxuan. "Supervised Speech Separation Using Deep Neural Networks." The Ohio State University, 2015. http://rave.ohiolink.edu/etdc/view?acc_num=osu1426366690.

32

Peng, Zeng. "Pedestrian Tracking by using Deep Neural Networks." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-302107.

Abstract:
This project aims to use deep learning to solve the pedestrian tracking problem for autonomous driving. The research area is in the domain of computer vision and deep learning. Multi-Object Tracking (MOT) aims at tracking multiple targets simultaneously in video data. The main application scenarios of MOT are security monitoring and autonomous driving. In these scenarios, we often need to track many targets at the same time, which is not possible with only object detection or single-object tracking algorithms because of their lack of stability and usability. Therefore we need to explore the area of multiple object tracking. The proposed method breaks MOT into different stages and utilizes the motion and appearance information of targets to track them in the video data. We used three different object detectors to detect the pedestrians in frames, a person re-identification model as the appearance feature extractor, and a Kalman filter as the motion predictor. Our proposed model achieves 47.6% MOT accuracy and a 53.2% IDF1 score, while the results obtained by the model without the person re-identification module are only 44.8% and 45.8%, respectively. Our experimental results indicate that a robust multiple object tracking algorithm can be achieved by splitting the task into stages and improved by representative DNN-based appearance features.
Swedish abstract (translated): This project aims to use deep learning to solve the problem of tracking pedestrians for autonomous driving. The research lies within computer vision and deep learning. Multi-object tracking (MOT) aims to track multiple targets simultaneously in video data. The most important application scenarios for MOT are security monitoring and autonomous driving. In these scenarios we often need to track many targets at the same time, which is not possible with only object detection or single-object tracking algorithms because of their lack of stability and usability; we must therefore explore the area of multiple object tracking. Our method breaks MOT into different stages and uses the targets' motion and appearance information to track them in video data. We used three different object detectors to detect pedestrians in frames, a person re-identification model as the appearance feature extractor, and a Kalman filter as the motion predictor. Our proposed model achieves 47.6% MOT accuracy and 53.2% IDF1, while the results obtained by the model without the person re-identification module are only 44.8% and 45.8%, respectively. Our experimental results showed that a robust multiple object tracking algorithm can be achieved by splitting the task and improved by representative DNN-based appearance features.
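The two scores reported in this abstract are conventionally computed from error counts. The sketch below uses the commonly cited definitions, MOTA = 1 - (FN + FP + IDSW) / GT and IDF1 = 2·IDTP / (2·IDTP + IDFP + IDFN). Whether the thesis uses exactly these definitions is an assumption, and the function names are illustrative.

```python
def mota(false_negatives, false_positives, id_switches, num_gt):
    """Multi-object tracking accuracy from error counts summed over all frames."""
    if num_gt <= 0:
        raise ValueError("num_gt must be positive")
    return 1.0 - (false_negatives + false_positives + id_switches) / num_gt


def idf1(idtp, idfp, idfn):
    """Identity F1 score from identity-level true/false positives and false negatives."""
    denom = 2 * idtp + idfp + idfn
    return 2 * idtp / denom if denom else 0.0
```

MOTA penalises detection errors and identity switches together, while IDF1 focuses on how consistently identities are preserved, which is why an appearance (re-identification) module would be expected to move IDF1 more than MOTA.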
33

Novoa, Ilic José Eduardo. "Robust speech recognition in noisy and reverberant environments using deep neural network-based systems." Tesis, Universidad de Chile, 2018. http://repositorio.uchile.cl/handle/2250/168062.

Abstract:
Doctor en Ingeniería Eléctrica
In this thesis, an uncertainty weighting scheme for deep neural network-hidden Markov model (DNN-HMM) based automatic speech recognition (ASR) is proposed to increase discriminability in the decoding process. To this end, the DNN pseudo-log-likelihoods are weighted according to the uncertainty variance assigned to the acoustic observation. The results presented here suggest that a substantial reduction in word error rate (WER) is achieved with clean training. Moreover, modelling the uncertainty propagation through the DNN is not required, and no approximations for nonlinear activation functions are made. The presented method can be applied to any network topology that delivers log-likelihood-like scores. It can be combined with any noise removal technique and adds minimal computational cost. This technique was exhaustively evaluated and combined with uncertainty-propagation-based schemes for computing the pseudo-log-likelihoods and uncertainty variance at the DNN output. Two proposed methods optimized the parameters of the weighting function using grid search, either on a development database representing the given task or on each utterance based on discrimination metrics. Experiments with the Aurora-4 task showed that, with clean training, the proposed weighting scheme can reduce WER by a maximum of 21% compared with a baseline system with spectral subtraction and uncertainty propagation using the unscented transform. Additionally, it is proposed to replace the classical black-box integration of automatic speech recognition technology in human-robot interaction (HRI) applications with the incorporation of the HRI environment representation and modeling, and the robot and user states and contexts. Accordingly, this thesis focuses on the environment representation and modeling by training a DNN-HMM based automatic speech recognition engine combining clean utterances with the acoustic channel responses and noise that were obtained from an HRI testbed built with a PR2 mobile manipulation robot. This method avoids recording a training database in all the possible acoustic environments given an HRI scenario. In the generated testbed, the resulting ASR engine provided a WER that is at least 26% and 38% lower than publicly available speech recognition application programming interfaces (APIs) with the loudspeaker and human speakers testing databases, respectively, with a limited amount of training data. This thesis demonstrates that even state-of-the-art DNN-HMM based speech recognizers can benefit from combining systems for which the acoustic models have been trained using different feature sets. In this context, the complementarity of DNN-HMM based ASR systems trained with the same data set but with different signal representations is discussed. DNN fusion methods based on flat-weight combination, the minimization of mutual information and the maximization of discrimination metrics were proposed and tested. Schemes that consider the combination of ASR systems with lattice combination and minimum Bayes risk decoding were also evaluated and combined with DNN fusion techniques. The experimental results were obtained using publicly available, naturally recorded, highly reverberant speech data. Significant improvements in WER were observed by combining DNN-HMM based ASR systems with different feature sets, obtaining relative improvements of 10% with two classifiers and 18% with four classifiers, without any tuning or a priori information about the ASR accuracy.
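Since all results in this abstract are stated as WER and relative WER reductions, a minimal sketch of how these quantities are typically computed may help. It uses the standard word-level edit distance; the whitespace tokenisation and the helper names are assumptions and do not come from the thesis.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the processed reference prefix and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline_wer, new_wer):
    """Relative WER reduction of a new system with respect to a baseline."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer
```

A figure such as "WER reduced by 21%" in the abstract is a relative reduction of this kind, not a drop of 21 absolute percentage points.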
34

Tamascelli, Nicola. "A Machine Learning Approach to Predict Chattering Alarms." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2020.

Abstract:
The alarm system plays a vital role in ensuring safety and reliability in the process industry. Ideally, an alarm should inform the operator about critical conditions only; during alarm floods, however, the operator may be overwhelmed by several alarms in a short time span. Crucial alarms are more likely to be missed in these situations. Poor alarm management is one of the main causes of unintended plant shutdowns, incidents and near misses in the chemical industry. Most of the alarms triggered during a flood episode are nuisance alarms, i.e. alarms that do not communicate new information to the operator or that do not require an operator action. Chattering alarms, i.e. alarms that repeat three or more times in a minute, and redundant alarms, i.e. duplicated alarms, are common forms of nuisance. Identifying nuisance alarms is a key step in improving the performance of the alarm system. Advanced techniques for alarm rationalization have been developed, proposing methods to quantify chattering, redundancy and correlation between alarms. Although very effective, these techniques produce static results. Machine learning appears to be an interesting opportunity to retrieve further knowledge and support these techniques. This knowledge can be used to produce more flexible and dynamic models, as well as to predict alarm behaviour during floods. The aim of this study is to develop a machine learning-based algorithm for real-time alarm classification and rationalization, whose results can be used to support the operator's decision-making procedure. Specifically, efforts have been directed towards chattering prediction during alarm floods. Advanced techniques for chattering, redundancy and correlation assessment have been applied to a real industrial alarm database. A modified approach has been developed to dynamically assess chattering, and the results have been used to train three different machine learning models, whose performance has been evaluated and discussed.
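The working definition of a chattering alarm given in this abstract (three or more repetitions within a minute) can be checked with a simple sliding window. The sketch below assumes activation timestamps in seconds for a single alarm tag and treats a span of exactly 60 seconds as still "within a minute". It is not the modified dynamic chattering assessment developed in the thesis.

```python
def is_chattering(timestamps, min_repeats=3, window_seconds=60.0):
    """Return True if any window of `window_seconds` contains at least `min_repeats` activations."""
    times = sorted(timestamps)
    start = 0
    for end, t in enumerate(times):
        # Shrink the window from the left until it spans at most window_seconds.
        while t - times[start] > window_seconds:
            start += 1
        if end - start + 1 >= min_repeats:
            return True
    return False
```

Labels produced this way over historical alarm logs are one plausible source of training targets for a classifier that predicts chattering before it occurs.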
35

Pons, Puig Jordi. "Deep neural networks for music and audio tagging." Doctoral thesis, Universitat Pompeu Fabra, 2019. http://hdl.handle.net/10803/668036.

Abstract:
Automatic music and audio tagging can help increase the retrieval and re-use possibilities of many audio databases that remain poorly labeled. In this dissertation, we tackle the task of music and audio tagging from the deep learning perspective and, within that context, we address the following research questions: (i) Which deep learning architectures are most appropriate for (music) audio signals? (ii) In which scenarios is waveform-based end-to-end learning feasible? (iii) How much data is required for carrying out competitive deep learning research? In pursuit of answering research question (i), we propose to use musically motivated convolutional neural networks as an alternative to designing deep learning models that is based on domain knowledge, and we evaluate several deep learning architectures for audio at a low computational cost with a novel methodology based on non-trained (randomly weighted) convolutional neural networks. Throughout our work, we find that employing music and audio domain knowledge during the model’s design can help improve the efficiency, interpretability, and performance of spectrogram-based deep learning models. For research questions (ii) and (iii), we perform a study with the SampleCNN, a recently proposed end-to-end learning model, to assess its viability for music audio tagging when variable amounts of training data —ranging from 25k to 1.2M songs— are available. We compare the SampleCNN against a spectrogram-based architecture that is musically motivated and conclude that, given enough data, end-to-end learning models can achieve better results. Finally, throughout our quest for answering research question (iii), we also investigate whether a naive regularization of the solution space, prototypical networks, transfer learning, or their combination, can foster deep learning models to better leverage a small number of training examples. 
Results indicate that transfer learning and prototypical networks are powerful strategies in such low-data regimes.
Catalan abstract (translated): Automatic audio and music tagging can increase the re-use possibilities of many audio databases that remain practically unlabeled. In this thesis, we approach the task of automatic audio and music tagging from the deep learning perspective and, in this context, we address the following scientific questions: (i) Which deep learning architectures are most suitable for (music) audio signals? (ii) In which scenarios is it feasible for deep learning models to process waveforms directly? (iii) How much data is needed to carry out deep learning research studies? To answer the first question (i), we propose using musically motivated convolutional neural networks and evaluate several deep learning architectures for audio at a low computational cost. Throughout our research, we find that prior knowledge about music and audio can help improve the efficiency, interpretability, and performance of spectrogram-based learning models. For questions (ii) and (iii), we study how the SampleCNN, a deep learning model that processes waveforms, performs when variable amounts of training data are available, from 25k songs to 1.2M songs. In this study, we compare the SampleCNN with a musically motivated spectrogram-based architecture. Our experimental results indicate that, in scenarios with sufficient data, deep learning models that process waveforms (such as the SampleCNN) can achieve better results than those that process spectrograms. Finally, to try to answer question (iii), we also investigate whether a severe regularization of the solution space, prototypical networks, transfer learning, or their combination can allow deep learning models to obtain better results in scenarios where little training data is available. The results of our experiments indicate that transfer learning and prototypical networks are useful strategies when training data are not abundant.
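The idea behind prototypical networks, which the abstract reports as effective in low-data regimes, can be sketched without a neural network: each class is represented by the mean of the embeddings of its few labelled examples, and a query is assigned to the nearest prototype. The sketch assumes embeddings are already computed by some model; the labels and vectors in the usage example are hypothetical.

```python
def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def squared_distance(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_prototypes(support):
    """support: dict mapping a label to a list of embedding vectors for that label."""
    return {label: mean_vector(vectors) for label, vectors in support.items()}


def classify(query, prototypes):
    """Return the label whose prototype is closest to the query embedding."""
    return min(prototypes, key=lambda label: squared_distance(query, prototypes[label]))


if __name__ == "__main__":
    support = {"guitar": [[1.0, 0.0], [0.8, 0.2]], "drums": [[0.0, 1.0], [0.2, 0.8]]}
    print(classify([0.9, 0.1], build_prototypes(support)))
```

In a full prototypical network the embedding model is trained so that this nearest-prototype rule works well; only the classification step is shown here.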
36

Purmonen, Sami. "Predicting Game Level Difficulty Using Deep Neural Networks." Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-217140.

Abstract:
We explored the usage of Monte Carlo tree search (MCTS) and deep learning in order to predict game level difficulty in Candy Crush Saga (Candy) measured as number of attempts per success. A deep neural network (DNN) was trained to predict moves from game states from large amounts of game play data. The DNN played a diverse set of levels in Candy and a regression model was fitted to predict human difficulty from bot difficulty. We compared our results to an MCTS bot. Our results show that the DNN can make estimations of game level difficulty comparable to MCTS in substantially shorter time.
Swedish abstract (translated): We explored the use of Monte Carlo tree search (MCTS) and deep learning to estimate the difficulty of levels in Candy Crush Saga (Candy). A deep neural network (DNN) was trained to predict moves from game states using large amounts of gameplay data. The DNN played a varied set of levels in Candy, and a model was built to predict human difficulty from the DNN's difficulty. The result was compared with MCTS. Our results indicate that the DNN can make estimates comparable to MCTS but in substantially shorter time.
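The final step described in this abstract, fitting a regression model that maps bot difficulty to human difficulty, can be sketched with ordinary least squares on a single predictor. The abstract does not say which regression model was used, so the linear form is an assumption, and the helper names are illustrative.

```python
def attempts_per_success(attempts, successes):
    """Difficulty measure used in the abstract: number of attempts per successful completion."""
    if successes <= 0:
        raise ValueError("successes must be positive")
    return attempts / successes


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("xs must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict(slope, intercept, bot_difficulty):
    """Predicted human difficulty for a level given the bot's measured difficulty."""
    return slope * bot_difficulty + intercept
```

Here `xs` would hold the bot's attempts per success on a set of levels and `ys` the corresponding human values; new levels are then scored by running the bot and applying `predict`.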
37

Winsnes, Casper. "Automatic Subcellular Protein Localization Using Deep Neural Networks." Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-189991.

Abstract:
Protein localization is an important part of understanding the functionality of a protein. The current method of localizing proteins is to manually annotate microscopy images. This thesis investigates the feasibility of using deep artificial neural networks to automatically classify subcellular protein locations based on immunofluorescent images. We investigate the applicability in both single-label and multi-label classification, as well as cross-cell-line classification. We show that deep single-label neural networks can be used for protein localization with up to 73% accuracy. We also show the potential of deep multi-label neural networks for protein localization and cross-cell-line classification, but conclude that more research is needed before we can say for certain that the method is applicable.
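The difference between the single-label and multi-label settings mentioned in this abstract usually shows up in the output layer: softmax with an argmax picks exactly one location, whereas independent sigmoids with a threshold can select several. The sketch below illustrates that common convention; the class names, the threshold, and the assumption that the thesis follows this convention are all hypothetical.

```python
import math


def softmax(logits):
    """Numerically stabilised softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(z):
    """Logistic function, written to avoid overflow for large negative inputs."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def single_label(logits, classes):
    """Pick the single most probable class."""
    probs = softmax(logits)
    return classes[probs.index(max(probs))]


def multi_label(logits, classes, threshold=0.5):
    """Pick every class whose independent probability reaches the threshold."""
    return [c for z, c in zip(logits, classes) if sigmoid(z) >= threshold]
```

A protein present in two compartments can only be described correctly by the multi-label variant, which is one reason the two settings are evaluated separately.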
38

Peterson, Joshua C. "Leveraging Deep Neural Networks to Study Human Cognition." Thesis, University of California, Berkeley, 2018. http://pqdtopen.proquest.com/#viewpdf?dispub=10930700.

Abstract:

The majority of computational theories of inductive processes in psychology derive from small-scale experiments with simple stimuli that are easy to represent. However, real-world stimuli are complex, hard to represent efficiently, and likely require very different cognitive strategies to cope with. Indeed, the difficulty of such tasks is part of what makes humans so impressive, yet methodological resources for modeling their solutions are limited. This presents a fundamental challenge to the precision of psychology as a science, especially if traditional laboratory methods fail to generalize. Recently, a number of computationally tractable, data-driven methods such as deep neural networks have emerged in machine learning for deriving useful representations of complex perceptual stimuli, but they are explicitly optimized in service to engineering objectives rather than modeling human cognition. It has remained unclear to what extent engineering models, while often state-of-the-art in terms of human-level task performance, can be leveraged to model, predict, and understand humans.

In the following, I outline a methodology by which psychological research can confidently leverage representations learned by deep neural networks to model and predict complex human behavior, potentially extending the scope of the field. In Chapter 1, I discuss the challenges to ecological validity in the laboratory that may be partially circumvented by technological advances and trends in machine learning, and weigh the advantages and disadvantages of bootstrapping from largely uninterpretable models. In Chapter 2, I contrast methods from psychology and machine learning for representing complex stimuli like images. Chapter 3 provides a first case study of applying deep neural networks to predict whether objects in a large database of images will be remembered by humans. Chapter 4 provides the central argument for using representations from deep neural networks as proxies for human psychological representations in general. To do this, I establish and demonstrate methods for quantifying their correspondence, improving their correspondence with minimal cost, and applying the result to the modeling of downstream cognitive processes. Building on this, Chapter 5 develops a method for modeling human subjective probability over deep representations in order to capture multimodal mental visual concepts such as "landscape". Finally, in Chapter 6, I discuss the implications of the overall paradigm espoused in the current work, along with the most crucial challenges ahead and potential ways forward. The overall endeavor is almost certainly a stepping stone to methods that may look very different in the near future, as the gains in leveraging machine learning methods are consolidated and made more interpretable/useful. The hope is that a synergy can be formed between the two fields, each bootstrapping and learning from the other.

39

Lenc, Karel. "Representation of spatial transformations in deep neural networks." Thesis, University of Oxford, 2017. http://ora.ox.ac.uk/objects/uuid:87a16dc2-9d77-49c3-8096-cf3416fa6893.

Abstract:
This thesis addresses the problem of investigating the properties and abilities of a variety of computer vision representations with respect to spatial geometric transformations. Our approach is to employ machine learning methods for finding the behaviour of existing image representations empirically and to apply deep learning to new computer vision tasks where the underlying spatial information is of importance. The results help to further the understanding of modern computer vision representations, such as convolutional neural networks (CNNs) in image classification and object detection and to enable their application to new domains such as local feature detection. Because our theoretical understanding of CNNs remains limited, we investigate two key mathematical properties of representations: equivariance (how transformations of the input image are encoded) and equivalence (how two representations, for example two different parameterizations, layers or architectures share the same visual information). A number of methods to establish these properties empirically are proposed. These methods reveal interesting aspects of their structure, including clarifying at which layers in a CNN geometric invariances are achieved and how various CNN architectures differ. We identify several predictors of geometric and architectural compatibility. Direct applications to structured-output regression are demonstrated as well. Local covariant feature detection has been difficult to approach with machine learning techniques. We propose the first fully general formulation for learning local covariant feature detectors which casts detection as a regression problem, enabling the use of powerful regressors such as deep neural networks. The derived covariance constraint can be used to automatically learn which visual structures provide stable anchors for local feature detection. We support these ideas theoretically, and show that existing detectors can be derived in this framework. 
Additionally, in cooperation with Imperial College London, we introduce a novel large-scale dataset for evaluation of local detectors and descriptors. It is suitable for training and testing modern local features, together with strictly defined evaluation protocols for descriptors in several tasks such as matching, retrieval and verification. The importance of pixel-wise image geometry for object detection is unknown, as the best results used to be obtained with a combination of CNNs and cues from image segmentation. We propose a detector which uses constant region proposals and, while it approximates objects poorly, we show that a bounding box regressor using intermediate convolutional features can recover sufficiently accurate bounding boxes, demonstrating that the required geometric information is contained in the CNN itself. Combined with other improvements, we obtain an excellent and fast detector that processes an image only with the CNN.
40

Paula, Thomas da Silva. "Contributions in face detection with deep neural networks." Pontifícia Universidade Católica do Rio Grande do Sul, 2017. http://tede2.pucrs.br/tede2/handle/tede/7563.

Abstract:
Facial recognition is one of the most studied subjects in the field of Computer Vision. Given an arbitrary image or an arbitrary frame, the goal of facial recognition is to determine whether there are faces in the image and, if so, to obtain the location and extent of each face found. Such detection is easily done by human beings, but it remains a challenge in Computer Vision. The high degree of variability and the dynamic nature of the human face make it difficult to detect, especially in complex environments. Recently, Deep Learning approaches began to be used in Computer Vision tasks with good results. These results opened new research possibilities in different applications, including Facial Recognition. Although Deep Learning approaches have been applied successfully to this task, most state-of-the-art implementations use off-the-shelf face detectors and do not evaluate the differences among them. In other cases, the face detectors are trained for multiple tasks, such as fiducial point detection, age detection, among others. We therefore have three main objectives. First, we summarize and explain some advances in Deep Learning, detailing how each architecture and implementation works. Next, we focus on the face detection problem itself, carrying out a rigorous analysis of some existing detectors as well as some implementations of our own. We experiment with and evaluate variations of some hyperparameters for each of the detectors and their impact on different datasets. We explore both traditional and more recent implementations, in addition to implementing our own face detector. Finally, we implement, test, and compare a meta-learning approach for face detection, which aims to learn which face detector is best for a given image.
Our experiments contribute to the understanding of the role of Deep Learning in face detection, as well as of the details related to changing the hyperparameters of the face detectors and their impact on the detection result. We also show how well features obtained with deep neural networks, trained on general-purpose datasets and combined with a meta-learning approach, apply to face detection. Our experiments and conclusions show that deep learning does indeed play a notable role in face detection.
Face Detection is one of the most studied subjects in the Computer Vision field. Given an arbitrary image or video frame, the goal of face detection is to determine whether there are any faces in the image and, if present, return the image location and the extent of each face. Such a detection is easily done by humans, but it is still a challenge within Computer Vision. The high degree of variability and the dynamicity of the human face make it an object very difficult to detect, mainly in complex environments. Recently, Deep Learning approaches started to be applied to Computer Vision tasks with great results. They opened new research possibilities in different applications, including Face Detection. Even though Deep Learning has been successfully applied to such a task, most of the state-of-the-art implementations make use of off-the-shelf face detectors and do not evaluate differences among them. In other cases, the face detectors are trained in a multitask manner that includes face landmark detection, age detection, and so on. Hence, our goal is threefold. First, we summarize and explain many advances of deep learning, detailing how each different architecture and implementation works. Second, we focus on the face detection problem itself, performing a rigorous analysis of some of the existing face detectors as well as implementations of our own. We experiment with and evaluate variations of hyper-parameters for each of the detectors and their impact on different datasets. We explore both traditional and more recent approaches, as well as implementing our own face detectors. Finally, we implement, test, and compare a meta learning approach for face detection, which aims to learn the best face detector for a given image. Our experiments contribute to understanding the role of deep learning in face detection as well as the subtleties of changing hyper-parameters of the face detectors and their impact on face detection.
We also show how well features obtained with deep neural networks trained on a general-purpose dataset perform on a meta learning approach for face detection. Our experiments and conclusions show that deep learning has indeed a notable role in face detection.
41

Pitkänen, P. (Perttu). "Automatic image quality enhancement using deep neural networks." Master's thesis, University of Oulu, 2019. http://jultika.oulu.fi/Record/nbnfioulu-201904101454.

Abstract:
Photo retouching can significantly improve image quality and it is considered an essential part of photography. Traditionally this task has been completed manually with special image enhancement software. However, recent research has shown that neural networks perform better than traditional methods in the automated image enhancement task. During the literature review of this thesis, multiple automatic neural-network-based image enhancement methods were studied, and one of these methods was chosen for closer examination and evaluation. The chosen network design has several appealing qualities, such as the ability to learn both local and global enhancements and a simple architecture built for computational efficiency. This research proposes a novel dataset generation method for automated image enhancement research, and tests its usefulness with the chosen network design. This dataset generation method simulates commonly occurring photographic errors, and the original high-quality images can be used as the target data. This dataset design allows studying fixes for individual and combined aberrations. The underlying idea of this design choice is that the network would learn to fix these aberrations while producing aesthetically pleasing and consistent results. The quantitative evaluation showed that the network can learn to counter these errors, and with greater effort, it could also learn to enhance all of these aspects simultaneously. Additionally, the network's capability to learn local and portrait-specific enhancement tasks was evaluated. The models can apply the effect successfully, but the results did not reach the same level of accuracy as with global enhancement tasks.
According to the completed qualitative survey, the proposed general enhancement model can successfully enhance image quality, and it can perform better than some of the state-of-the-art image enhancement methods.
Automatic image quality enhancement using deep neural networks. Abstract. Manual photo editing can improve image quality considerably, and it is regarded as an essential part of the photographic process. Traditionally, special manually operated image-editing programs have been used for this task. Current research, however, has demonstrated the superiority of neural networks over traditional methods in automatic image enhancement applications. In the literature review of this master's thesis, several neural-network-based image enhancement methods were studied, and one of them was chosen for closer examination and evaluation. The chosen network model has several appealing properties, such as the learning of both local and global enhancements and its simplified architecture, which is built for efficient execution speed. This study presents a new training-data generation method for automatic image enhancement methods and tests its suitability using the chosen neural network structure. The generation method simulates commonly occurring photographic errors, and the original high-quality images can be used as target data for training. With this generation approach, it was possible to study the correction of individual photographic errors as well as of their combination. The purpose of the method was to teach the network to correct different errors and to produce aesthetically pleasing and consistent results. The qualitative evaluation showed that the network used is able to learn separate corrections for these errors. The network can also learn a model that corrects all the predefined errors simultaneously, but with lower accuracy. In addition, the network's ability to learn local, portrait-specific enhancements was evaluated. The trained models can also carry out local enhancement successfully, but these models did not reach the level of the global enhancements. According to the survey conducted, the presented general enhancement model can improve image quality successfully and produces better results than some of the compared image enhancement techniques.
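The dataset generation idea in this abstract (degrade a high-quality image with simulated photographic errors and keep the original as the training target) can be sketched roughly as below. The abstract does not list which errors were simulated, so underexposure and reduced contrast are assumptions, as are the function names and the representation of an image as nested lists of grey values in [0, 1].

```python
def clamp(value):
    """Keep a pixel value inside the valid [0, 1] range."""
    return max(0.0, min(1.0, value))


def underexpose(image, factor=0.5):
    """Simulate underexposure by scaling every pixel down."""
    return [[clamp(p * factor) for p in row] for row in image]


def reduce_contrast(image, amount=0.5):
    """Pull pixels toward mid-grey; amount=1.0 leaves the image unchanged."""
    return [[clamp(0.5 + (p - 0.5) * amount) for p in row] for row in image]


def make_pairs(images, degradations):
    """Build (degraded input, original target) training pairs.

    Passing one degradation studies an individual error; passing several
    applies them in order to study a combined error.
    """
    pairs = []
    for image in images:
        degraded = image
        for degrade in degradations:
            degraded = degrade(degraded)
        pairs.append((degraded, image))
    return pairs
```

Because the target is always the untouched original, the same image collection can yield separate datasets per error type simply by changing the `degradations` list.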
42

Wu, Jimmy. "Robotic object pose estimation with deep neural networks." Thesis, Massachusetts Institute of Technology, 2018. http://hdl.handle.net/1721.1/119699.

Abstract:
Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2018.
This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.
Cataloged from student-submitted PDF version of thesis.
Includes bibliographical references (pages 39-45).
In this work, we introduce pose interpreter networks for 6-DoF object pose estimation. In contrast to other CNN-based approaches to pose estimation that require expensively-annotated object pose data, our pose interpreter network is trained entirely on synthetic data. We use object masks as an intermediate representation to bridge real and synthetic data. We show that when combined with a segmentation model trained on RGB images, our synthetically-trained pose interpreter network is able to generalize to real data. Our end-to-end system for object pose estimation runs in real-time (20 Hz) on live RGB data, without using depth information or ICP refinement.
by Jimmy Wu.
43

Larsson, Susanna. "Monocular Depth Estimation Using Deep Convolutional Neural Networks." Thesis, Linköpings universitet, Datorseende, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-159981.

Abstract:
For a long time, stereo cameras have been deployed in visual Simultaneous Localization And Mapping (SLAM) systems to gain 3D information. Even though stereo cameras show good performance, their main disadvantage is the complex and expensive hardware setup they require, which limits the use of the system. Monocular cameras are a simpler and cheaper alternative; however, monocular images lack the important depth information. Recent works have shown that having access to depth maps in a monocular SLAM system is beneficial since they can be used to improve the 3D reconstruction. This work proposes a deep neural network that predicts dense high-resolution depth maps from monocular RGB images by casting the problem as a supervised regression task. The network architecture follows an encoder-decoder structure in which multi-scale information is captured and skip-connections are used to recover details. The network is trained and evaluated on the KITTI dataset, achieving results comparable to state-of-the-art methods. With further development, this network shows good potential to be incorporated in a monocular SLAM system to improve the 3D reconstruction.
44

Venigalla, Abhinav S. "Strongly-transferring memorized examples in deep neural networks." Thesis, Massachusetts Institute of Technology, 2019. https://hdl.handle.net/1721.1/123124.

Abstract:
Thesis: M. Eng. in Computer Science and Engineering, Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2019
Includes bibliographical references (pages 59-60).
Training deep neural networks requires large quantities of labeled training data, on the order of thousands of examples per class. These requirements make model training both time-consuming and expensive, which provides an incentive for adversaries to steal, or copy, other users' models. In this work, we examine a recent defense method called neural network watermarking via memorized examples, where an owner intentionally trains his model to mislabel particular inputs. We try to isolate the mechanism by which memorized examples are learned by a model in order to better evaluate their robustness. We find that memorized examples are indeed strongly embedded in trained models and actually transfer to stolen models under one form of model stealing. When access to local input-logit gradient information is used by an attacker, the stolen model also learns to mislabel the memorized examples. We show that this transfer is robust to architecture mismatch and perturbations of the query set used for stealing. We present different possible mechanisms for memorized example transfer and find that local input geometry is insufficient to explain the phenomenon. Finally, we describe a simple method for a model owner to boost the transfer rate of memorized examples, increasing their effectiveness as a defense against model stealing.
by Abhinav S. Venigalla.
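How an owner might check a suspect model against memorized examples, as described in this abstract, can be sketched as below. This is only an illustration of the verification idea, not the procedure from the thesis: the function names, the 0.8 decision threshold, and the representation of a model as a plain callable are all assumptions.

```python
def watermark_match_rate(model, memorized_examples):
    """Fraction of memorized inputs on which `model` returns the owner's label.

    `memorized_examples` is a list of (input, intentionally_wrong_label) pairs
    that the owner trained the original model to memorize.
    """
    if not memorized_examples:
        raise ValueError("need at least one memorized example")
    hits = sum(1 for x, owner_label in memorized_examples if model(x) == owner_label)
    return hits / len(memorized_examples)


def looks_stolen(model, memorized_examples, threshold=0.8):
    """Flag a model whose match rate is too high to be a coincidence.

    The threshold is a hypothetical choice; an independent model should
    rarely reproduce deliberately wrong labels.
    """
    return watermark_match_rate(model, memorized_examples) >= threshold
```

A high match rate on a suspect model corresponds to the transfer of memorized examples that the abstract reports under one form of model stealing.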
45

Boukli, Hacene Ghouthi. "Processing and learning deep neural networks on chip." Thesis, Ecole nationale supérieure Mines-Télécom Atlantique Bretagne Pays de la Loire, 2019. http://www.theses.fr/2019IMTA0153/document.

Abstract:
In the field of machine learning, deep neural networks have become the essential reference for a very large number of problems. These systems are built from an assembly of layers that perform elementary operations parameterized by a large number of variables. Using data available during a learning phase, these variables are adjusted so that the neural network accomplishes the given task. New data can then be processed. While these methods reach state-of-the-art performance in many cases, they rely for this on a very large number of parameters, and hence on substantial memory and computational complexity. As a result, they are often poorly suited to hardware implementation on resource-constrained systems. Moreover, learning requires passing over the training data several times and therefore adapts poorly to scenarios where new information appears on the fly. In this thesis, we first look at methods for reducing the computational and memory footprint of deep neural networks. We then propose techniques for learning on the fly in an embedded context.
In the field of machine learning, deep neural networks have become the inescapable reference for a very large number of problems. These systems are made of an assembly of layers, performing elementary operations, and using a large number of tunable variables. Using data available during a learning phase, these variables are adjusted such that the neural network addresses the given task. It is then possible to process new data. To achieve state-of-the-art performance, in many cases these methods rely on a very large number of parameters, and thus large memory and computational costs. Therefore, they are often not well suited to a hardware implementation on resource-constrained systems. Moreover, the learning process requires reusing the training data several times, making it difficult to adapt to scenarios where new information appears on the fly. In this thesis, we are first interested in methods for reducing the computation and memory required by deep neural networks. Secondly, we propose techniques for learning on the fly, in an embedded context.
46

Conway, Alexander. "Deep neural networks for video classification in ecology." Master's thesis, University of Cape Town, 2020. http://hdl.handle.net/11427/32520.

Abstract:
Analyzing large volumes of video data is a challenging and time-consuming task. Automating this process would be very valuable, especially in ecological research where massive amounts of video can be used to unlock new avenues of research into the behaviour of animals in their environments. Deep Neural Networks, particularly Deep Convolutional Neural Networks, are a powerful class of models for computer vision. When combined with Recurrent Neural Networks, Deep Convolutional models can be applied to video for frame-level video classification. This research studies two datasets: penguins and seals. The purpose of the research is to compare the performance of image-only CNNs, which treat each frame of a video independently, against a combined CNN-RNN approach; and to assess whether incorporating the motion information in the temporal aspect of video improves the accuracy of classifications in these two datasets. Video and image-only models offer similar out-of-sample performance on the simpler seals dataset, but the video model led to moderate performance improvements on the more complex penguin action recognition dataset.
47

Hocquet, Guillaume. "Class Incremental Continual Learning in Deep Neural Networks." Thesis, université Paris-Saclay, 2021. http://www.theses.fr/2021UPAST070.

Abstract:
We address the problem of continual learning of artificial neural networks in the case where data are accessible for only one category at a time. To remedy the problem of catastrophic forgetting, which limits learning performance under these conditions, we propose an approach based on representing the data of a category by a normal distribution. The transformations associated with these representations are carried out using invertible networks, which can then be trained with the data of a single category. Each category is assigned a network to represent its characteristics. Predicting the category then amounts to identifying the most representative network. The advantage of such an approach is that once a network is trained, it no longer needs to be updated afterwards, each network being independent of the others. It is this particularly advantageous property that sets our method apart from previous work in this area. We support our demonstration with experiments carried out on various datasets and show that our approach performs favorably compared with the state of the art. In a second step, we propose to optimize our approach by reducing its memory footprint through factorization of the network parameters. It is then possible to significantly reduce the storage cost of these networks with a limited loss of performance. Finally, we also study strategies for producing networks that can be reused over the long term, and we show their relevance compared with the networks traditionally used for continual learning.
We are interested in the problem of continual learning of artificial neural networks in the case where the data are available for only one class at a time. To address the problem of catastrophic forgetting that restrains learning performance in these conditions, we propose an approach based on the representation of the data of a class by a normal distribution. The transformations associated with these representations are performed using invertible neural networks, which can be trained with the data of a single class. Each class is assigned a network that will model its features. In this setting, predicting the class of a sample corresponds to identifying the network that best fits the sample. The advantage of such an approach is that once a network is trained, it is no longer necessary to update it later, as each network is independent of the others. It is this particularly advantageous property that sets our method apart from previous work in this area. We support our demonstration with experiments performed on various datasets and show that our approach performs favorably compared to the state of the art. Subsequently, we propose to optimize our approach by reducing its impact on memory by factoring the network parameters. It is then possible to significantly reduce the storage cost of these networks with a limited performance loss. Finally, we also study strategies to produce efficient feature extractor models for continual learning and we show their relevance compared to the networks traditionally used for continual learning.
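The one-model-per-class idea in this abstract can be illustrated with a deliberately simplified sketch. The thesis uses invertible neural networks that map each class to a normal distribution; here that is replaced by a diagonal Gaussian fitted directly to the features, which is an assumption made only to keep the example short. All class and method names are hypothetical.

```python
import math


class DiagonalGaussian:
    """Stand-in for a per-class model: a Gaussian with independent dimensions."""

    def __init__(self, samples, min_var=1e-6):
        n = len(samples)
        dim = len(samples[0])
        self.mean = [sum(s[d] for s in samples) / n for d in range(dim)]
        self.var = [
            max(sum((s[d] - self.mean[d]) ** 2 for s in samples) / n, min_var)
            for d in range(dim)
        ]

    def log_likelihood(self, x):
        return sum(
            -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
            for xi, m, v in zip(x, self.mean, self.var)
        )


class IncrementalClassifier:
    """Learns one class at a time; earlier models are never updated."""

    def __init__(self):
        self.models = {}

    def learn_class(self, label, samples):
        # Only the new class's data is needed, so nothing is forgotten.
        self.models[label] = DiagonalGaussian(samples)

    def predict(self, x):
        # Pick the class whose model explains the sample best.
        return max(self.models, key=lambda label: self.models[label].log_likelihood(x))
```

Because `learn_class` touches only the new entry, adding a class cannot degrade the models of classes seen earlier, which is the property the abstract highlights.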
48

D'Amicantonio, Giacomo. "Improvements to knowledge distillation of deep neural networks." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2021. http://amslaurea.unibo.it/24178/.

Abstract:
One of the main problems in the field of Artificial Intelligence is the efficiency of neural network models. In the past few years, it seemed that most tasks involving such models could simply be solved by designing larger, deeper models and training them on larger datasets for a longer time. This approach requires better-performing, and therefore more expensive and energy-consuming, hardware and will have an increasingly significant environmental impact when those models are deployed at scale. In 2015 G. Hinton, J. Dean and O. Vinyals presented Knowledge Distillation (KD), a technique that leveraged the logits produced by a big, cumbersome model to guide the training of a smaller model. The two networks were called “Teacher” and “Student” given the analogy between the big model with large knowledge and the small model which has yet to learn everything. They proved that it is possible to extract useful knowledge from the teacher logits and use it to obtain a better performing student when compared with the same model that learned all by itself. This thesis provides an overview of the current state-of-the-art in the field of Knowledge Distillation, analyses some of the most interesting approaches, and builds on them to exploit very confident logits in a more effective way. Furthermore, it provides experimental evidence on the importance of also using smaller logit entries and correcting mistaken predictions from the teacher in the distillation process.
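The basic teacher-student loss that this abstract builds on can be sketched as follows. This is the commonly used temperature-softened formulation, not the improved variant developed in the thesis; the weighting `alpha`, the default temperature, and the function names are assumptions for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Softmax with temperature; higher temperature gives a flatter distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(target_probs, predicted_probs):
    """Cross-entropy between a target distribution and a predicted one."""
    return -sum(t * math.log(p) for t, p in zip(target_probs, predicted_probs) if t > 0)


def distillation_loss(student_logits, teacher_logits, true_index,
                      temperature=4.0, alpha=0.5):
    """Weighted sum of a soft-target term and a hard-label term."""
    soft_teacher = softmax(teacher_logits, temperature)
    soft_student = softmax(student_logits, temperature)
    # Scaling by T^2 keeps the soft term's gradients comparable across temperatures.
    soft_term = cross_entropy(soft_teacher, soft_student) * temperature ** 2
    hard_target = [1.0 if i == true_index else 0.0 for i in range(len(student_logits))]
    hard_term = cross_entropy(hard_target, softmax(student_logits))
    return alpha * soft_term + (1 - alpha) * hard_term
```

The softened teacher distribution keeps the smaller logit entries visible to the student, which is where the abstract locates part of the useful knowledge.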
49

Kalchbrenner, Nal. "Encoder-decoder neural networks." Thesis, University of Oxford, 2017. http://ora.ox.ac.uk/objects/uuid:d56e48db-008b-4814-bd82-a5d612000de9.

Abstract:
This thesis introduces the concept of an encoder-decoder neural network and develops architectures for the construction of such networks. Encoder-decoder neural networks are probabilistic conditional generative models of high-dimensional structured items such as natural language utterances and natural images. Encoder-decoder neural networks estimate a probability distribution over structured items belonging to a target set conditioned on structured items belonging to a source set. The distribution over structured items is factorized into a product of tractable conditional distributions over individual elements that compose the items. The networks estimate these conditional factors explicitly. We develop encoder-decoder neural networks for core tasks in natural language processing and natural image and video modelling. In Part I, we tackle the problem of sentence modelling and develop deep convolutional encoders to classify sentences; we extend these encoders to models of discourse. In Part II, we go beyond encoders to study the longstanding problem of translating from one human language to another. We lay the foundations of neural machine translation, a novel approach that views the entire translation process as a single encoder-decoder neural network. We propose a beam search procedure to search over the outputs of the decoder to produce a likely translation in the target language. Besides known recurrent decoders, we also propose a decoder architecture based solely on convolutional layers. Since the publication of these new foundations for machine translation in 2013, encoder-decoder translation models have been richly developed and have displaced traditional translation systems both in academic research and in large-scale industrial deployment. In services such as Google Translate these models process in the order of a billion translation queries a day. 
In Part III, we shift from the linguistic domain to the visual one to study distributions over natural images and videos. We describe two- and three-dimensional recurrent and convolutional decoder architectures and address the longstanding problem of learning a tractable distribution over high-dimensional natural images and videos, where the likely samples from the distribution are visually coherent. The empirical validation of encoder-decoder neural networks as state-of-the-art models of tasks ranging from machine translation to video prediction has a two-fold significance. On the one hand, it validates the notions of assigning probabilities to sentences or images and of learning a distribution over a natural language or a domain of natural images; it shows that a probabilistic principle of compositionality, whereby a high-dimensional item is composed from individual elements at the encoder side and whereby a corresponding item is decomposed into conditional factors over individual elements at the decoder side, is a general method for modelling cognition involving high-dimensional items; and it suggests that the relations between the elements are best learnt in an end-to-end fashion as non-linear functions in distributed space. On the other hand, the empirical success of the networks on the tasks characterizes the underlying cognitive processes themselves: a cognitive process as complex as translating from one language to another that takes a human a few seconds to perform correctly can be accurately modelled via a learnt non-linear deterministic function of distributed vectors in high-dimensional space.
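The abstract above describes two ideas that a short sketch may make more concrete: factorizing the probability of an item into a product of conditional distributions over its elements, and using beam search to find a likely decoder output. The Python below is only an illustration under stated assumptions. The three-token vocabulary, the hand-written `next_token_probs` table, and the function names are hypothetical stand-ins and are not taken from the thesis, whose decoders are learned neural networks conditioned on a source item.

```python
import math

def next_token_probs(prefix):
    """Hypothetical toy conditional p(token | prefix); a stand-in for a learned decoder."""
    if len(prefix) >= 3:
        # Force termination once three tokens have been generated.
        return {"a": 0.0, "b": 0.0, "<eos>": 1.0}
    if prefix and prefix[-1] == "a":
        return {"a": 0.1, "b": 0.7, "<eos>": 0.2}
    return {"a": 0.6, "b": 0.3, "<eos>": 0.1}

def sequence_log_prob(tokens):
    """log p(y) = sum over t of log p(y_t | y_<t), i.e. the log of the product of conditional factors."""
    total = 0.0
    for t, tok in enumerate(tokens):
        p = next_token_probs(tokens[:t])[tok]
        if p == 0.0:
            return float("-inf")
        total += math.log(p)
    return total

def beam_search(beam_width=2, max_len=4):
    """Keep the beam_width best unfinished prefixes at each step; return the best finished sequence."""
    beams = [((), 0.0)]   # (token tuple, accumulated log probability)
    finished = []
    for _ in range(max_len):
        candidates = []
        for tokens, score in beams:
            for tok, p in next_token_probs(tokens).items():
                if p <= 0.0:
                    continue
                extended = (tokens + (tok,), score + math.log(p))
                if tok == "<eos>":
                    finished.append(extended)
                else:
                    candidates.append(extended)
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
        if not beams:
            break
    return max(finished, key=lambda c: c[1])

if __name__ == "__main__":
    tokens, score = beam_search()
    print(tokens, math.exp(score))
```

With this toy table, the score that beam search accumulates for a sequence equals `sequence_log_prob` of that sequence, which is the factorization the abstract refers to. Beam search remains an approximate search: a wider beam explores more prefixes at higher cost.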
50

Mansour, Tarek. "Deep neural networks are lazy: on the inductive bias of deep learning." Thesis, Massachusetts Institute of Technology, 2019. https://hdl.handle.net/1721.1/121680.

Abstract:
This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.
Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2019
Cataloged from student-submitted PDF version of thesis.
Includes bibliographical references (pages 75-78).
Deep learning models exhibit superior generalization performance despite being heavily overparametrized. Although this is widely observed in practice, there is currently very little theoretical backing for such a phenomenon. In this thesis, we propose a step forward towards understanding generalization in deep learning. We present evidence that deep neural networks have an inherent inductive bias that makes them inclined to learn generalizable hypotheses and avoid memorization. In this respect, we present results suggesting that the inductive bias stems from neural networks being lazy: they tend to learn simpler rules first. We also propose a definition of simplicity in deep learning based on the implicit priors ingrained in deep neural networks.
by Tarek Mansour.
