Dissertations / Theses on the topic 'Deep neural networks architecture'

Consult the top 50 dissertations / theses for your research on the topic 'Deep neural networks architecture.'


1

Heuillet, Alexandre. "Exploring deep neural network differentiable architecture design." Electronic Thesis or Diss., université Paris-Saclay, 2023. http://www.theses.fr/2023UPASG069.

Full text
Abstract:
Artificial Intelligence (AI) has gained significant popularity in recent years, primarily due to its successful applications in various domains, including textual data analysis, computer vision, and audio processing. The resurgence of deep learning techniques has played a central role in this success. The groundbreaking paper by Krizhevsky et al., AlexNet, narrowed the gap between human and machine performance in image classification tasks. Subsequent papers such as Xception and ResNet further solidified deep learning as a leading technique, opening new horizons for the AI community. The success of deep learning lies in its architectures, which are manually designed with expert knowledge and empirical validation. However, such architectures carry no guarantee of optimality. To address this issue, recent papers introduced the concept of Neural Architecture Search (NAS), which automates the design of deep architectures. Most initial approaches, however, focused on large architectures with specific targets (e.g., supervised learning) and relied on computationally expensive optimization techniques such as reinforcement learning and evolutionary algorithms. In this thesis, we investigate this idea further by exploring automatic deep architecture design, with a particular emphasis on differentiable NAS (DNAS), which represents the current trend in NAS due to its computational efficiency. While our primary focus is on Convolutional Neural Networks (CNNs), we also explore Vision Transformers (ViTs) with the goal of designing cost-effective architectures suitable for real-time applications.
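The differentiable-NAS idea summarized above can be sketched in a few lines. This is a toy illustration of the continuous relaxation used by DARTS-style methods, not the thesis's actual search: three hypothetical candidate operations on a scalar edge are mixed by a softmax over architecture parameters, which are then optimized by plain gradient descent.

```python
import numpy as np

# Toy DARTS-style continuous relaxation: an edge's output is a softmax-weighted
# mix of candidate operations, so the operation choice itself gets gradients.
OPS = [lambda x: x, lambda x: 0.0 * x, lambda x: 2.0 * x]  # identity, zero, scale-by-2

def softmax(a):
    e = np.exp(a - a.max())
    return e / e.sum()

def search(x, target, lr=0.5, steps=200):
    alpha = np.zeros(len(OPS))            # architecture parameters, one per op
    for _ in range(steps):
        w = softmax(alpha)
        op_vals = np.array([op(x) for op in OPS])
        y = float(w @ op_vals)            # mixed output of the relaxed edge
        # dL/dalpha_j for L = (y - target)^2, via the softmax Jacobian:
        grad = w * (2.0 * (y - target)) * (op_vals - y)
        alpha -= lr * grad                # gradient step on the architecture
    return alpha

alpha = search(x=1.0, target=2.0)
```

After the search, the discrete architecture is recovered by keeping the operation with the largest architecture weight; here that is the scale-by-2 op, since it alone matches the target exactly.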
APA, Harvard, Vancouver, ISO, and other styles
2

Jeanneret, Sanmiguel Guillaume. "Towards explainable and interpretable deep neural networks." Electronic Thesis or Diss., Normandie, 2024. http://www.theses.fr/2024NORMC229.

Full text
Abstract:
Deep neural architectures have demonstrated outstanding results in a variety of computer vision tasks. However, their extraordinary performance comes at the cost of interpretability. As a result, the field of Explainable AI (XAI) has emerged, both to understand what these models are actually learning and to uncover their sources of error. This thesis explores explainable algorithms to uncover the biases and variables used by these black-box models in the context of image classification.
To this end, the thesis is divided into four parts. The first three chapters propose several methods for generating counterfactual explanations. In the first chapter, we incorporate diffusion models to generate these explanations. Next, we link the research areas of adversarial attacks and counterfactuals, using the former to generate the latter. The following chapter proposes a new pipeline for generating counterfactuals in a fully black-box setting, i.e., using only the input and the prediction, without accessing the model. The final part of the thesis concerns the creation of interpretable-by-design methods; more specifically, we investigate how to extend vision transformers into interpretable architectures. Our proposed methods have shown promising results and advance the knowledge frontier of the current XAI literature.
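The fully black-box setting mentioned above (inputs and predicted labels only, no model access) can be illustrated with a minimal sketch. This is a generic random-search-plus-bisection procedure under toy assumptions, not the pipeline proposed in the thesis:

```python
import numpy as np

def blackbox_counterfactual(predict, x, target, iters=400, seed=0):
    """Find an input change that flips a black-box classifier,
    using only (input -> predicted label) queries."""
    rng = np.random.default_rng(seed)
    found = None
    # Stage 1: random directions with a growing radius until the label flips.
    for i in range(1, iters + 1):
        d = rng.standard_normal(x.shape)
        cand = x + (0.05 * i) * d / np.linalg.norm(d)
        if predict(cand) == target:
            found = cand
            break
    if found is None:
        return None
    # Stage 2: bisect back toward x while keeping the target label,
    # landing just past the decision boundary (a minimal-change explanation).
    lo, hi = 0.0, 1.0
    for _ in range(30):
        mid = (lo + hi) / 2
        if predict(x + mid * (found - x)) == target:
            hi = mid
        else:
            lo = mid
    return x + hi * (found - x)

# Toy black box: positive iff the coordinates sum to more than 1.
predict = lambda v: int(v.sum() > 1.0)
cf = blackbox_counterfactual(predict, np.zeros(2), target=1)
```

The returned point sits just over the (unknown) decision boundary, which is the counterfactual flavor of explanation: the smallest change found that alters the prediction.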
3

Li, Yanxi. "Efficient Neural Architecture Search with an Active Performance Predictor." Thesis, University of Sydney, 2020. https://hdl.handle.net/2123/24092.

Full text
Abstract:
This thesis searches for the optimal neural architecture by minimizing a proxy of the validation loss. Existing neural architecture search (NAS) methods aim to discover the architecture that best fits the validation examples given the up-to-date network weights. However, backpropagation over a large number of validation examples is time-consuming, especially when it must be repeated many times during the search. Although these intermediate validation results are invaluable, they are wasted if they cannot be used to predict the future from the past. In this thesis, we propose to approximate the validation loss landscape by learning a mapping from neural architectures to their corresponding validation losses. The optimal architecture can then be identified as the minimum of this proxy validation loss landscape. A novel sampling strategy is further developed for an efficient approximation of the loss landscape. Theoretical analysis indicates that the validation loss estimator learned with our sampling strategy achieves both a lower error rate and a lower label complexity than uniform sampling. Experimental results on benchmarks demonstrate that the architecture found by the proposed algorithm achieves satisfactory accuracy at a lower time cost.
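The core idea, learning a cheap mapping from architectures to validation losses and minimizing that proxy instead of running full evaluations, can be sketched as follows. The binary encoding, the linear "true" loss, and the exhaustive candidate pool are all toy assumptions, not the thesis's estimator or sampling strategy:

```python
import numpy as np
from itertools import product

# Hypothetical toy setup: an architecture is 6 binary choices, and the "true"
# validation loss is a fixed linear function of those choices (so a linear
# proxy can recover it exactly; real loss landscapes are not this kind).
coeffs = np.array([0.3, -0.2, 0.5, -0.4, 0.1, -0.1])
def true_val_loss(arch):
    return 1.0 + float(np.dot(arch, coeffs))

# A few (architecture, validation loss) pairs play the role of the
# intermediate validation results that would otherwise be wasted.
train_archs = [np.zeros(6)] + [np.eye(6)[i] for i in range(6)]
X = np.array([np.append(a, 1.0) for a in train_archs])   # bias column
y = np.array([true_val_loss(a) for a in train_archs])
w, *_ = np.linalg.lstsq(X, y, rcond=None)                # fit the cheap proxy

# Search: score every candidate with the proxy, keep the argmin.
pool = [np.array(bits, dtype=float) for bits in product([0, 1], repeat=6)]
best = min(pool, key=lambda a: float(np.append(a, 1.0) @ w))
```

Because the proxy is evaluated with a dot product rather than backpropagation over validation data, scoring all 64 candidates costs almost nothing; the best architecture turns on exactly the choices with negative loss coefficients.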
4

Silfa, Franyell. "Energy-efficient architectures for recurrent neural networks." Doctoral thesis, Universitat Politècnica de Catalunya, 2021. http://hdl.handle.net/10803/671448.

Full text
Abstract:
Deep Learning algorithms have been remarkably successful in applications such as Automatic Speech Recognition and Machine Translation; such applications are now ubiquitous in our lives and found in a plethora of devices. These algorithms are composed of Deep Neural Networks (DNNs), such as Convolutional Neural Networks and Recurrent Neural Networks (RNNs), which have a large number of parameters and require a large amount of computation. Hence, evaluating DNNs is challenging due to their memory and power requirements. RNNs are employed to solve sequence-to-sequence problems such as Machine Translation. They contain data dependencies between the executions of consecutive time-steps, so the available parallelism is severely limited, which makes evaluating them in an energy-efficient manner even harder than for other DNN algorithms. This thesis studies applications using RNNs with the goal of improving their energy efficiency on specialized architectures. Specifically, we propose novel energy-saving techniques and highly efficient architectures tailored to RNN inference, focusing on the most successful RNN topologies: the Long Short-Term Memory (LSTM) and the Gated Recurrent Unit (GRU). First, we characterize a set of RNNs running on a modern SoC and identify that fetching the model weights from memory is the main source of energy consumption, accounting for up to 80% of it. We therefore propose E-PUR, an energy-efficient processing unit for RNN inference. E-PUR achieves 6.8x speedup and reduces energy consumption by 88x compared to the SoC; these benefits are obtained by improving the temporal locality of the model weights.
In E-PUR, fetching the parameters remains the main source of energy consumption, so we strive to further reduce memory accesses and propose a scheme that reuses previous computations. Our observation is that, when evaluating the input sequences of an RNN model, the output of a given neuron tends to change only slightly between consecutive evaluations. We therefore develop a scheme that caches the neurons' outputs and reuses them whenever it detects that the change between the current and the previously computed output of a given neuron is small, avoiding fetching the corresponding weights. To decide when to reuse a previous value, we employ a Binary Neural Network (BNN) as a predictor of reusability; this low-cost BNN can be used in this context because its output is highly correlated with the output of the RNN. Our proposal avoids more than 24.2% of the computations; hence, on average, energy consumption is reduced by 18.5% for a speedup of 1.35x. An RNN model's memory footprint is usually reduced by using low precision for evaluation and storage. In this case, the minimum precision is identified offline and set such that the model maintains its accuracy, and the same precision is used to compute all time-steps. Yet we observe that some time-steps can be evaluated with a lower precision while preserving accuracy. We therefore propose a technique that dynamically selects the precision used to compute each time-step. A challenge of this proposal is choosing the lower bit-width; we address it by recognizing that information from a previous evaluation can determine the precision required in the current time-step. Our scheme evaluates 57% of the computations at a bit-width lower than the fixed precision employed by static methods. Implemented on E-PUR, it provides 1.46x speedup and 19.2% energy savings on average.
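The output-reuse scheme described in this abstract can be sketched in a few lines. The tolerance test on the input is a cheap stand-in for the BNN reusability predictor, and all shapes, values, and the threshold are illustrative assumptions:

```python
import numpy as np

def rnn_layer_with_reuse(inputs, W, tol=1e-3):
    """Evaluate y_t = tanh(W @ x_t), but reuse the cached outputs whenever the
    input has barely moved since the last full evaluation (standing in for the
    BNN reusability predictor described in the abstract)."""
    cached_y, ref_x, reused = None, None, 0
    outputs = []
    for x in inputs:
        if cached_y is not None and np.max(np.abs(x - ref_x)) < tol:
            y = cached_y               # skip the matrix product and weight fetch
            reused += 1
        else:
            y = np.tanh(W @ x)         # full evaluation; refresh the cache
            cached_y, ref_x = y, x     # compare against the last *computed* input
        outputs.append(y)
    return np.array(outputs), reused

rng = np.random.default_rng(0)
W = 0.1 * rng.standard_normal((4, 3))
x0 = rng.standard_normal(3)
seq = [x0, x0 + 1e-4, x0 - 1e-4, x0 + 1.0]   # two tiny drifts, one real change
outs, reused = rnn_layer_with_reuse(seq, W)
```

Comparing drift against the input of the last full evaluation (rather than the previous step) bounds the accumulated error, which is why the reused outputs stay close to the exact ones.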
5

Xiao, Yao. "Vehicle Detection in Deep Learning." Thesis, Virginia Tech, 2019. http://hdl.handle.net/10919/91375.

Full text
Abstract:
Computer vision techniques are becoming increasingly popular. For example, face recognition is used to help police find criminals, vehicle detection is used to prevent serious traffic accidents, and handwriting recognition is used to convert written words into printed text. Despite the rapid development of vehicle detection driven by deep learning, there are still concerns about the performance of state-of-the-art techniques; in particular, state-of-the-art vehicle detectors are restricted by large variations in object scale, a problem that researchers in the field are actively working to solve. This thesis proposes an advanced vehicle detection model built from two classical neural network components: a residual neural network, used as the feature extractor, and a region proposal network, used to detect information about potential objects.
6

Fayyazifar, Najmeh. "Deep learning and neural architecture search for cardiac arrhythmias classification." Thesis, Edith Cowan University, Research Online, Perth, Western Australia, 2022. https://ro.ecu.edu.au/theses/2553.

Full text
Abstract:
Cardiovascular disease (CVD) is the primary cause of mortality worldwide. Among people with CVD, cardiac arrhythmias (changes in the natural rhythm of the heart) are a leading cause of death. The clinical routine for arrhythmia diagnosis includes acquiring an electrocardiogram (ECG) and manually reviewing the trace to identify arrhythmias. However, given the varying expertise of clinicians, accurately diagnosing arrhythmias with similar visual characteristics (which naturally exist among some types of arrhythmias) can be challenging for front-line clinicians. In addition, there is a global shortage of trained cardiologists, especially in remote areas of Australia, where patients sometimes wait weeks or months for a visiting cardiologist; this impacts the timely care of patients living in those areas. Developing an AI-based model that assists clinicians in accurate real-time decision-making is therefore an essential task. This thesis provides supporting evidence that the problem of delayed and/or inaccurate diagnosis of cardiac arrhythmias can be addressed by designing accurate deep learning models through Neural Architecture Search (NAS); such models can automatically differentiate between types of arrhythmias in a timely manner. Many deep learning models, and more specifically Convolutional Neural Networks (CNNs), have been developed for automatic and accurate detection of cardiac arrhythmias. However, these models are heavily hand-crafted, which means that designing an accurate model for a given task requires significant trial and error. In this thesis, the process of designing an accurate CNN model for 1-dimensional biomedical data classification is automated by applying NAS techniques.
NAS is a recent research paradigm in which the design of an accurate model for a given task is automated by running a search algorithm over a pre-defined search space of possible operations in a deep learning model. In this thesis, we developed a CNN model for detecting 'Atrial Fibrillation' (AF) among 'normal sinus rhythm', 'noise', and 'other arrhythmias'. This model was designed using a well-known NAS method, Efficient Neural Architecture Search (ENAS), which uses Reinforcement Learning (RL) to search over common operations in a CNN structure. The resulting CNN outperformed state-of-the-art deep learning models for AF detection while minimizing human intervention in the design of the CNN structure. To reduce the high computation time required by ENAS (and by RL-based NAS in general), a recent NAS method called DARTS was then used to design a CNN model for accurate diagnosis of a wider range of cardiac arrhythmias. DARTS employs Stochastic Gradient Descent (SGD) to search over a continuous, and therefore differentiable, search space. Its search space (operations and building blocks) was tailored to run over a public dataset of standard 12-lead ECG recordings containing 111 types of arrhythmias (released by the PhysioNet challenge, 2020). The performance of DARTS was further studied by using it to differentiate two major sub-types of Wide QRS Complex Tachycardia (Ventricular Tachycardia, VT, vs. Supraventricular Tachycardia, SVT). These sub-types have similar visual characteristics, which makes differentiating between them challenging even for experienced clinicians. The dataset is a unique collection of Wide Complex Tachycardia (WCT) recordings gathered by our medical collaborator (the University of Ottawa Heart Institute) over the course of 11 years.
The DARTS-derived model achieved 91% accuracy, outperforming cardiologists (77% accuracy) and state-of-the-art deep learning models (88% accuracy). Lastly, the efficacy of the original DARTS algorithm for image classification was studied empirically. Our experiments showed that the performance of the DARTS search algorithm does not deteriorate over the course of the search; however, the search procedure can be terminated earlier than designated in the original algorithm. In addition, the accuracy of the derived model can be further improved by modifying the original search operations (excluding the zero operation), making it highly valuable in a clinical setting.
7

Chen, Yu-Hsin Ph D. Massachusetts Institute of Technology. "Architecture design for highly flexible and energy-efficient deep neural network accelerators." Thesis, Massachusetts Institute of Technology, 2018. http://hdl.handle.net/1721.1/117838.

Full text
Abstract:
Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2018. This electronic version was submitted by the student author; the certified thesis is available in the Institute Archives and Special Collections. Includes bibliographical references (pages 141-147).
Deep neural networks (DNNs) are the backbone of modern artificial intelligence (AI). However, due to their high computational complexity and diverse shapes and sizes, dedicated accelerators that can achieve high performance and energy efficiency across a wide range of DNNs are critical for enabling AI in real-world applications. To address this, we present Eyeriss, a software/hardware co-design for DNN processing that is optimized for performance, energy efficiency, and flexibility. Eyeriss features a novel Row-Stationary (RS) dataflow that minimizes data movement, the bottleneck of both performance and energy efficiency, when processing a DNN. The RS dataflow supports highly parallel processing while fully exploiting data reuse in a multi-level memory hierarchy to optimize overall system energy efficiency for any DNN shape and size; it achieves 1.4x to 2.5x higher energy efficiency than other existing dataflows. To support the RS dataflow, we present two versions of the Eyeriss architecture. Eyeriss v1 targets large DNNs that have plenty of data reuse. It features a flexible mapping strategy for high performance and a multicast on-chip network (NoC) for high data reuse, and further exploits data sparsity to reduce processing element (PE) power by 45% and off-chip bandwidth by up to 1.9x. Fabricated in a 65nm CMOS process, Eyeriss v1 consumes 278 mW at 34.7 fps on the CONV layers of AlexNet, which is 10x more efficient than a mobile GPU. Eyeriss v2 adds support for emerging compact DNNs, which introduce higher variation in data reuse. It features an RS+ dataflow that improves PE utilization, and a flexible and scalable NoC that adapts to bandwidth requirements while still exploiting available data reuse. Together, these provide over 10x higher throughput than Eyeriss v1 at 256 PEs. Eyeriss v2 also exploits sparsity and SIMD for an additional 6x increase in throughput.
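A back-of-the-envelope sketch of the data-reuse argument behind such dataflows (purely illustrative accounting; not Eyeriss's Row-Stationary analysis or its energy model): in a convolution, every filter weight is reused once per output position, so a dataflow that holds weights close to the datapath divides DRAM weight traffic by that reuse factor.

```python
# Toy accounting of convolutional weight reuse. All parameters name a standard
# conv layer: H,W input size; R,S filter size; C input channels; M filters.
def conv_weight_reuse(H, W, R, S, C, M, stride=1):
    Ho = (H - R) // stride + 1               # output height
    Wo = (W - S) // stride + 1               # output width
    macs = Ho * Wo * M * C * R * S           # total multiply-accumulates
    naive_weight_reads = macs                # refetch a weight for every MAC
    stationary_weight_reads = M * C * R * S  # fetch each weight once, keep local
    return macs, naive_weight_reads // stationary_weight_reads

macs, reuse = conv_weight_reuse(H=32, W=32, R=3, S=3, C=16, M=32)
```

For this 32x32 layer the reuse factor equals the number of output positions (30x30 = 900), which is the kind of locality a weight- or row-stationary dataflow converts into energy savings.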
8

Vukotic, Verdran. "Deep Neural Architectures for Automatic Representation Learning from Multimedia Multimodal Data." Thesis, Rennes, INSA, 2017. http://www.theses.fr/2017ISAR0015/document.

Full text
Abstract:
This thesis develops deep neural architectures for analyzing textual and visual content, and combinations of the two, leveraging the ability of neural networks to learn abstract representations. It evaluates the ability of deep neural networks to learn multimodal representations automatically, in both unsupervised and supervised settings, and makes the following main contributions: 1) Recurrent neural networks for spoken language understanding (slot filling): different architectures are compared for this task in terms of their ability to model both the input observations and the dependencies between output labels. 2) Action and motion prediction from single images: we propose an architecture that learns a representation of an image depicting a human action in order to predict how the motion evolves in a video; the originality of the model lies in its ability to predict frames at an arbitrary temporal distance in the video. 3) Bidirectional multimodal encoders: the main contribution of the thesis is a bidirectional network that translates one modality into another and conversely, offering an improved joint representation space in which the initially disjoint modalities can be translated and fused. The approach was studied mainly for structuring video collections, within international benchmarks on video hyperlinking where it established the state of the art. 4) Adversarial networks for multimodal fusion: continuing on the topic of multimodal fusion, the thesis evaluates conditional generative adversarial networks for learning multimodal representations; in addition to providing such representations, they make it possible to visualize the learned model directly in the image domain.
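The bidirectional translation idea can be miniaturized as follows. Linear least-squares maps stand in for the thesis's trained encoder networks, and the synthetic "modalities" (two linear views of a shared latent) are an assumption for illustration only:

```python
import numpy as np

# Two toy "modalities" that are linear views of a shared latent factor space.
rng = np.random.default_rng(0)
Z = rng.standard_normal((200, 4))            # shared latent factors
A = Z @ rng.standard_normal((4, 8))          # e.g. visual features
B = Z @ rng.standard_normal((4, 6))          # e.g. textual features

# Crossmodal translation in both directions, fitted by least squares.
W_ab, *_ = np.linalg.lstsq(A, B, rcond=None)  # translate A -> B
W_ba, *_ = np.linalg.lstsq(B, A, rcond=None)  # translate B -> A

# Joint representation: a modality concatenated with its translation,
# usable even when only one modality is observed at query time.
joint_from_A = np.hstack([A, A @ W_ab])
```

Because both modalities here derive from the same latent, each translation is exact; the point of the bidirectional setup is that the fused space remains usable when one modality is missing.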
9

Marti, Marco Ros. "Deep Convolutional Neural Network for Effective Image Analysis : DESIGN AND IMPLEMENTATION OF A DEEP PIXEL-WISE SEGMENTATION ARCHITECTURE." Thesis, KTH, Skolan för informations- och kommunikationsteknik (ICT), 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-227851.

Full text
Abstract:
This master thesis presents the process of designing and implementing a CNN-based architecture for image recognition included in a larger project in the field of fashion recommendation with deep learning. Concretely, the presented network aims to perform localization and segmentation tasks. Therefore, an accurate analysis of the most well-known localization and segmentation networks in the state of the art has been performed. Afterwards, a multi-task network performing RoI pixel-wise segmentation has been created. This proposal solves the detected weaknesses of the pre-existing networks in the field of application, i.e. fashion recommendation. These weaknesses are basically related with the lack of a fine-grained quality of the segmentation and problems with computational efficiency. When it comes to improve the details of the segmentation, this network proposes to work pixel- wise, i.e. performing a classification task for each of the pixels of the image. Thus, the network is more suitable to detect all the details presented in the analysed images. However, a pixel-wise task requires working in pixel resolution, which implies that the number of operations to perform is usually large. To reduce the total number of operations to perform in the network and increase the computational efficiency, this pixel-wise segmentation is only done in the meaningful regions of the image (Regions of Interest), which are also computed in the network (RoI masks). Then, after a study of the more recent deep learning libraries, the network has been successfully implemented. Finally, to prove the correct operation of the design, a set of experiments have been satisfactorily conducted. In this sense, it must be noted that the evaluation of the results obtained during testing phase with respect to the most well-known architectures is out of the scope of this thesis as the experimental conditions, especially in terms of dataset, have not been suitable for doing so. 
Nevertheless, the proposed network is fully prepared to perform this evaluation in the future, when the required experimental conditions are available.
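The central idea of this abstract, performing per-pixel classification only inside computed Regions of Interest to cut the number of operations, can be sketched as follows. This is a minimal NumPy illustration under assumed shapes, not the thesis's actual network; the function and variable names are hypothetical.

```python
import numpy as np

def roi_pixelwise_segmentation(scores, roi_mask, background=0):
    """Classify each pixel, but only inside the RoI mask.

    scores:   (H, W, C) per-pixel class scores (stand-in for a network output)
    roi_mask: (H, W) boolean mask marking the Regions of Interest
    Returns an (H, W) label map; pixels outside the RoI keep `background`.
    """
    labels = np.full(roi_mask.shape, background, dtype=np.int64)
    # The per-pixel argmax is evaluated only where the mask is True,
    # which is how the operation count is reduced in the described design.
    labels[roi_mask] = np.argmax(scores[roi_mask], axis=-1)
    return labels

# Toy example: a 4x4 image, 3 classes, RoI covering the top-left 2x2 block.
rng = np.random.default_rng(0)
scores = rng.random((4, 4, 3))
mask = np.zeros((4, 4), dtype=bool)
mask[:2, :2] = True

seg = roi_pixelwise_segmentation(scores, mask)
assert seg.shape == (4, 4)
assert (seg[~mask] == 0).all()  # pixels outside the RoI stay background
```

In a real pipeline the RoI mask itself would come from the network's localization branch rather than being hand-set as here.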
APA, Harvard, Vancouver, ISO, and other styles
10

Bhattarai, Smrity. "Digital Architecture for real-time face detection for deep video packet inspection systems." University of Akron / OhioLINK, 2017. http://rave.ohiolink.edu/etdc/view?acc_num=akron1492787219112947.

Full text
APA, Harvard, Vancouver, ISO, and other styles
11

Ferré, Paul. "Adéquation algorithme-architecture de réseaux de neurones à spikes pour les architectures matérielles massivement parallèles." Thesis, Toulouse 3, 2018. http://www.theses.fr/2018TOU30318/document.

Full text
Abstract:
The last decade has seen the re-emergence of machine learning methods based on formal neural networks under the name of deep learning. Although these methods have enabled major breakthroughs in machine learning, several obstacles to industrializing them persist, notably the need to collect and label a very large amount of data as well as the computing power necessary to perform learning and inference with this type of neural network. In this thesis, we propose to study the adequacy between inference and learning algorithms derived from biological neural networks and massively parallel hardware architectures. We show with three contributions that such adequacy drastically accelerates the computation times inherent to neural networks. In our first axis, we study the adequacy of the BCVision software engine developed by Brainchip SAS for GPU platforms. We also propose the introduction of a coarse-to-fine architecture based on complex cells. We show that the GPU port accelerates processing by a factor of seven, while the coarse-to-fine architecture reaches a factor of one thousand. The second contribution presents three algorithms for spike propagation adapted to parallel architectures. We study the computational models of these algorithms exhaustively, allowing the selection or design of a hardware system adapted to the parameters of the desired network. In our third axis we present a method to apply the Spike-Timing-Dependent Plasticity rule to image data in order to learn visual representations in an unsupervised manner. We show that our approach allows the effective learning of a hierarchy of representations relevant to image classification problems, while requiring ten times less data than other approaches in the literature.
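The Spike-Timing-Dependent Plasticity rule mentioned in this abstract can be illustrated with a simplified additive form: a synapse is strengthened when the presynaptic spike precedes the postsynaptic spike, and weakened otherwise, with a magnitude decaying exponentially in the timing difference. This is a textbook-style sketch with illustrative constants, not the thesis's actual implementation.

```python
import math

def stdp_update(w, t_pre, t_post, a_plus=0.05, a_minus=0.055,
                tau=20.0, w_min=0.0, w_max=1.0):
    """Additive STDP on one synapse (times in ms, constants illustrative).

    A causal pair (pre fires before post) potentiates the weight;
    an anti-causal pair depresses it. The update decays exponentially
    with the absolute timing difference, and the weight is clipped.
    """
    dt = t_post - t_pre
    if dt >= 0:   # pre before post: strengthen
        w += a_plus * math.exp(-dt / tau)
    else:         # post before pre: weaken
        w -= a_minus * math.exp(dt / tau)
    return min(max(w, w_min), w_max)

w = 0.5
w = stdp_update(w, t_pre=10.0, t_post=15.0)   # causal pair: weight rises
assert w > 0.5
w = stdp_update(w, t_pre=15.0, t_post=10.0)   # anti-causal: weight falls
```

Applied unsupervised over many image-driven spike pairs, repeated updates of this kind let frequently co-active inputs shape the learned visual features.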
APA, Harvard, Vancouver, ISO, and other styles
12

Prellberg, Jonas [Verfasser], Oliver [Akademischer Betreuer] Kramer, and Paul [Akademischer Betreuer] Kaufmann. "Evolving deep neural networks: optimization of weights and architectures / Jonas Prellberg ; Oliver Kramer, Paul Kaufmann." Oldenburg : BIS der Universität Oldenburg, 2020. http://d-nb.info/1216241767/34.

Full text
APA, Harvard, Vancouver, ISO, and other styles
13

Bai, Kang Jun. "Moving Toward Intelligence: A Hybrid Neural Computing Architecture for Machine Intelligence Applications." Diss., Virginia Tech, 2021. http://hdl.handle.net/10919/103711.

Full text
Abstract:
Rapid advances in machine learning have made information analysis more efficient than ever before. However, to extract valuable information from trillions of bytes of data for learning and decision-making, general-purpose computing systems or cloud infrastructures are often deployed to train a large-scale neural network, consuming a colossal amount of resources while also exposing other significant security issues. Among potential approaches, the neuromorphic architecture, which is not only amenable to low-cost implementation but can also be deployed with an in-memory computing strategy, has been recognized as an important method to accelerate machine intelligence applications. In this dissertation, theoretical and practical properties of a hybrid neural computing architecture are introduced, which utilizes a dynamic reservoir with short-term memory to enable historical learning capability and the potential to classify non-separable functions. The hybrid neural computing architecture integrates both spatial and temporal processing structures, sidestepping the limitations introduced by the vanishing gradient. To be specific, this is made possible through four critical features: (i) a feature extractor built upon the in-memory computing strategy, (ii) a high-dimensional mapping with the Mackey-Glass neural activation, (iii) a delay-dynamic system with historical learning capability, and (iv) a unique learning mechanism that updates only the readout weights. To support the integration of neuromorphic architecture and deep learning strategies, the first generation of the delay-feedback reservoir network was successfully fabricated in 2017, and the spatial-temporal hybrid neural network with an improved delay-feedback reservoir network was successfully fabricated in 2020.
To demonstrate the effectiveness and performance across diverse machine intelligence applications, the introduced network structures are evaluated through (i) time series prediction, (ii) image classification, (iii) speech recognition, (iv) modulation symbol detection, (v) radio fingerprint identification, and (vi) clinical disease identification.
Doctor of Philosophy
Deep learning strategies are the cutting edge of artificial intelligence, in which artificial neural networks are trained to extract key features or find similarities in raw sensory information. This is made possible through multiple processing layers with a colossal number of neurons, in a similar way to humans. Deep learning strategies run on von Neumann computers are deployed worldwide. However, in today's data-driven society, the use of general-purpose computing systems and cloud infrastructures can no longer offer a timely response while also exposing other significant security issues. With the introduction of neuromorphic architecture, application-specific integrated circuit chips have paved the way for machine intelligence applications in recent years. The major contributions of this dissertation include designing and fabricating a new class of hybrid neural computing architecture and applying various deep learning strategies to diverse machine intelligence applications. The resulting hybrid neural computing architecture offers an alternative solution to accelerate the neural computations required for sophisticated machine intelligence applications with a simple system-level design, thereby opening the door to low-power system-on-chip design for future intelligent computing and providing prominent design solutions and performance improvements for Internet of Things applications.
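The "update readout weights only" learning mechanism described above is characteristic of reservoir computing: a fixed recurrent system provides the temporal memory, and only a linear readout is trained. The sketch below is an echo-state-style software analogue on a toy time-series prediction task, not the fabricated delay-feedback chip; all sizes and constants are illustrative.

```python
import numpy as np

def run_reservoir(inputs, n_res=50, seed=0, leak=0.3):
    """Drive a fixed random reservoir with a 1-D signal; collect its states.

    The internal weights w_in and w_res are generated once and never
    trained, mirroring the readout-only learning mechanism; the leaky
    tanh update gives the reservoir its short-term memory.
    """
    rng = np.random.default_rng(seed)
    w_in = rng.uniform(-0.5, 0.5, (n_res, 1))
    w_res = rng.uniform(-0.5, 0.5, (n_res, n_res))
    w_res *= 0.9 / max(abs(np.linalg.eigvals(w_res)))  # spectral radius < 1
    x = np.zeros(n_res)
    states = []
    for u in inputs:
        x = (1 - leak) * x + leak * np.tanh(w_in[:, 0] * u + w_res @ x)
        states.append(x.copy())
    return np.array(states)

# Train a ridge-regression readout to predict the next value of a sine wave.
t = np.linspace(0, 8 * np.pi, 400)
u = np.sin(t)
S = run_reservoir(u[:-1])                     # states, one row per time step
ridge = 1e-6
w_out = np.linalg.solve(S.T @ S + ridge * np.eye(S.shape[1]), S.T @ u[1:])
pred = S @ w_out
assert float(np.mean((pred - u[1:]) ** 2)) < 0.1  # readout alone suffices
```

Only `w_out` is ever fitted; everything inside the reservoir stays fixed, which is what makes the scheme attractive for in-memory hardware implementations.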
APA, Harvard, Vancouver, ISO, and other styles
14

Le, Blevec Hugo. "Joint design of deep neural networks and FPGA dataflow architectures for semantic segmentation in autonomous vehicles." Electronic Thesis or Diss., Ecole nationale supérieure Mines-Télécom Atlantique Bretagne Pays de la Loire, 2024. http://www.theses.fr/2024IMTA0445.

Full text
Abstract:
This thesis explores the deployment of deep learning models, particularly for semantic segmentation, on Field-Programmable Gate Arrays (FPGAs), which offer power-efficient, low-latency solutions for resource-constrained systems like autonomous vehicles and drones. While FPGAs have advantages for these applications, they present challenges due to limited computational and memory resources. This research addresses these challenges by focusing on the joint design of neural networks and dataflow architectures on FPGA. Two encoder-decoder models, ResNet18-UNet and ENet, were implemented and optimized for real-time inference. P-ENet, an optimized version of ENet obtained through neural architecture design-space exploration, achieves state-of-the-art performance with 70.3% mIoU on the Cityscapes dataset, 226 FPS, and 4.2 ms latency. The study concludes that co-designing neural networks and hardware yields better trade-offs between accuracy and hardware performance than compression techniques alone. Power comparisons show this FPGA design offers 1.15 times better power efficiency than an embedded GPU.
APA, Harvard, Vancouver, ISO, and other styles
15

Waltsburger, Hugo. "Methodology and tooling for energy-efficient neural networks computation and optimization." Electronic Thesis or Diss., université Paris-Saclay, 2024. http://www.theses.fr/2024UPAST195.

Full text
Abstract:
Neural networks have seen impressive developments since the emergence of deep learning around 2012 and are now the state of the art for diverse tasks such as natural language processing, classification, prediction, and autonomous systems. However, as research tends to focus on optimizing a single performance metric, typically accuracy, it appears that performance tends to scale reliably and even predictably with the size of the training dataset, the neural network's complexity, and the total amount of training compute. In this context, we ask how much of the recent progress in the field of neural networks can be attributed to progress made in compute, software support, and hardware optimization. To answer this question, we created a new figure of merit illustrating trade-offs between the complexity and capability of a network. We used the measured energy consumption per inference as an estimator of complexity and a way of representing the match between the algorithm and the architecture. We established a way of measuring this energy consumption, verified its relevance, and benchmarked state-of-the-art networks according to this methodology. We then explored how different execution parameters influence our score and how to refine it further, insisting on the need for diverse objective functions reflecting different use cases in the field of neural networks. We end by acknowledging the social and environmental responsibility of the neural network field and lay out the envisioned continuation of our work.
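A capability-versus-complexity figure of merit of the kind described above could look like the following. The abstract does not reproduce the thesis's actual formula, so this score is a hypothetical stand-in: accuracy rewarded, measured joules per inference penalized, with an exponent weighting how much energy matters for a given use case.

```python
def efficiency_score(accuracy, energy_joules, alpha=0.5):
    """Illustrative figure of merit: capability (accuracy) divided by a
    power of the measured energy per inference. `alpha` encodes how
    strongly a use case penalizes energy; the formula is hypothetical,
    not the one established in the thesis.
    """
    return accuracy / (energy_joules ** alpha)

# Fictitious measurements: (top-1 accuracy, joules per inference).
models = {"net_a": (0.76, 0.10), "net_b": (0.80, 0.40), "net_c": (0.70, 0.02)}
ranking = sorted(models, key=lambda m: efficiency_score(*models[m]),
                 reverse=True)
# The frugal net_c outranks the most accurate net_b under this weighting.
assert ranking[0] == "net_c"
```

Changing `alpha` reorders the ranking, which is exactly why the abstract insists on objective functions adapted to each use case.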
APA, Harvard, Vancouver, ISO, and other styles
16

Anani-Manyo, Nina K. "Computer Vision and Building Envelopes." Kent State University / OhioLINK, 2021. http://rave.ohiolink.edu/etdc/view?acc_num=kent1619539038754026.

Full text
APA, Harvard, Vancouver, ISO, and other styles
17

Buttar, Sarpreet Singh. "Applying Artificial Neural Networks to Reduce the Adaptation Space in Self-Adaptive Systems : an exploratory work." Thesis, Linnéuniversitetet, Institutionen för datavetenskap och medieteknik (DM), 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:lnu:diva-87117.

Full text
Abstract:
Self-adaptive systems have limited time to adjust their configurations whenever their adaptation goals, i.e., quality requirements, are violated due to some runtime uncertainties. Within the available time, they need to analyze their adaptation space, i.e., a set of configurations, to find the best adaptation option, i.e., configuration, that can achieve their adaptation goals. Existing formal analysis approaches find the best adaptation option by analyzing the entire adaptation space. However, exhaustive analysis requires time and resources and is therefore only efficient when the adaptation space is small. The size of the adaptation space is often in hundreds or thousands, which makes formal analysis approaches inefficient in large-scale self-adaptive systems. In this thesis, we tackle this problem by presenting an online learning approach that enables formal analysis approaches to analyze large adaptation spaces efficiently. The approach integrates with the standard feedback loop and reduces the adaptation space to a subset of adaptation options that are relevant to the current runtime uncertainties. The subset is then analyzed by the formal analysis approaches, which allows them to complete the analysis faster and efficiently within the available time. We evaluate our approach on two different instances of an Internet of Things application. The evaluation shows that our approach dramatically reduces the adaptation space and analysis time without compromising the adaptation goals.
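The reduction step described in this abstract, filtering a large adaptation space down to the options relevant to the current uncertainties before formal analysis runs, can be sketched as follows. The data, goal threshold, and helper names are illustrative, not the thesis's artifacts; in the thesis the relevance predictor is a trained neural network inside the feedback loop.

```python
def reduce_adaptation_space(options, predict_relevant):
    """Keep only the adaptation options the learned model flags as
    relevant to the current runtime uncertainties, so the expensive
    formal analysis step inspects a small subset of the space.
    """
    return [opt for opt in options if predict_relevant(opt)]

# Toy adaptation space: 100 configurations of an IoT network, each with a
# (fictitious) predicted latency derived from its transmission power.
options = [{"power": p, "latency": 120 - 3 * p} for p in range(1, 101)]

# Stand-in for the trained classifier: flag options whose predicted
# latency meets the adaptation goal (below 60 ms here).
predict = lambda opt: opt["latency"] < 60

subset = reduce_adaptation_space(options, predict)
assert len(subset) < len(options)  # only the relevant fraction is analyzed
```

Formal analysis then runs on `subset` instead of `options`, which is what lets it finish within the available adaptation window.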
APA, Harvard, Vancouver, ISO, and other styles
18

Sarpangala, Kishan. "Semantic Segmentation Using Deep Learning Neural Architectures." University of Cincinnati / OhioLINK, 2019. http://rave.ohiolink.edu/etdc/view?acc_num=ucin157106185092304.

Full text
APA, Harvard, Vancouver, ISO, and other styles
19

Bono, Guillaume. "Deep multi-agent reinforcement learning for dynamic and stochastic vehicle routing problems." Thesis, Lyon, 2020. http://www.theses.fr/2020LYSEI096.

Full text
Abstract:
Routing delivery vehicles in dynamic and uncertain environments like dense city centers is a challenging task, which requires robustness and flexibility. Such logistic problems are usually formalized as Dynamic and Stochastic Vehicle Routing Problems (DS-VRPs) with a variety of additional operational constraints, such as capacitated vehicles or time windows (DS-CVRPTWs). The main heuristic approaches to dynamic and stochastic problems simply restart the optimization process on a frozen (static and deterministic) version of the problem given the new information. Instead, Reinforcement Learning (RL) offers models such as Markov Decision Processes (MDPs) which naturally describe the evolution of stochastic and dynamic systems. Their application to more complex problems has been facilitated by recent progress in Deep Neural Networks, which can learn to represent a large class of functions in high-dimensional spaces to approximate solutions with high performance. Finding a compact and sufficiently expressive state representation is the key challenge in applying RL to VRPs. Recent work exploring this novel approach demonstrated the capability of attention mechanisms to represent sets of customers and learn policies generalizing to different configurations of customers. However, all existing work using DNNs reframes the VRP as a single-vehicle problem and cannot provide online decision rules for a fleet of vehicles. In this thesis, we study how to apply Deep RL methods to rich DS-VRPs as multi-agent systems. We first explore the class of policy-based approaches in Multi-Agent RL and Actor-Critic methods for Decentralized, Partially Observable MDPs in the Centralized Training for Decentralized Control (CTDC) paradigm. To address DS-VRPs, we then introduce a new sequential multi-agent model we call sMMDP. This fully observable model is designed to capture the fact that the consequences of decisions can be predicted in isolation. Afterwards, we use it to model a rich DS-VRP and propose a new modular policy network, called MARDAM, to represent the state of the customers and the vehicles in this new model. It provides online decision rules adapted to the information contained in the state and takes advantage of the structural properties of the model. Finally, we develop a set of artificial benchmarks to evaluate the flexibility, the robustness and the generalization capabilities of MARDAM. We report promising results in the dynamic and stochastic case, which demonstrate the capacity of MARDAM to address varying scenarios with no re-optimization, adapting to new customers and unexpected delays caused by stochastic travel times. We also implement an additional benchmark based on micro-traffic simulation to better capture the dynamics of a real city and its road infrastructure. We report preliminary results as a proof of concept that MARDAM can learn to represent different scenarios and handle varying traffic conditions and customer configurations.
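The sequential decision structure of the sMMDP model, where each vehicle chooses its next customer as soon as it finishes a service, without waiting for the others, can be illustrated with a deliberately simple online decision rule. The greedy nearest-customer heuristic below is only a stand-in for illustration; MARDAM replaces it with a learned attention-based policy.

```python
import math

def next_customer(vehicle_pos, remaining):
    """Pick the next customer for one vehicle, independently of the rest
    of the fleet. `vehicle_pos` is an (x, y) tuple and `remaining` maps
    customer ids to (x, y) positions. Greedy stand-in for a learned
    online decision rule.
    """
    return min(remaining, key=lambda c: math.dist(vehicle_pos, remaining[c]))

# Toy instance: three pending customers, one vehicle at the depot.
customers = {"c1": (0.0, 5.0), "c2": (2.0, 1.0), "c3": (9.0, 9.0)}
choice = next_customer((0.0, 0.0), customers)
assert choice == "c2"  # the nearest remaining customer
```

In the sequential model, `remaining` shrinks as each vehicle commits to a customer at the end of its own service, which is precisely the asynchrony the sMMDP formalizes.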
APA, Harvard, Vancouver, ISO, and other styles
20

García, López Javier. "Geometric computer vision meets deep learning for autonomous driving applications." Doctoral thesis, TDX (Tesis Doctorals en Xarxa), 2021. http://hdl.handle.net/10803/672708.

Full text
Abstract:
This dissertation intends to provide theoretical and practical contributions on the development of deep learning algorithms for autonomous driving applications. The research is motivated by the need for deep neural networks (DNNs) to gain a full understanding of the surrounding area and to be executed in real driving scenarios with real vehicles equipped with specific hardware, such as memory-constrained (DSP or GPU) platforms or multiple optical sensors, which constrains the algorithms' development, forcing the designed deep networks to be accurate with a minimum number of operations and low memory consumption. The main objective of this thesis is, on the one hand, research into the actual limitations of DL-based algorithms that prevent them from being integrated into today's ADAS (Autonomous Driving System) functionalities, and on the other hand, the design and implementation of deep learning algorithms able to overcome such constraints so they can be applied in real autonomous driving scenarios, enabling their integration in low-memory hardware platforms and avoiding sensor redundancy. Deep learning (DL) applications have been widely exploited over the last years but have some weak points that need to be faced and overcome in order to fully integrate DL into the development process of big manufacturers or automotive companies, like the time needed to design, train, and validate an optimal network for a specific application, or the vast expert knowledge required to tune the hyperparameters of predefined networks in order to make them executable on the target platform and to get the most out of the hardware resources. During this thesis, we have addressed these topics and focused on implementing breakthroughs that would help the industrial integration of DL-based applications in the automobile industry. This work has been done as part of the "Doctorat Industrial" program, at the company FICOSA ADAS, and it is thanks to the possibilities that developing this research at the company's facilities brought to the author that a direct impact of the achieved algorithms could be tested in real scenarios to prove their validity. Moreover, in this work, the author investigates in depth the automatic design of deep neural networks (DNNs) based on state-of-the-art deep learning frameworks like NAS (neural architecture search). As stated in this work, one of the identified barriers to deep learning technology in today's automobile companies is the difficulty of developing light and accurate networks that can be integrated in small systems on chip (SoC) or DSPs. To overcome this constraint, the author proposes a framework named E-DNAS for the automatic design, training and validation of deep neural networks that perform image classification tasks and run on resource-limited hardware platforms. This approach has been validated on a real system on chip (tda2x) by Texas Instruments, provided by the company, and its results are published within this thesis. As an extension of the mentioned E-DNAS, in the last chapter of this work the author presents a framework based on NAS for detecting objects, whose main contribution is a learnable and fast way of finding object proposals in images that, in a second step, are classified into one of the labeled classes.
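An architecture search under a hardware constraint, of the kind E-DNAS automates, can be sketched at its simplest as random search with a latency budget. Everything below is hypothetical: the `evaluate` function returns fictitious scores so the example runs without data, whereas a real search would train and validate each candidate on the target platform.

```python
import random

def evaluate(arch):
    """Stand-in for training and validating one candidate network.
    `arch` is a list of per-layer channel counts; returns a fictitious
    (accuracy, latency_ms) pair so the sketch is self-contained.
    """
    depth, width = len(arch), sum(arch)
    accuracy = min(0.95, 0.55 + 0.04 * depth + 0.001 * width)
    latency_ms = 1.5 * width
    return accuracy, latency_ms

def search(n_trials=30, latency_budget_ms=200.0, seed=0):
    """Minimal random architecture search: keep the most accurate sampled
    architecture whose (fictitious) latency fits the platform budget,
    standing in for a DSP or SoC constraint.
    """
    rng = random.Random(seed)
    best, best_acc = None, -1.0
    for _ in range(n_trials):
        arch = [rng.choice([8, 16, 32]) for _ in range(rng.randint(2, 5))]
        acc, lat = evaluate(arch)
        if lat <= latency_budget_ms and acc > best_acc:
            best, best_acc = arch, acc
    return best, best_acc

best_arch, best_acc = search()
```

Gradient-based approaches such as differentiable NAS replace this blind sampling with a learned search, but the accuracy-under-budget selection loop is the same shape.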
APA, Harvard, Vancouver, ISO, and other styles
21

Ullah, I. "A PYRAMIDAL APPROACH FOR DESIGNING DEEP NEURAL NETWORK ARCHITECTURES." Doctoral thesis, Università degli Studi di Milano, 2017. http://hdl.handle.net/2434/466758.

Full text
Abstract:
Developing an intelligent system capable of learning discriminative high-level features from high-dimensional data lies at the core of solving many computer vision (CV) and machine learning (ML) tasks. Scene and human action recognition from videos is an important topic in CV and ML, with applications including video surveillance, robotics, human-computer interaction, and video retrieval. Several bio-inspired, hand-crafted feature extraction systems have been proposed for processing temporal data. However, recent deep learning techniques have come to dominate CV and ML through their strong performance on large-scale datasets. One of the most widely used deep learning techniques is the Convolutional Neural Network (CNN) and its variations, e.g. ConvNet, 3DCNN, and C3D. The CNN kernel scheme reduces the number of parameters with respect to fully connected neural networks. Recent deep CNNs have more layers and more kernels per layer than early CNNs and, as a consequence, a large number of parameters. In addition, they violate the biologically plausible pyramidal architecture of neural networks, since the number of filters increases at each higher layer, which makes convergence during training difficult. In this dissertation, we address three main questions central to pyramidal structure and deep neural networks: 1) Is it worthwhile to utilize a pyramidal architecture for a generalized recognition system? 2) How can the pyramidal neural network (PyraNet) be enhanced for recognizing actions and dynamic scenes in videos? 3) What is the impact of imposing a pyramidal structure on a deep CNN? In the first part of the thesis, we provide a brief review of the work done on action and dynamic scene recognition using traditional computer vision and machine learning approaches. In addition, we give a historical and present-day overview of pyramidal neural networks and how deep learning emerged. 
In the second part, we introduce a strictly pyramidal deep architecture for dynamic scene and human action recognition. It is based on the 3DCNN model and the image pyramid concept. We introduce a new 3D weighting scheme that offers a simple connection pattern with lower computational and memory costs, resulting in fewer learnable parameters than other neural networks. 3DPyraNet extracts features from both the spatial and temporal dimensions while keeping the biological structure, and is thereby capable of capturing the motion information encoded in multiple adjacent frames. The 3DPyraNet model is extended with three modifications: 1) changing the input image size; 2) changing the receptive field and overlap size in the correlation layers; and 3) adding a linear classifier at the end to classify the learned features. The result is a discriminative approach for spatiotemporal feature learning in action and dynamic scene recognition. In combination with a linear SVM classifier, our model outperforms state-of-the-art methods in one-vs-all accuracy on three video benchmark datasets (KTH, Weizmann, and Maryland), and gives competitive accuracy on a fourth dataset (YUPENN). In the last part of the thesis, we investigate to what extent a CNN may take advantage of the pyramidal structure typical of biological neural networks. A generalized statement over the convolutional layers, from the input up to the fully connected layer, is introduced that further helps in understanding and designing a successful deep network. It reduces ambiguity, the number of parameters, and their size on disk without degrading overall accuracy. It also provides a general guideline for modeling a deep architecture by keeping a certain ratio of filters in the early layers versus the deeper layers. Competitive results are achieved compared to similarly well-engineered deeper architectures on four benchmark datasets. The same approach is further applied to person re-identification, where less ambiguity in the features increases Rank-1 performance and yields results better than or comparable to state-of-the-art deep models.
APA, Harvard, Vancouver, ISO, and other styles
22

Луцишин, Роман Олегович, and Roman Olehovych Lutsyshyn. "Methods of Automated Natural Language Translation Based on the Sequence-to-Sequence Neural Network Model" [Методи автоматизованого перекладу природної мови на основі нейромережевої моделі "послідовність-послідовність"]. Master's thesis, Ternopil Ivan Puluj National Technical University, 2020. http://elartu.tntu.edu.ua/handle/lib/33271.

Full text
Abstract:
The master's thesis is devoted to the research and implementation of methods for automated natural language translation based on the sequence-to-sequence neural network model. The basic principles of and approaches to preparing the training data sample are considered, including the use of deep neural networks as encoders. Existing methods for solving the natural language translation problem are studied and analyzed; in particular, several deep learning neural network architectures are considered. Examples of creating and processing natural language corpora to form the training and test data samples are given. A full assessment of the cost of building the computer system required to solve the problem was performed, and the complete process of deploying the software in this environment using third-party platforms is described. 
The results of the research comprise a complete review of existing solutions to the problem, the choice of the best technology and its improvement, and the implementation and training of a deep sequence-to-sequence neural network model for the natural language translation task.<br>1. INTRODUCTION 2. ANALYSIS OF THE SUBJECT AREA 3. JUSTIFICATION OF THE CHOSEN TOOLS 4. IMPLEMENTATION OF A NATURAL LANGUAGE TRANSLATION SYSTEM BASED ON THE SEQUENCE-TO-SEQUENCE MODEL AND THE TRANSFORMER NEURAL NETWORK ARCHITECTURE 5. OCCUPATIONAL SAFETY AND SAFETY IN EMERGENCY SITUATIONS
APA, Harvard, Vancouver, ISO, and other styles
23

Liu, Qian. "Deep spiking neural networks." Thesis, University of Manchester, 2018. https://www.research.manchester.ac.uk/portal/en/theses/deep-spiking-neural-networks(336e6a37-2a0b-41ff-9ffb-cca897220d6c).html.

Full text
Abstract:
Neuromorphic Engineering (NE) has led to the development of biologically-inspired computer architectures whose long-term goal is to approach the performance of the human brain in terms of energy efficiency and cognitive capabilities. Although there are a number of neuromorphic platforms available for large-scale Spiking Neural Network (SNN) simulations, the problem of programming these brain-like machines to be competent in cognitive applications still remains unsolved. On the other hand, Deep Learning has emerged in Artificial Neural Network (ANN) research to dominate state-of-the-art solutions for cognitive tasks. The main research problem that emerges is thus how to operate and train biologically-plausible SNNs to close the gap in cognitive capabilities between SNNs and ANNs. SNNs can be trained by first training an equivalent ANN and then transferring the tuned weights to the SNN. This method is called 'off-line' training, since it does not take place on the SNN directly, but rather on the ANN. However, previous work on such off-line training methods has struggled with poor modelling accuracy of the spiking neurons and high computational complexity. In this thesis we propose a simple and novel activation function, Noisy Softplus (NSP), to closely model the response firing activity of biologically-plausible spiking neurons, and introduce a generalised off-line training method using the Parametric Activation Function (PAF) to map the abstract numerical values of the ANN to concrete physical units, such as current and firing rate in the SNN. Based on this generalised training method and its fine tuning, we achieve state-of-the-art accuracy on the MNIST classification task using spiking neurons, 99.07%, on a deep spiking convolutional neural network (ConvNet). We then take a step forward to 'on-line' training methods, where Deep Learning modules are trained purely on SNNs in an event-driven manner. 
Existing work has failed to provide SNNs with recognition accuracy equivalent to ANNs due to the lack of mathematical analysis. Thus we propose a formalised Spike-based Rate Multiplication (SRM) method which transforms the product of firing rates into the number of coincident spikes of a pair of rate-coded spike trains. Moreover, these coincident spikes can be captured by the Spike-Time-Dependent Plasticity (STDP) rule to update the weights between the neurons in an on-line, event-based, and biologically-plausible manner. Furthermore, we put forward solutions to reduce correlations between spike trains, thereby addressing the performance drop observed in on-line SNN training. The promising results of spiking Autoencoders (AEs) and Restricted Boltzmann Machines (SRBMs) exhibit equivalent, sometimes even superior, classification and reconstruction capabilities compared to their non-spiking counterparts. To provide meaningful comparisons between these proposed SNN models and other existing methods within this rapidly advancing field of NE, we propose a large dataset of spike-based visual stimuli and a corresponding evaluation methodology to estimate the overall performance of SNN models and their hardware implementations.
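The rate-multiplication idea behind SRM can be sketched numerically (a hedged illustration using Bernoulli-binned spike trains; the rates, duration, and bin width are assumptions for illustration, not the thesis's actual SRM formulation):

```python
import numpy as np

def coincidence_count(rate1, rate2, T, dt=0.001, seed=0):
    # Two independent rate-coded spike trains, binned at dt seconds.
    # The expected number of coincident spikes over T seconds is
    # rate1 * rate2 * T * dt, so counting coincidences recovers a
    # quantity proportional to the product of the two firing rates.
    rng = np.random.default_rng(seed)
    n = int(T / dt)
    s1 = rng.random(n) < rate1 * dt
    s2 = rng.random(n) < rate2 * dt
    return int(np.count_nonzero(s1 & s2))

# 40 Hz and 50 Hz trains over 100 s: expect about 40*50*100*0.001 = 200.
coincidences = coincidence_count(40.0, 50.0, 100.0)
```

In an SNN the coincidences would be detected event by event (e.g. by an STDP-like rule) rather than by counting over pre-binned arrays, but the statistical principle is the same.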
APA, Harvard, Vancouver, ISO, and other styles
24

Hanson, Jack. "Protein Structure Prediction by Recurrent and Convolutional Deep Neural Network Architectures." Thesis, Griffith University, 2018. http://hdl.handle.net/10072/382722.

Full text
Abstract:
In this thesis, the application of convolutional and recurrent machine learning techniques to several key structural properties of proteins is explored. Chapter 2 presents the first application of an LSTM-BRNN in structural bioinformatics. The method, called SPOT-Disorder, predicts the per-residue probability of a protein being intrinsically disordered (i.e. unstructured, or flexible). Using this methodology, SPOT-Disorder achieved the highest accuracy in the literature without separating short and long disordered regions during training, as was required in previous models, and was additionally proven capable of indirectly discerning functional sites located in disordered regions. Chapter 3 extends the application of an LSTM-BRNN to a two-dimensional problem in the prediction of protein contact maps. Protein contact maps describe the intra-sequence distance between each residue pairing at a distance cutoff, providing key restraints on the possible conformations of a protein. This work, entitled SPOT-Contact, introduced the coupling of two-dimensional LSTM-BRNNs with ResNets to maximise dependency propagation in order to achieve the highest reported accuracies for contact map precision. Several models of varying architectures were trained and combined as an ensemble predictor in order to minimise incorrect generalisations. Chapter 4 discusses the utilisation of an ensemble of LSTM-BRNNs and ResNets to predict local protein one-dimensional structural properties. The method, called SPOT-1D, predicts a wide range of local structural descriptors, including several solvent exposure metrics, secondary structure, and real-valued backbone angles. SPOT-1D was significantly improved by the inclusion of the outputs of SPOT-Contact in the input features. Using this topology led to the best reported accuracy metrics for all predicted properties. 
The protein structures constructed from the backbone angles predicted by SPOT-1D achieved the lowest average error from their native structures in the literature. Chapter 5 presents an update to SPOT-Disorder, which employs the inputs from SPOT-1D in conjunction with an ensemble of LSTM-BRNNs and Inception Residual Squeeze-and-Excitation networks to predict protein intrinsic disorder. This model confirmed the enhancement provided by the coupled architectures over the LSTM-BRNN alone, whilst also introducing a new convolutional format to the bioinformatics field. The work in Chapter 6 utilises the same topology as SPOT-1D for single-sequence prediction of protein intrinsic disorder in SPOT-Disorder-Single. Single-sequence prediction describes the prediction of a protein's properties without the use of evolutionary information. While evolutionary information generally improves the performance of a computational model, it comes at the expense of a greatly increased computational and time load. Removing it from the model allows genome-scale protein analysis at a minor drop in accuracy. Moreover, models trained without evolutionary profiles can be more accurate for proteins with limited, and therefore unreliable, evolutionary information.<br>Thesis (PhD Doctorate)<br>Doctor of Philosophy (PhD)<br>School of Eng & Built Env<br>Science, Environment, Engineering and Technology<br>Full Text
APA, Harvard, Vancouver, ISO, and other styles
25

Squadrani, Lorenzo. "Deep neural networks and thermodynamics." Bachelor's thesis, Alma Mater Studiorum - Università di Bologna, 2020.

Find full text
Abstract:
Deep learning is the most effective and widely used approach to artificial intelligence, and yet it is far from being properly understood. Understanding it is the way to further improve its effectiveness and, in the best case, to gain some understanding of "natural" intelligence. We attempt a step in this direction with the aid of physics. We describe a convolutional neural network for image classification (trained on CIFAR-10) within the descriptive framework of thermodynamics. In particular, we define and study the temperature of each component of the network. Our results provide a new point of view on deep learning models, which may be a starting point towards a better understanding of artificial intelligence.
APA, Harvard, Vancouver, ISO, and other styles
26

Mancevo, del Castillo Ayala Diego. "Compressing Deep Convolutional Neural Networks." Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-217316.

Full text
Abstract:
Deep Convolutional Neural Networks, and "deep learning" in general, stand at the cutting edge of a range of applications, from image-based recognition and classification to natural language processing, speech and speaker recognition, and reinforcement learning. Very deep models, however, are often large, complex, and computationally expensive to train and evaluate. Deep learning models are thus seldom deployed natively in environments where computational resources are scarce or expensive. To address this problem we turn our attention to a range of techniques that we collectively refer to as "model compression", where a lighter student model is trained to approximate the output produced by the model we wish to compress. To this end, the output from the original model is used to craft the training labels of the smaller student model. This work contains experiments on CIFAR-10 and demonstrates how to use the aforementioned techniques to compress a people-counting model whose precision, recall, and F1-score are improved by as much as 14% against our baseline.
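The label-crafting step described above can be sketched in a few lines of NumPy (a minimal illustration of soft-label distillation; the temperature `T` and the toy logits are assumptions, not the thesis's actual setup):

```python
import numpy as np

def softmax(z, T=1.0):
    # Temperature-scaled softmax; T > 1 softens the distribution,
    # exposing the teacher's relative confidence across wrong classes.
    z = np.asarray(z, dtype=float) / T
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_targets(teacher_logits, T=4.0):
    # Soft labels crafted from the teacher's outputs; these replace
    # (or are mixed with) hard one-hot labels when training the student.
    return softmax(teacher_logits, T=T)

teacher_logits = np.array([[8.0, 2.0, 1.0]])  # toy 3-class teacher output
hard = softmax(teacher_logits)                # near one-hot
soft = distillation_targets(teacher_logits)   # retains class-similarity info
```

The student is then trained against `soft` (often mixed with the true labels), which carries more information per example than a one-hot target.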
APA, Harvard, Vancouver, ISO, and other styles
27

Abbasi, Mahdieh. "Toward robust deep neural networks." Doctoral thesis, Université Laval, 2020. http://hdl.handle.net/20.500.11794/67766.

Full text
Abstract:
In this thesis, our goal is to develop robust and reliable yet accurate learning models, particularly Convolutional Neural Networks (CNNs), in the presence of anomalous examples such as adversarial examples and Out-of-Distribution (OOD) samples. As the first contribution, we propose to predict adversarial instances with high uncertainty by encouraging diversity in an ensemble of CNNs. To this end, we devise an ensemble of diverse specialists along with a simple and computationally efficient voting mechanism that predicts adversarial examples with low confidence while keeping the predictive confidence of clean samples high. In the presence of disagreement in our ensemble, we prove that an upper bound of 0.5 + δ can be established on the predictive confidence, leading to a globally fixed detection threshold of τ = 0.5 for identifying adversaries. We analytically justify the role of diversity in our ensemble in mitigating the risk of both black-box and white-box adversarial examples. Finally, we empirically assess the robustness of our ensemble to black-box and white-box attacks on several benchmark datasets. 
The second contribution addresses the detection of OOD samples through an end-to-end model trained on an appropriate OOD set. To this end, we address the following central question: how can the many available OOD sets be differentiated with respect to a given in-distribution task so as to select the most appropriate one, which in turn induces a model with a high detection rate on unseen OOD sets? To answer this question, we hypothesize that the level of "protection" each OOD set provides for the in-distribution sub-manifolds is a good property by which to differentiate OOD sets. To measure the protection level, we design three novel, simple, and cost-effective metrics using a pre-trained vanilla CNN. In an extensive series of experiments on image and audio classification tasks, we empirically demonstrate the ability of an Augmented-CNN (A-CNN) and an explicitly-calibrated CNN to detect a significantly larger portion of unseen OOD samples when they are trained on the most protective OOD set. Interestingly, we also observe that the A-CNN trained on the most protective OOD set can also detect black-box Fast Gradient Sign (FGS) adversarial examples. 
As the third contribution, we investigate more closely the capacity of the A-CNN to detect a wider range of black-box adversaries (not only those of the FGS type). To increase the capability of the A-CNN to detect a larger number of adversaries, we augment its OOD training set with inter-class interpolated samples. We then demonstrate that the A-CNN trained on the most protective OOD set together with the interpolated samples achieves a consistent detection rate on all types of unseen adversarial examples, whereas training an A-CNN on Projected Gradient Descent (PGD) adversaries does not lead to a stable detection rate across all adversary types, particularly unseen ones. We also visually assess the feature space and the decision boundaries in the input space of a vanilla CNN and its augmented counterpart in the presence of adversarial and clean samples. With a properly trained A-CNN, we aim to take a step toward a unified and reliable end-to-end learning model with small risk rates on both clean samples and unusual ones, e.g. adversarial and OOD samples. 
The last contribution presents a use-case of the A-CNN for training a robust object detector on a partially-labeled dataset, in particular a merged dataset. Merging various datasets from similar contexts but with different sets of Objects of Interest (OoI) is an inexpensive way to craft a large-scale dataset that covers a larger spectrum of OoIs. Moreover, merging datasets makes it possible to build a single unified object detector, instead of several separate ones, reducing computational and time costs. However, merging datasets, especially from similar contexts, causes many missing-label instances. With the goal of training an integrated robust object detector on a partially-labeled but large-scale dataset, we propose a self-supervised training framework to overcome the issue of missing-label instances in merged datasets. Our framework is evaluated on a merged dataset with a high missing-label rate. The empirical results confirm the viability of our generated pseudo-labels in enhancing the performance of YOLO, the current state-of-the-art object detector.
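The effect of disagreement among diverse ensemble members on predictive confidence can be seen in a tiny numerical sketch (the vote probabilities below are made-up illustrative values, not the thesis's actual specialist ensemble or its voting mechanism):

```python
import numpy as np

# Simple averaging of three binary "specialist" outputs.
# When the members agree, the averaged confidence stays high;
# when they disagree, the maximum averaged confidence is pulled
# toward 0.5, so a fixed threshold tau = 0.5 can flag the input.
agree = np.array([[0.90, 0.10],
                  [0.95, 0.05],
                  [0.90, 0.10]])
disagree = np.array([[0.90, 0.10],
                     [0.10, 0.90],
                     [0.50, 0.50]])

conf_agree = agree.mean(axis=0).max()        # high: accept prediction
conf_disagree = disagree.mean(axis=0).max()  # ~0.5: reject as suspicious
```

This averaging is only the intuition; the thesis's voting mechanism and the proven 0.5 + δ bound are more refined than a plain mean.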
APA, Harvard, Vancouver, ISO, and other styles
28

Singh, Jaswinder. "RNA Structure Prediction using Deep Neural Network Architectures and Improved Evolutionary Profiles." Thesis, Griffith University, 2022. http://hdl.handle.net/10072/414924.

Full text
Abstract:
RNAs are important biological macro-molecules that play critical roles in many biological processes. The functionality of an RNA depends on its three-dimensional (3D) structure, which in turn depends on its primary structure, i.e. the order of the sequence of nucleotides in the RNA chain. Direct prediction of the 3D structure of an RNA from its sequence is a challenging task. Therefore, the 3D structure is further divided into two-dimensional (2D) properties such as secondary structure and contact maps, and one-dimensional (1D) properties such as torsion angles and solvent accessibility. Accurate prediction of these 1D and 2D structural properties will increase the accuracy of predicting the 3D structure of the RNA. This thesis explores various deep learning algorithms and input features relevant to predicting the 1D and 2D structural properties of an RNA. Using these predicted 1D and 2D structural properties as restraints, we have demonstrated an improvement in the prediction of the RNA 3D structure. Four primary studies of RNA structural property prediction are performed in this thesis. The first study introduces two methods (SPOT-RNA and SPOT-RNA2) for RNA secondary structure prediction using an ensemble of Residual Convolutional and Bi-directional LSTM recurrent neural networks. This study shows that deep learning based methods can outperform existing dynamic programming based algorithms and achieve state-of-the-art performance using single-sequence and evolutionary information as input. The second study investigates the application of deep neural networks to predicting RNA backbone torsion and pseudotorsion angles. We pioneered the prediction of backbone torsion and pseudotorsion angles using deep learning (SPOT-RNA-1D). The angles predicted by SPOT-RNA-1D can be used as 3D model quality indicators. 
The third study introduces a method (SPOT-RNA-2D) to predict RNA distance-based contact maps using an ensemble of deep neural networks and improved evolutionary profiles from RNAcmap. This study shows that using predicted distance-based contact maps as restraints can significantly improve the performance of 3D structure prediction. The fourth study developed a fully automated pipeline (RNAcmap2) to generate aligned homologs. Here, we showed that a combination of BLAST-N and iterative INFERNAL searches, along with an expanded sequence database, leads to multiple sequence alignments (MSAs) comparable to those provided by Rfam MSAs, as judged by secondary structure extracted from mutational coupling analysis and by alignment accuracy against structural alignments. This fully automatic tool (RNAcmap2) enables homolog search, multiple sequence alignment, and mutational coupling analysis for any non-Rfam RNA sequence with Rfam-like performance. The improved RNA 1D and 2D structural property predictions using deep learning, together with the improved homolog search, are collectively expected to be useful in predicting RNA three-dimensional structure and better understanding its biological function.<br>Thesis (PhD Doctorate)<br>Doctor of Philosophy (PhD)<br>School of Eng & Built Env<br>Science, Environment, Engineering and Technology<br>Full Text
APA, Harvard, Vancouver, ISO, and other styles
29

Dhamija, Tanush. "Deep Learning Architectures for time of arrival detection in Acoustic Emissions Monitoring." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2021. http://amslaurea.unibo.it/24620/.

Full text
Abstract:
Acoustic Emission (AE) monitoring can be used to detect the presence of damage as well as determine its location in Structural Health Monitoring (SHM) applications. Information on the time difference between the signal generated by a damage event arriving at different sensors is essential for performing localization, which makes the time of arrival (ToA) an important piece of information to retrieve from the AE signal. Generally, the ToA is determined using statistical methods such as the Akaike Information Criterion (AIC), which is particularly prone to errors in the presence of noise. Given that the structures of interest are often surrounded by harsh environments, a way to accurately estimate the arrival time in such noisy scenarios is of particular interest. In this work, two new methods based on Machine Learning are presented to estimate the arrival times of AE signals. Inspired by strong results in the field, two Deep Learning models are presented, based on a Convolutional Neural Network (CNN) and a Capsule Neural Network (CapsNet). The primary advantage of such models is that they do not require the user to pre-define selected features: they take only raw data as input and establish non-linear relationships between the inputs and outputs. The performance of the models is evaluated using AE signals generated by a custom ray-tracing algorithm, propagated on an aluminium plate, and compared to AIC. The relative estimation error on the test set was below 5% for the models, compared to around 45% for AIC. Testing then continued by preparing an experimental setup and acquiring real AE signals. Similar performance was observed: the two models not only outperform AIC by more than an order of magnitude in their average errors, but were also shown to be far more robust than AIC, which fails in the presence of noise.
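For reference, the AIC picker used here as a baseline can be written in a few lines (a standard variance-partition formulation of the AIC onset picker; the synthetic trace is an assumption for illustration, not the thesis's data):

```python
import numpy as np

def aic_onset(x):
    # AIC arrival picker: for each split index k, model the trace as
    # noise before k and signal after k, and score the split by
    #   AIC(k) = k*log(var(x[:k])) + (n-k-1)*log(var(x[k:])).
    # The arrival time is the index minimising AIC.
    x = np.asarray(x, dtype=float)
    n = len(x)
    aic = np.full(n, np.inf)
    for k in range(2, n - 2):
        v1, v2 = np.var(x[:k]), np.var(x[k:])
        if v1 > 0 and v2 > 0:
            aic[k] = k * np.log(v1) + (n - k - 1) * np.log(v2)
    return int(np.argmin(aic))

# Synthetic trace: low-amplitude noise, then a burst arriving at sample 200.
rng = np.random.default_rng(1)
trace = np.concatenate([0.01 * rng.standard_normal(200),
                        rng.standard_normal(100)])
onset = aic_onset(trace)
```

On such a clean trace the picker lands near the true onset; the thesis's point is that its accuracy degrades sharply once realistic noise is added before the burst.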
APA, Harvard, Vancouver, ISO, and other styles
30

Lu, Yifei. "Deep neural networks and fraud detection." Thesis, Uppsala universitet, Tillämpad matematik och statistik, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-331833.

Full text
APA, Harvard, Vancouver, ISO, and other styles
31

Kalogiras, Vasileios. "Sentiment Classification with Deep Neural Networks." Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-217858.

Full text
Abstract:
Sentiment analysis is a subfield of natural language processing (NLP) that attempts to analyze the sentiment of written text. It is a complex problem that entails different challenges, and for this reason it has been studied extensively. In past years, traditional machine learning algorithms and handcrafted methodologies provided state-of-the-art results. However, the recent deep learning renaissance has shifted interest towards end-to-end deep learning models. On the one hand this has resulted in more powerful models, but on the other hand clear mathematical reasoning or intuition behind distinct models is still lacking. As a result, this thesis attempts to shed some light on recently proposed deep learning architectures for sentiment classification. A study of their differences is performed, along with empirical results on how changes in the structure or capacity of a model can affect its accuracy and the way it represents and ''comprehends'' sentences.
APA, Harvard, Vancouver, ISO, and other styles
32

Choi, Keunwoo. "Deep neural networks for music tagging." Thesis, Queen Mary, University of London, 2018. http://qmro.qmul.ac.uk/xmlui/handle/123456789/46029.

Full text
Abstract:
In this thesis, I present my hypothesis, experiment results, and discussion related to various aspects of deep neural networks for music tagging. Music tagging is the task of automatically predicting suitable semantic labels for a given piece of music. Generally speaking, the input of a music tagging system can be any entity that constitutes music, e.g., audio content, lyrics, or metadata, but only the audio content is considered in this thesis. My hypothesis is that we can find effective deep learning practices for the task of music tagging that improve classification performance. As a computational model to realise a music tagging system, I use deep neural networks. Combined with the research problem, the scope of this thesis is the understanding, interpretation, optimisation, and application of deep neural networks in the context of music tagging systems. The ultimate goal of this thesis is to provide insight that can help to improve deep learning-based music tagging systems. There are many smaller goals in this regard. Since using deep neural networks is a data-driven approach, it is crucial to understand the dataset. Selecting and designing a better architecture is the next topic to discuss. Since tagging is done with audio input, preprocessing the audio signal becomes one of the important research topics. After building (or training) a music tagging system, finding a suitable way to re-use it for other music information retrieval tasks is a compelling topic, in addition to interpreting the trained system. The evidence presented in the thesis supports that deep neural networks are powerful and credible methods for building a music tagging system.
APA, Harvard, Vancouver, ISO, and other styles
33

Yin, Yonghua. "Random neural networks for deep learning." Thesis, Imperial College London, 2018. http://hdl.handle.net/10044/1/64917.

Full text
Abstract:
The random neural network (RNN) is a mathematical model for an 'integrate and fire' spiking network that closely resembles the stochastic behaviour of neurons in mammalian brains. Since its proposal in 1989, there have been numerous investigations into the RNN's applications and learning algorithms. Deep learning (DL) has achieved great success in machine learning, but there has been no research into the properties of the RNN for DL to combine their power. This thesis intends to bridge the gap between RNNs and DL, in order to provide powerful DL tools that are faster, and that can potentially be used with less energy expenditure than existing methods. Based on the RNN function approximator proposed by Gelenbe in 1999, the approximation capability of the RNN is investigated and an efficient classifier is developed. By combining the RNN, DL and non-negative matrix factorisation, new shallow and multi-layer non-negative autoencoders are developed. The autoencoders are tested on typical image datasets and real-world datasets from different domains, and the test results yield the desired high learning accuracy. The concept of dense nuclei/clusters is examined, using RNN theory as a basis. In dense nuclei, neurons may interconnect via soma-to-soma interactions and conventional synaptic connections. A mathematical model of the dense nuclei is proposed and the transfer function can be deduced. A multi-layer architecture of the dense nuclei is constructed for DL, whose value is demonstrated by experiments on multi-channel datasets and server-state classification in cloud servers. A theoretical study into the multi-layer architecture of the standard RNN (MLRNN) for DL is presented. Based on the layer-output analyses, the MLRNN is shown to be a universal function approximator. The effects of the layer number on the learning capability and high-level representation extraction are analysed. 
A hypothesis for transforming the DL problem into a moment-learning problem is also presented. The power of the standard RNN for DL is investigated. The ability of the RNN with only positive parameters to conduct image convolution operations is demonstrated. The MLRNN equipped with the developed training algorithm achieves comparable or better classification at a lower computation cost than conventional DL methods.
APA, Harvard, Vancouver, ISO, and other styles
34

Zagoruyko, Sergey. "Weight parameterizations in deep neural networks." Thesis, Paris Est, 2018. http://www.theses.fr/2018PESC1129/document.

Full text
Abstract:
Multilayer neural networks were first proposed more than three decades ago, and various architectures and parameterizations have been explored since. Recently, graphics processing units have enabled very efficient neural network training, allowing much larger networks to be trained on larger datasets and dramatically improving performance on various supervised learning tasks. However, generalization is still far from human level, and it is difficult to understand what the decisions made are based on. To improve generalization and understanding, we revisit the problems of weight parameterization in deep neural networks. We identify what are, in our view, the most important problems in modern architectures: network depth, parameter efficiency, and learning multiple tasks at the same time, and we try to address them in this thesis. We start with one of the core problems of computer vision, patch matching, and propose to use convolutional neural networks of various architectures to solve it, instead of manually hand-crafted descriptors. Then, we address the task of object detection, where a network should simultaneously learn to predict both the class of the object and its location. In both tasks we find that the number of parameters in the network is the major factor determining its performance, and we explore this phenomenon in residual networks. Our findings show that their original motivation, training deeper networks for better representations, does not fully hold, and wider networks with fewer layers can be as effective as deeper ones with the same number of parameters. Overall, we present an extensive study on architectures and weight parameterizations, and on ways of transferring knowledge between them.
APA, Harvard, Vancouver, ISO, and other styles
35

Ioannou, Yani Andrew. "Structural priors in deep neural networks." Thesis, University of Cambridge, 2018. https://www.repository.cam.ac.uk/handle/1810/278976.

Full text
Abstract:
Deep learning has in recent years come to dominate the previously separate fields of research in machine learning, computer vision, natural language understanding and speech recognition. Despite breakthroughs in training deep networks, there remains a lack of understanding of both the optimization and the structure of deep networks. The approach advocated by many researchers in the field has been to train monolithic networks with excess complexity and strong regularization, an approach that leaves much to be desired in efficiency. Instead we propose that carefully designing networks in consideration of our prior knowledge of the task and the learned representation can improve the memory and compute efficiency of state-of-the-art networks, and even improve generalization, which we denote as structural priors. We present two such novel structural priors for convolutional neural networks, and evaluate them in state-of-the-art image classification CNN architectures. The first of these methods exploits our knowledge of the low-rank nature of most filters learned for natural images by structuring a deep network to learn a collection of mostly small, low-rank filters. The second addresses the filter/channel extents of convolutional filters, by learning filters with limited channel extents. The size of these channel-wise basis filters increases with the depth of the model, giving a novel sparse connection structure that resembles a tree root. Both methods are found to improve the generalization of these architectures while also decreasing the size and increasing the efficiency of their training and test-time computation. Finally, we present work towards conditional computation in deep neural networks, moving towards a method of automatically learning structural priors in deep networks.
We propose a new discriminative learning model, conditional networks, that jointly exploits the accurate representation learning capabilities of deep neural networks and the efficient conditional computation of decision trees. Conditional networks yield smaller models, and offer test-time flexibility in the trade-off of computation vs. accuracy.
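The low-rank prior on learned filters can be illustrated with a toy factorization (an illustrative sketch, not the thesis implementation; the filter here is random): a k×k filter replaced by its best rank-1 outer-product approximation needs 2k weights instead of k².

```python
import numpy as np

# Rank-1 structural prior: approximate a k x k filter by the outer
# product of a k x 1 and a 1 x k filter (via truncated SVD).
k = 5
rng = np.random.default_rng(0)
full = rng.normal(size=(k, k))          # stand-in for a learned filter

U, S, Vt = np.linalg.svd(full)
rank1 = S[0] * np.outer(U[:, 0], Vt[0])  # best rank-1 approximation

params_full = k * k                      # 25 weights
params_rank1 = 2 * k                     # 10 weights
err = np.linalg.norm(full - rank1) / np.linalg.norm(full)
```

In a network, the same idea is realized by stacking k×1 and 1×k convolutions in place of a single k×k convolution, which cuts parameters and computation while constraining the filters to the low-rank family.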
APA, Harvard, Vancouver, ISO, and other styles
36

Billman, Linnar, and Johan Hullberg. "Speech Reading with Deep Neural Networks." Thesis, Uppsala universitet, Institutionen för informationsteknologi, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-360022.

Full text
Abstract:
Recent growth in computational power and available data has increased the popularity and progress of machine learning techniques. Methods of machine learning are used for automatic speech recognition in order to allow humans to transfer information to computers simply by speech. In the present work, we are interested in doing this for general contexts, e.g. speakers talking on TV or newsreaders recorded in a studio. Automatic speech recognition systems are often solely based on acoustic data. By introducing visual data such as lip movements, the robustness of such systems can be increased. This thesis instead investigates how well machine learning techniques can learn the art of lip reading as a sole source for automatic speech recognition. The key idea is to feed the system a sequence of 24 lip coordinates, rather than learning directly from the raw video frames. This thesis designs a solution around this principle empowered by state-of-the-art machine learning techniques such as recurrent neural networks, making use of GPUs. We find that this design reduces computational requirements by more than a factor of 25 compared to a state-of-the-art machine learning solution called LipNet. However, this also scales down performance to an accuracy of 80% of what LipNet achieves, while still outperforming human recognition by a factor of 150%. The accuracies are based on processing of yet unseen speakers. This text presents this architecture. It details its design, reports its results, and compares its performance to an existing solution. Based on this, it is indicated how the result can be further refined.
APA, Harvard, Vancouver, ISO, and other styles
37

Wang, Shenhao. "Deep neural networks for choice analysis." Thesis, Massachusetts Institute of Technology, 2020. https://hdl.handle.net/1721.1/129894.

Full text
Abstract:
Thesis: Ph.D. in Computer and Urban Science, Massachusetts Institute of Technology, Department of Urban Studies and Planning, September 2020. Cataloged from the student-submitted PDF of the thesis. Includes bibliographical references (pages 117-128).
As deep neural networks (DNNs) outperform classical discrete choice models (DCMs) in many empirical studies, one pressing question is how to reconcile them in the context of choice analysis. So far researchers have mainly compared their prediction accuracy, treating them as completely different modeling methods. However, DNNs and classical choice models are closely related and even complementary. This dissertation seeks to lay out a new foundation for using DNNs in choice analysis. It consists of three essays, which respectively tackle the issues of economic interpretation, architectural design, and robustness of DNNs by using classical utility theories.
Essay 1 demonstrates that DNNs can provide economic information as complete as that of the classical DCMs. The economic information includes choice predictions, choice probabilities, market shares, substitution patterns of alternatives, social welfare, probability derivatives, elasticities, marginal rates of substitution (MRS), and heterogeneous values of time (VOT). Unlike DCMs, DNNs can automatically learn the utility function and reveal behavioral patterns that are not prespecified by modelers. However, the economic information from DNNs can be unreliable because the automatic learning capacity is associated with three challenges: high sensitivity to hyperparameters, model non-identification, and local irregularity. To demonstrate the strength of DNNs as well as the three issues, I conduct an empirical experiment by applying DNNs to a stated preference survey and discuss successively the full list of economic information extracted from the DNNs.
Essay 2 designs a particular DNN architecture with alternative-specific utility functions (ASU-DNN) by using prior behavioral knowledge. Theoretically, ASU-DNN reduces the estimation error of a fully connected DNN (F-DNN) because of its lighter architecture and sparser connectivity, although the constraint of alternative-specific utility could cause ASU-DNN to exhibit a larger approximation error. Both ASU-DNN and F-DNN can be treated as special cases of DNN architecture design guided by a utility connectivity graph (UCG). Empirically, ASU-DNN has 2-3% higher prediction accuracy than F-DNN. The alternative-specific connectivity constraint, as a domain-knowledge-based regularization method, is more effective than other regularization methods. This essay demonstrates that prior behavioral knowledge can be used to guide the architecture design of DNNs, to function as an effective domain-knowledge-based regularization method, and to improve both the interpretability and predictive power of DNNs in choice analysis.
Essay 3 designs a theory-based residual neural network (TB-ResNet) with a two-stage training procedure, which synthesizes decision-making theories and DNNs in a linear manner. Three instances of TB-ResNets based on choice modeling (CM-ResNets), prospect theory (PT-ResNets), and hyperbolic discounting (HD-ResNets) are designed. Empirically, compared to the decision-making theories, the three instances of TB-ResNets predict significantly better in the out-of-sample test and become more interpretable owing to the rich utility function augmented by DNNs. Compared to the DNNs, the TB-ResNets predict better because the decision-making theories aid in localizing and regularizing the DNN models. TB-ResNets also become more robust than DNNs because the decision-making theories stabilize the local utility function and the input gradients. This essay demonstrates that it is both feasible and desirable to combine handcrafted utility theory and automatic utility specification, with joint improvement in prediction, interpretation, and robustness.
by Shenhao Wang. Ph.D. in Computer and Urban Science, Massachusetts Institute of Technology, Department of Urban Studies and Planning.
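The "linear synthesis" of a decision theory and a DNN residual described in Essay 3 can be illustrated schematically (a hedged sketch with made-up taste parameters, weights, and function names; not the dissertation's implementation): the choice utility is a handcrafted theory term plus a scaled network residual.

```python
import numpy as np

def softmax(v):
    e = np.exp(v - v.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Stage 1: a simple linear-in-parameters utility theory (hypothetical tastes).
theta = np.array([[-0.5, 1.0], [0.3, -0.2]])

def theory_utility(x):
    return x @ theta.T              # classical linear utility per alternative

# Stage 2: a small residual network refines the theory's utilities.
rng = np.random.default_rng(1)
W1 = rng.normal(0, 0.1, (4, 2))
W2 = rng.normal(0, 0.1, (2, 4))

def dnn_residual(x):
    return np.tanh(x @ W1.T) @ W2.T

def tb_resnet_probs(x, alpha=0.2):
    # utility = theory term + alpha * learned residual (linear combination)
    return softmax(theory_utility(x) + alpha * dnn_residual(x))

x = np.array([[1.0, 0.5]])          # one observation, two attributes
p = tb_resnet_probs(x)              # choice probabilities over 2 alternatives
```

In the two-stage training the theory parameters would be estimated first and held as the backbone, with the residual weights fitted afterwards; here both are simply fixed for illustration.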
APA, Harvard, Vancouver, ISO, and other styles
38

Sunnegårdh, Christina. "Scar detection using deep neural networks." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-299576.

Full text
Abstract:
Object detection is a computer vision method that deals with the tasks of localizing and classifying objects within an image. The number of uses for the method is constantly growing, and this thesis investigates the unexplored area of using deep neural networks for scar detection. Furthermore, the thesis investigates using the scar detector as a basis for the binary classification task of deciding whether in-the-wild images contain a scar or not. Two pre-trained object detection models, Faster R-CNN and RetinaNet, were trained on 1830 manually labeled images using different hyperparameters. Faster R-CNN Inception ResNet V2 achieved the highest results in terms of Average Precision (AP), particularly at higher IoU thresholds, closely followed by Faster R-CNN ResNet50, and finally RetinaNet. The results indicate both the superiority of Faster R-CNN compared to RetinaNet and the benefit of using Inception ResNet V2 as a feature extractor for a large variety of object sizes. The reason is most likely the multiple convolutional filters of different sizes operating at the same levels in the Inception ResNet network. As for inference time, RetinaNet was the fastest, followed by Faster R-CNN ResNet50 and finally Faster R-CNN Inception ResNet V2. For the binary classification task, the models were tested on a set of 200 images, where half of the images contained clearly visible scars. Faster R-CNN ResNet50 achieved the highest accuracy, followed by Faster R-CNN Inception ResNet V2 and finally RetinaNet. While the accuracy of RetinaNet suffered mainly from low recall, Faster R-CNN Inception ResNet V2 detected some actual scars in images that had not been labeled due to low image quality, which could be a matter of subjective labeling, meaning the model is punished for something that at other times might be considered correct. In conclusion, this thesis shows promising results for using object detection to detect scars in images.
While the two-stage Faster R-CNN holds the advantage in AP for scar detection, the one-stage RetinaNet holds the advantage in speed. Suggestions for future work include eliminating biases by putting more effort into labeling data, as well as including training data that contain objects for which the models produced false positives, such as wounds, knuckles, and background objects that are visually similar to scars.
APA, Harvard, Vancouver, ISO, and other styles
39

Landeen, Trevor J. "Association Learning Via Deep Neural Networks." DigitalCommons@USU, 2018. https://digitalcommons.usu.edu/etd/7028.

Full text
Abstract:
Deep learning has been making headlines in recent years and is often portrayed as an emerging technology on a meteoric rise towards fully sentient artificial intelligence. In reality, deep learning is the most recent renaissance of a 70-year-old technology and is far from possessing true intelligence. The renewed interest is motivated by recent successes on challenging problems, the accessibility made possible by hardware developments, and dataset availability. The predecessor to deep learning, commonly known as the artificial neural network, is a computational network set up to mimic the biological neural structure found in brains. However, unlike human brains, artificial neural networks in most cases cannot make inferences from one problem to another. As a result, developing an artificial neural network requires a large number of examples of desired behavior for a specific problem. Furthermore, developing an artificial neural network capable of solving the problem can take days, or even weeks, of computation. Two specific problems addressed in this dissertation are both input association problems. One problem challenges a neural network to identify overlapping regions in images and is used to evaluate the ability of a neural network to learn associations between inputs of similar types. The other problem asks a neural network to identify which observed wireless signals originated from observed potential sources and is used to assess the ability of a neural network to learn associations between inputs of different types. The neural network solutions to both problems introduced, discussed, and evaluated in this dissertation demonstrate deep learning’s applicability to problems which have previously attracted little attention.
APA, Harvard, Vancouver, ISO, and other styles
40

Srivastava, Sanjana. "On foveation of deep neural networks." Thesis, Massachusetts Institute of Technology, 2019. https://hdl.handle.net/1721.1/123134.

Full text
Abstract:
This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2019. Cataloged from the student-submitted PDF version of the thesis. Includes bibliographical references (pages 61-63).
The human ability to recognize objects is impaired when the object is not shown in full. "Minimal images" are the smallest regions of an image that remain recognizable for humans. [26] show that a slight modification of the location and size of the visible region of the minimal image produces a sharp drop in human recognition accuracy. In this paper, we demonstrate that such drops in accuracy due to changes of the visible region are a phenomenon common to humans and existing state-of-the-art convolutional neural networks (CNNs), and are much more prominent in CNNs. We found many cases where CNNs classified one region correctly and the other incorrectly, even though they differed by only one row or column of pixels and were often bigger than the average human minimal image size. We show that this phenomenon is independent from previous works that have reported a lack of invariance to minor modifications in object location in CNNs. Our results thus reveal a new failure mode of CNNs that also affects humans to a lesser degree. They expose how fragile CNN recognition ability is for natural images, even without synthetic adversarial patterns being introduced. This opens the potential for CNN robustness on natural images to be brought to the human level by taking inspiration from human robustness methods. One of these is eccentricity dependence, a model of human focus in which attention to the visual input degrades in proportion to distance from the focal point [7]. We demonstrate that applying the "inverted pyramid" eccentricity method, a multi-scale input transformation, makes CNNs more robust to useless background features than a standard raw-image input. Our results also find that using the inverted pyramid method generally reduces useless background pixels, therefore reducing the required training data.
by Sanjana Srivastava. M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science.
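An "inverted pyramid" style multi-scale input can be sketched roughly as follows (an illustrative reconstruction under common multi-scale-crop assumptions, not the thesis code; the function name and scale values are made up): concentric center crops of growing size are each downsampled to the same resolution, so detail falls off with eccentricity.

```python
import numpy as np

def inverted_pyramid(img, out=32, scales=(32, 64, 128)):
    """Foveal sampling: center crops of increasing size, each block-averaged
    down to out x out, giving fine detail at the center and coarse periphery."""
    h, w = img.shape[:2]
    cy, cx = h // 2, w // 2
    stack = []
    for s in scales:
        half = s // 2
        crop = img[cy - half:cy + half, cx - half:cx + half]
        f = s // out                                  # integer downsampling factor
        small = crop.reshape(out, f, out, f).mean(axis=(1, 3))
        stack.append(small)
    return np.stack(stack)                            # (num_scales, out, out)

img = np.arange(128 * 128, dtype=float).reshape(128, 128)
pyr = inverted_pyramid(img)
```

The stacked scales would then be fed to the CNN as channels (or parallel streams) in place of the raw image.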
APA, Harvard, Vancouver, ISO, and other styles
41

Grechka, Asya. "Image editing with deep neural networks." Electronic Thesis or Diss., Sorbonne université, 2023. https://accesdistant.sorbonne-universite.fr/login?url=https://theses-intra.sorbonne-universite.fr/2023SORUS683.pdf.

Full text
Abstract:
Image editing has a rich history dating back more than two centuries. That said, "classic" image editing requires strong artistic skill as well as considerable time, often on the scale of hours, to modify a single image. In recent years, considerable progress in generative modelling has enabled realistic, high-quality image synthesis. However, editing a real image remains a challenge that requires synthesizing novel content while faithfully preserving parts of the original image. In this thesis, we explore different approaches to image editing, leveraging three families of generative models: GANs, variational autoencoders (VAEs) and diffusion models. First, we study how to use a pre-trained GAN to edit a real image. While methods exist to edit GAN-generated images, they do not generalize easily to real images. We analyze the reasons for this limitation and propose a solution to better project a real image into the GAN's latent space so as to make it editable. Then, we use variational autoencoders with vector quantization to directly obtain a compact image representation (which GANs lack) and optimize the latent vector to match a desired text input. We seek to constrain this problem, which could otherwise be vulnerable to adversarial examples, and propose a method to choose the hyperparameters by trading off the fidelity and editability of the modified images. We present a robust evaluation protocol and demonstrate the interest of our method. Finally, we address image editing from the particular angle of inpainting. Our goal is to synthesize part of an image while leaving the rest unmodified. For this, we leverage pre-trained diffusion models and build on the classic inpainting method that replaces, at each step of the denoising process, the part we do not wish to modify with the noised real image. However, this method can lead to a desynchronization between the generated and real parts. We propose an approach based on the gradient of a function that evaluates the harmonization between the two parts, and use this gradient to guide the denoising process.
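The replace-based inpainting step this abstract describes can be sketched in a few lines. The following is a toy NumPy illustration under stated assumptions: the denoiser and noise schedule are stand-ins, not the thesis's actual diffusion model, and the gradient-guidance term is omitted.

```python
import numpy as np

def inpaint_step(x_t, x_real, mask, noise_level, denoise_fn, rng):
    """One step of replace-based diffusion inpainting (toy sketch)."""
    # Denoise the current estimate (stand-in for a diffusion model step).
    x_prev = denoise_fn(x_t)
    # Re-noise the real image to the current noise level...
    x_real_noisy = x_real + noise_level * rng.standard_normal(x_real.shape)
    # ...and paste it over the region we want to keep (mask == 1 means "keep").
    return mask * x_real_noisy + (1 - mask) * x_prev
```

Iterating this step from pure noise down to noise level zero yields an image whose masked region matches the real image while the rest is synthesized.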
APA, Harvard, Vancouver, ISO, and other styles
42

Chen, Zhe. "Augmented Context Modelling Neural Networks." Thesis, The University of Sydney, 2019. http://hdl.handle.net/2123/20654.

Full text
Abstract:
Contexts provide beneficial information for machine-based image understanding tasks. However, existing context modelling methods still cannot fully exploit contexts, especially for object recognition and detection. In this thesis, we develop augmented context modelling neural networks to better utilize contexts for different object recognition and detection tasks. Our contributions are two-fold: 1) we introduce neural networks to better model instance-level visual relationships; 2) we introduce neural network-based algorithms to better utilize contexts from 3D information and synthesized data. In particular, to augment the modelling of instance-level visual relationships, we propose a context refinement network and an encapsulated context modelling network for object detection. In the context refinement study, we propose to improve the modeling of visual relationships by introducing overlap scores and confidence scores of different regions. In addition, in the encapsulated context modelling study, we boost the context modelling performance by exploiting the more powerful capsule-based neural networks. To augment the modeling of contexts from different sources, we propose novel neural networks to better utilize 3D information and synthesis-based contexts. For the modelling of 3D information, we mainly investigate the modelling of LiDAR data for road detection and the depth data for instance segmentation, respectively. In road detection, we develop a progressive LiDAR adaptation algorithm to improve the fusion of 3D LiDAR data and 2D image data. Regarding instance segmentation, we model depth data as context to help tackle the low-resolution annotation-based training problem. Moreover, to improve the modelling of synthesis-based contexts, we devise a shape translation-based pedestrian generation framework to help improve the pedestrian detection performance.
APA, Harvard, Vancouver, ISO, and other styles
43

Habibi, Aghdam Hamed. "Understanding Road Scenes using Deep Neural Networks." Doctoral thesis, Universitat Rovira i Virgili, 2018. http://hdl.handle.net/10803/461607.

Full text
Abstract:
Understanding road scenes is crucial for autonomous cars. This requires segmenting road scenes into semantically meaningful regions and recognizing objects in a scene. While objects such as cars and pedestrians have to be segmented accurately, it might not be necessary to detect and localize these objects in a scene. However, detecting and classifying objects such as traffic signs is essential for conforming to the rules of the road. In this thesis, we first propose a method for classifying traffic signs using visual attributes and Bayesian networks. Then, we propose two neural networks for this purpose and develop a new method for creating an ensemble of models. Next, we study the sensitivity of neural networks to adversarial samples and propose two denoising networks that are attached to the classification networks to increase their stability against noise. In the second part of the thesis, we first propose a network to detect traffic signs in high-resolution images in real time and show how to implement the scanning-window technique within our network using dilated convolutions. Then, we formulate the detection problem as a segmentation problem and propose a fully convolutional network for detecting traffic signs. Finally, in the last part of the thesis, we propose a new fully convolutional network composed of fire modules, bypass connections and consecutive dilated convolutions for segmenting road scenes into semantically meaningful regions, and show that it is more accurate and computationally more efficient than similar networks.
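Dilated convolutions matter here because they enlarge the receptive field without extra parameters, which is what lets a detection network emulate a scanning window over a high-resolution image. A generic sketch of the receptive-field arithmetic (not code from the thesis; stride is assumed to be 1 in the examples):

```python
def receptive_field(layers):
    """Receptive field of a stack of convolutions.

    Each layer is (kernel_size, dilation, stride). Standard recurrence:
    the field grows by (k - 1) * dilation * cumulative_stride per layer.
    """
    rf, jump = 1, 1
    for k, d, s in layers:
        rf += (k - 1) * d * jump
        jump *= s
    return rf
```

For example, stacking a 3x3 convolution with dilation 2 on top of an ordinary 3x3 convolution already covers a 7-pixel span, whereas two ordinary 3x3 layers cover only 5.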
APA, Harvard, Vancouver, ISO, and other styles
44

Antoniades, Andreas. "Interpreting biomedical data via deep neural networks." Thesis, University of Surrey, 2018. http://epubs.surrey.ac.uk/845765/.

Full text
Abstract:
Machine learning technology has taken quantum leaps in the past few years, from the rise of voice recognition as an interface for interacting with our computers, to self-organising photo albums and self-driving cars. Neural networks and deep learning have contributed significantly to driving this revolution. Yet biomedicine is one of the research areas that has yet to fully embrace the possibilities of deep learning. Engaged in a cross-disciplinary subject, researchers and clinical experts are focused on machine learning and statistical signal processing techniques. The ability to learn hierarchical features makes deep learning models highly applicable to biomedicine, and researchers have started to notice this. The first works of deep learning in biomedicine are emerging, with applications in diagnostics and genomics analysis. These models offer excellent accuracy, even comparable to that of human doctors. Despite the exceptional classification performance of these models, they are still used to provide quantitative results. Diagnosing cancer proficiently and faster than a human doctor is beneficial, but automatically finding which biomarkers indicate the existence of cancerous cells would be invaluable. This type of qualitative insight can be enabled by the hierarchical features and learning coefficients that manifest in deep models. It is this qualitative approach that enables the interpretability of data and the explainability of neural networks for biomedicine, which is the overarching aim of this thesis. As such, the aim of this thesis is to investigate the use of neural networks and deep learning models for the qualitative assessment of biomedical datasets. The first contribution is the proposition of a non-iterative, data-agnostic feature selection algorithm that retains the original features and provides qualitative analysis of their importance. This algorithm is employed in numerous areas, including Pima Indian diabetes and children's tumour detection.
Next, the thesis focuses on the topic of epilepsy, studied through scalp and intracranial electroencephalogram recordings of human brain activity. The second contribution promotes the use of deep learning models for the automatic generation of clinically meaningful features, as opposed to traditional handcrafted features. Convolutional neural networks are adapted to accommodate the intricacies of electroencephalogram data and trained to detect epileptiform discharges. The learning coefficients of these models are examined and found to contain clinically significant features. When combined in a hierarchical way, these features reveal useful insights for the evaluation of treatment effectiveness. The final contribution addresses the difficulty of acquiring intracranial data due to the invasive nature of the recording procedure. A non-linear brain mapping algorithm is proposed to link the electrical activities recorded on the scalp to those inside the cranium. This process improves the generalisation of models and alleviates the need for surgical procedures. This is accomplished via an asymmetric autoencoder that accounts for differences in the dimensionality of the electroencephalogram data and improves the quality of the data.
APA, Harvard, Vancouver, ISO, and other styles
45

Tavanaei, Amirhossein. "Spiking Neural Networks and Sparse Deep Learning." Thesis, University of Louisiana at Lafayette, 2019. http://pqdtopen.proquest.com/#viewpdf?dispub=10807940.

Full text
Abstract:
This document proposes new methods for training multi-layer and deep spiking neural networks (SNNs), specifically spiking convolutional neural networks (CNNs). Training a multi-layer spiking network poses difficulties because the output spikes do not have derivatives, and the backpropagation method commonly used for non-spiking networks is not easily applied. Our methods use novel versions of the brain-like, local learning rule named spike-timing-dependent plasticity (STDP) that incorporate supervised and unsupervised components. Our method starts with conventional learning methods and converts them to spatio-temporally local rules suited to SNNs. The training uses two components: unsupervised feature extraction and supervised classification. The first component refers to new STDP rules for spike-based representation learning that train convolutional filters and initial representations. The second introduces new STDP-based supervised learning rules for spike pattern classification via an approximation to gradient descent that combines the STDP and anti-STDP rules. Specifically, the STDP-based supervised learning model approximates gradient descent by using temporally local STDP rules. Stacking these components implements a novel sparse, spiking deep learning model. Our spiking deep learning model is categorized as a variation of spiking CNNs of integrate-and-fire (IF) neurons, with performance comparable with the state-of-the-art deep SNNs. The experimental results show the success of the proposed model for image classification. Our network architecture is the only spiking CNN which provides bio-inspired STDP rules in a hierarchy of feature extraction and classification in an entirely spike-based framework.
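The pair-based STDP rule this abstract builds on can be illustrated in a few lines: a synapse is potentiated when the presynaptic spike precedes the postsynaptic one and depressed otherwise, with exponentially decaying magnitude. The parameter values below are arbitrary illustrations, not those of the thesis.

```python
import math

def stdp_dw(t_pre, t_post, a_plus=0.01, a_minus=0.012, tau=20.0):
    """Weight change for one pre/post spike pair (times in ms)."""
    dt = t_post - t_pre
    if dt >= 0:
        # Pre fired before post: potentiate, decaying with the gap.
        return a_plus * math.exp(-dt / tau)
    # Post fired before pre: depress.
    return -a_minus * math.exp(dt / tau)
```

Accumulating `stdp_dw` over all spike pairs in a window gives the total weight update for a synapse.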
APA, Harvard, Vancouver, ISO, and other styles
46

Avramova, Vanya. "Curriculum Learning with Deep Convolutional Neural Networks." Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-178453.

Full text
Abstract:
Curriculum learning is a machine learning technique inspired by the way humans acquire knowledge and skills: by mastering simple concepts first, and progressing through information of increasing difficulty to grasp more complex topics. Curriculum learning and its derivatives Self-Paced Learning (SPL) and Self-Paced Learning with Diversity (SPLD) have previously been applied within various machine learning contexts: Support Vector Machines (SVMs), perceptrons, and multi-layer neural networks, where they have been shown to improve both training speed and model accuracy. This project ventured to apply the techniques within the previously unexplored context of deep learning, by investigating how they affect the performance of a deep convolutional neural network (ConvNet) trained on a large labelled image dataset. The curriculum was formed by presenting the training samples to the network in order of increasing difficulty, measured by each sample's loss value under the network's objective function. The project evaluated SPL and SPLD, and proposed two new curriculum learning sub-variants, p-SPL and p-SPLD, which allow for a smooth progression of sample inclusion during training. The project also explored the "inversed" versions of the SPL, SPLD, p-SPL and p-SPLD techniques, where the samples were selected for the curriculum in order of decreasing difficulty. The experiments demonstrated that all learning variants perform fairly similarly, within a ≈1% average test accuracy margin, based on five trained models per variant. Surprisingly, models trained with the inversed versions of the algorithms performed slightly better than the standard curriculum training variants. The SPLD-Inversed, SPL-Inversed and SPLD networks also registered marginally higher accuracy than the network trained with the usual random sample presentation.
The results suggest that while sample ordering does affect the training process, the optimal order in which samples are presented may vary based on the data set and algorithm used. The project also investigated whether some samples were more beneficial for the training process than others. Based on sample difficulty, subsets of samples were removed from the training data set. The models trained on the remaining samples were compared to a default model trained on all samples. On the data set used, removing the “easiest” 10% of samples had no effect on the achieved test accuracy compared to the default model, and removing the “easiest” 40% of samples reduced model accuracy by only ≈1% (compared to ≈6% loss when 40% of the "most difficult" samples were removed, and ≈3% loss when 40% of samples were randomly removed). Taking away the "easiest" samples first (up to a certain percentage of the data set) affected the learning process less negatively than removing random samples, while removing the "most difficult" samples first had the most detrimental effect. The results suggest that the networks derived most learning value from the "difficult" samples, and that a large subset of the "easiest" samples can be excluded from training with minimal impact on the attained model accuracy. Moreover, it is possible to identify these samples early during training, which can greatly reduce the training time for these models.
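The core mechanics described here, ordering samples by their current loss and admitting them via a pace threshold, can be sketched as follows. This is a generic illustration of curriculum ordering and SPL-style selection, not the thesis's p-SPL/p-SPLD implementation.

```python
def curriculum_order(losses):
    """Indices of samples from easiest (lowest loss) to hardest."""
    return sorted(range(len(losses)), key=lambda i: losses[i])

def spl_select(losses, lam):
    """Self-paced selection: admit only samples whose loss is below
    the pace parameter lam; raising lam gradually admits harder samples."""
    return [i for i in curriculum_order(losses) if losses[i] < lam]
```

In a training loop, `lam` would be increased each epoch so that the selected subset grows from the easiest samples toward the full dataset; reversing the sort gives the "inversed" variants studied above.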
APA, Harvard, Vancouver, ISO, and other styles
47

Karlsson, Daniel. "Classifying sport videos with deep neural networks." Thesis, Umeå universitet, Institutionen för datavetenskap, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-130654.

Full text
Abstract:
This project aims to apply deep neural networks to classify video clips in applications used to streamline advertisements on the web. The system focuses on sport clips but can be expanded to other advertisement fields, with lower accuracy and longer training times as a consequence. The main task was to find the neural network model best suited to classifying videos. To achieve this, the field was researched and three network models were introduced to see how they could handle the videos. It was proposed that applying a recurrent LSTM structure at the end of an image classification network could make it well adapted to work with videos. The most popular image classification architectures are mostly convolutional neural networks, and these structures are also the foundation of all three models. The results from the evaluation of the models, as well as the research, suggest that using a convolutional LSTM can be an efficient and powerful way of classifying videos. Further, this project shows that by reducing the size of the input data by 25%, the training and evaluation time can be cut by around 50%. This comes at the cost of lower accuracy. However, it is demonstrated that the performance loss can be compensated for by considering more frames from the same videos during evaluation.
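The "LSTM head on top of an image classifier" idea amounts to folding per-frame CNN features into a single clip-level representation. A minimal NumPy sketch of such recurrent aggregation (a simplified stand-in for an LSTM cell, with hypothetical weight matrices `w_h` and `w_x`):

```python
import numpy as np

def aggregate_frames(frame_feats, w_h, w_x):
    """Fold a sequence of per-frame feature vectors into one clip vector.

    frame_feats: array of shape (num_frames, feat_dim), e.g. CNN outputs.
    w_h, w_x: recurrent and input weight matrices (assumed pre-trained).
    """
    h = np.zeros(w_h.shape[0])
    for x in frame_feats:           # one feature vector per frame
        h = np.tanh(w_h @ h + w_x @ x)
    return h                        # clip-level representation
```

A classifier layer applied to the returned vector would then predict the clip's label; evaluating more frames per clip simply lengthens `frame_feats`.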
APA, Harvard, Vancouver, ISO, and other styles
48

Peng, Zeng. "Pedestrian Tracking by using Deep Neural Networks." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-302107.

Full text
Abstract:
This project aims at using deep learning to solve the pedestrian tracking problem for autonomous driving. The research area lies in the domain of computer vision and deep learning. Multi-Object Tracking (MOT) aims at tracking multiple targets simultaneously in video data. The main application scenarios of MOT are security monitoring and autonomous driving. In these scenarios, we often need to track many targets at the same time, which is not possible with object detection or single-object tracking algorithms alone, owing to their lack of stability and usability; we therefore need to explore the area of multiple object tracking. The proposed method breaks MOT into different stages and utilizes the motion and appearance information of targets to track them in the video data. We used three different object detectors to detect the pedestrians in frames, a person re-identification model as the appearance feature extractor, and a Kalman filter as the motion predictor. Our proposed model achieves 47.6% MOT accuracy and a 53.2% IDF1 score, while the model without the person re-identification module obtains only 44.8% and 45.8% respectively. Our experimental results indicate that a robust multiple object tracking algorithm can be achieved by splitting the task into stages and improved with representative DNN-based appearance features.
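The association step in such a tracking-by-detection pipeline typically mixes a motion term (distance from the Kalman-predicted box) with an appearance term (re-ID feature distance). A hedged sketch of one way the cost matrix could be built; the centre-distance motion term and the 50/50 weighting are illustrative assumptions, not the thesis's exact formulation:

```python
import numpy as np

def association_cost(pred_boxes, det_boxes, pred_feats, det_feats, w=0.5):
    """Cost matrix between predicted tracks and new detections.

    Boxes are (cx, cy, w, h); features are re-ID embedding vectors.
    Mixes motion (centre distance) and appearance (cosine distance).
    """
    n, m = len(pred_boxes), len(det_boxes)
    cost = np.zeros((n, m))
    for i in range(n):
        for j in range(m):
            motion = np.linalg.norm(pred_boxes[i][:2] - det_boxes[j][:2])
            app = 1 - (pred_feats[i] @ det_feats[j]) / (
                np.linalg.norm(pred_feats[i]) * np.linalg.norm(det_feats[j]))
            cost[i, j] = w * motion + (1 - w) * app
    return cost
```

A bipartite matcher (e.g. the Hungarian algorithm) applied to this matrix would then assign detections to tracks; dropping the appearance term reduces it to motion-only association.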
APA, Harvard, Vancouver, ISO, and other styles
49

Milner, Rosanna Margaret. "Using deep neural networks for speaker diarisation." Thesis, University of Sheffield, 2016. http://etheses.whiterose.ac.uk/16567/.

Full text
Abstract:
Speaker diarisation answers the question “who spoke when?” in an audio recording. The input may vary, but a system is required to output speaker labelled segments in time. Typical stages are Speech Activity Detection (SAD), speaker segmentation and speaker clustering. Early research focussed on Conversational Telephone Speech (CTS) and Broadcast News (BN) domains before the direction shifted to meetings and, more recently, broadcast media. The British Broadcasting Corporation (BBC) supplied data through the Multi-Genre Broadcast (MGB) Challenge in 2015 which showed the difficulties speaker diarisation systems have on broadcast media data. Diarisation is typically an unsupervised task which does not use auxiliary data or information to enhance a system. However, methods which do involve supplementary data have shown promise. Five semi-supervised methods are investigated which use a combination of inputs: different channel types and transcripts. The methods involve Deep Neural Networks (DNNs) for SAD, DNNs trained for channel detection, transcript alignment, and combinations of these approaches. However, the methods are only applicable when datasets contain the required inputs. Therefore, a method involving a pretrained Speaker Separation Deep Neural Network (ssDNN) is investigated which is applicable to every dataset. This technique performs speaker clustering and speaker segmentation using DNNs successfully for meeting data and with mixed results for broadcast media. The task of diarisation focuses on two aspects: accurate segments and speaker labels. The Diarisation Error Rate (DER) does not evaluate the segmentation quality as it does not measure the number of correctly detected segments. Other metrics exist, such as boundary and purity measures, but these also mask the segmentation quality. An alternative metric is presented based on the F-measure which considers the number of hypothesis segments correctly matched to reference segments. 
A deeper insight into the segment quality is shown through this metric.
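One way a segment-level F-measure of this kind could be computed is sketched below. The matching criterion (a hypothesis segment counts as correct if it overlaps some reference segment by at least half the segment's duration) is an illustrative assumption, not the thesis's exact definition.

```python
def segment_f_measure(hyp, ref, min_overlap=0.5):
    """F-measure over (start, end) segments in seconds."""
    def overlap(a, b):
        # Duration of the intersection of two segments.
        return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))

    def matches(segs, others):
        # Segments in `segs` covered well enough by some segment in `others`.
        return sum(
            any(overlap(s, o) >= min_overlap * (s[1] - s[0]) for o in others)
            for s in segs)

    precision = matches(hyp, ref) / len(hyp) if hyp else 0.0
    recall = matches(ref, hyp) / len(ref) if ref else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

Unlike DER, this score drops whenever hypothesis segments fail to line up one-to-one with reference segments, directly exposing segmentation quality.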
APA, Harvard, Vancouver, ISO, and other styles
50

Karlsson, Jonas. "Auditory Classification of Cars by Deep Neural Networks." Thesis, Uppsala universitet, Institutionen för informationsteknologi, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-355673.

Full text
Abstract:
This thesis explores the challenge of using deep neural networks to classify traits in cars through sound recognition. These traits could include the type of engine, model, or manufacturer of the car. The problem was approached by creating three different neural networks and evaluating their performance in classifying the sounds of three different cars. The top-scoring neural network achieved an accuracy of 61 percent, which is far from the standard accuracy of modern speech recognition systems. The results do, however, show that there are some tendencies in the data that neural networks can learn. If the methods and networks presented in this report are built upon further, greater classification performance may be achieved.
APA, Harvard, Vancouver, ISO, and other styles
