To see other types of publications on this topic, follow the link: Convolutional Deep Belief Networks.

Dissertations / Theses on the topic 'Convolutional Deep Belief Networks'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the top 50 dissertations / theses for your research on the topic 'Convolutional Deep Belief Networks.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse dissertations / theses in a wide variety of disciplines and organise your bibliography correctly.

1

Liu, Ye. "Application of Convolutional Deep Belief Networks to Domain Adaptation." The Ohio State University, 2014. http://rave.ohiolink.edu/etdc/view?acc_num=osu1397728737.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Nassar, Alaa S. N. "A Hybrid Multibiometric System for Personal Identification Based on Face and Iris Traits. The Development of an automated computer system for the identification of humans by integrating facial and iris features using Localization, Feature Extraction, Handcrafted and Deep learning Techniques." Thesis, University of Bradford, 2018. http://hdl.handle.net/10454/16917.

Full text
Abstract:
Multimodal biometric systems have been widely applied in many real-world applications due to their ability to deal with a number of significant limitations of unimodal biometric systems, including sensitivity to noise, population coverage, intra-class variability, non-universality, and vulnerability to spoofing. This PhD thesis is focused on the combination of the face and the left and right irises in a unified hybrid multimodal biometric identification system using different fusion approaches at the score and rank level. Firstly, the facial features are extracted using a novel multimodal local feature extraction approach, termed the Curvelet-Fractal approach, which is based on merging the advantages of the Curvelet transform with the Fractal dimension. Secondly, a novel framework based on merging the advantages of local handcrafted feature descriptors with deep learning approaches is proposed, the Multimodal Deep Face Recognition (MDFR) framework, to address the face recognition problem in unconstrained conditions. Thirdly, an efficient deep learning system is employed, termed IrisConvNet, whose architecture is based on a combination of a Convolutional Neural Network (CNN) and a Softmax classifier to extract discriminative features from an iris image. Finally, the performance of the unimodal and multimodal systems has been evaluated by conducting a number of extensive experiments on large-scale unimodal databases (FERET, CAS-PEAL-R1, LFW, CASIA-Iris-V1, CASIA-Iris-V3 Interval, MMU1, and IITD) and the SDUMLA-HMT multimodal dataset. The results obtained demonstrate the superiority of the proposed systems over previous works, achieving new state-of-the-art recognition rates on all the employed datasets with less time required to recognize the person's identity.
Higher Committee for Education Development in Iraq
APA, Harvard, Vancouver, ISO, and other styles
3

Mancevo, del Castillo Ayala Diego. "Compressing Deep Convolutional Neural Networks." Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-217316.

Full text
Abstract:
Deep Convolutional Neural Networks and "deep learning" in general stand at the cutting edge of a range of applications, from image-based recognition and classification to natural language processing, speech and speaker recognition, and reinforcement learning. Very deep models, however, are often large, complex and computationally expensive to train and evaluate. Deep learning models are thus seldom deployed natively in environments where computational resources are scarce or expensive. To address this problem we turn our attention towards a range of techniques that we collectively refer to as "model compression", where a lighter student model is trained to approximate the output produced by the model we wish to compress. To this end, the output from the original model is used to craft the training labels of the smaller student model. This work contains some experiments on CIFAR-10 and demonstrates how to use the aforementioned techniques to compress a people counting model whose precision, recall and F1-score are improved by as much as 14% against our baseline.
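The student-teacher idea described here is commonly implemented as knowledge distillation. As a minimal sketch (PyTorch, not the thesis's actual code), the student is trained against the teacher's temperature-softened outputs:

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, T=4.0):
    # Soften both distributions with temperature T; the T*T factor
    # keeps gradient magnitudes comparable across temperatures.
    p_teacher = F.softmax(teacher_logits / T, dim=1)
    log_p_student = F.log_softmax(student_logits / T, dim=1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * T * T
```

In practice this term is often mixed with the ordinary cross-entropy on the true labels when those are available.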
APA, Harvard, Vancouver, ISO, and other styles
4

Faulkner, Ryan. "Dyna learning with deep belief networks." Thesis, McGill University, 2011. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=97177.

Full text
Abstract:
The objective of reinforcement learning is to find "good" actions in an environment where feedback is provided through a numerical reward, and the current state (i.e. sensory input) is assumed to be available at each time step. The notion of "good" is defined as maximizing the expected cumulative returns over time. Sometimes it is useful to construct models of the environment to aid in solving the problem. We investigate Dyna-style reinforcement learning, a powerful approach for problems where not much real data is available. The main idea is to supplement real trajectories with simulated ones sampled from a learned model of the environment. However, in large state spaces, the problem of learning a good generative model of the environment has been open so far. We propose to use deep belief networks to learn an environment model. Deep belief networks (Hinton, 2006) are generative models that have been effective in learning the time dependency relationships among complex data. It has been shown that such models can be learned in a reasonable amount of time when they are built using energy models. We present our algorithm for using deep belief networks as a generative model for simulating the environment within the Dyna architecture, along with very promising empirical results.
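The Dyna loop the abstract refers to is easiest to see in its tabular form. The sketch below substitutes a simple lookup model for the thesis's deep belief network, so it only illustrates the control flow: real steps train both the value function and the model, and the model then generates extra simulated updates. Here `env_step` is an assumed callable returning a reward and next state.

```python
import random
from collections import defaultdict

def dyna_q(env_step, states, actions, episodes=100, n_sim=20,
           alpha=0.1, gamma=0.95, eps=0.1):
    Q = defaultdict(float)
    model = {}  # (s, a) -> (r, s'); stands in for the learned generative model
    for _ in range(episodes):
        s = random.choice(states)
        for _ in range(50):
            a = (random.choice(actions) if random.random() < eps
                 else max(actions, key=lambda b: Q[(s, b)]))
            r, s2 = env_step(s, a)                    # real experience
            best = max(Q[(s2, b)] for b in actions)
            Q[(s, a)] += alpha * (r + gamma * best - Q[(s, a)])
            model[(s, a)] = (r, s2)                   # update the model
            for _ in range(n_sim):                    # planning: simulated experience
                sk, ak = random.choice(list(model))
                rk, sk2 = model[(sk, ak)]
                bestk = max(Q[(sk2, b)] for b in actions)
                Q[(sk, ak)] += alpha * (rk + gamma * bestk - Q[(sk, ak)])
            s = s2
    return Q
```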
APA, Harvard, Vancouver, ISO, and other styles
5

Avramova, Vanya. "Curriculum Learning with Deep Convolutional Neural Networks." Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-178453.

Full text
Abstract:
Curriculum learning is a machine learning technique inspired by the way humans acquire knowledge and skills: by mastering simple concepts first, and progressing through information with increasing difficulty to grasp more complex topics. Curriculum Learning, and its derivatives Self Paced Learning (SPL) and Self Paced Learning with Diversity (SPLD), have previously been applied within various machine learning contexts: Support Vector Machines (SVMs), perceptrons, and multi-layer neural networks, where they have been shown to improve both training speed and model accuracy. This project ventured to apply the techniques within the previously unexplored context of deep learning, by investigating how they affect the performance of a deep convolutional neural network (ConvNet) trained on a large labeled image dataset. The curriculum was formed by presenting the training samples to the network in order of increasing difficulty, measured by the sample's loss value based on the network's objective function. The project evaluated SPL and SPLD, and proposed two new curriculum learning sub-variants, p-SPL and p-SPLD, which allow for a smooth progression of sample inclusion during training; a sketch of this selection rule follows below. The project also explored the "inversed" versions of the SPL, SPLD, p-SPL and p-SPLD techniques, where the samples were selected for the curriculum in order of decreasing difficulty. The experiments demonstrated that all learning variants perform fairly similarly, within a ≈1% average test accuracy margin, based on five trained models per variant. Surprisingly, models trained with the inversed versions of the algorithms performed slightly better than the standard curriculum training variants. The SPLD-Inversed, SPL-Inversed and SPLD networks also registered marginally higher accuracy results than the network trained with the usual random sample presentation. The results suggest that while sample ordering does affect the training process, the optimal order in which samples are presented may vary based on the data set and algorithm used. The project also investigated whether some samples were more beneficial for the training process than others. Based on sample difficulty, subsets of samples were removed from the training data set, and the models trained on the remaining samples were compared to a default model trained on all samples. On the data set used, removing the “easiest” 10% of samples had no effect on the achieved test accuracy compared to the default model, and removing the “easiest” 40% of samples reduced model accuracy by only ≈1% (compared to a ≈6% loss when 40% of the "most difficult" samples were removed, and a ≈3% loss when 40% of samples were randomly removed). Taking away the "easiest" samples first (up to a certain percentage of the data set) affected the learning process less negatively than removing random samples, while removing the "most difficult" samples first had the most detrimental effect. The results suggest that the networks derived most learning value from the "difficult" samples, and that a large subset of the "easiest" samples can be excluded from training with minimal impact on the attained model accuracy. Moreover, it is possible to identify these samples early during training, which can greatly reduce the training time for these models.
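Self-paced selection of this kind reduces to thresholding per-sample losses. A minimal NumPy sketch; the p-SPL reading here is our assumption, not the thesis's code:

```python
import numpy as np

def spl_subset(losses, lam):
    # Basic SPL: keep sample i if its loss is below lam; lam is grown
    # across epochs so harder samples enter the curriculum gradually.
    return np.where(losses < lam)[0]

def p_spl_subset(losses, frac):
    # Assumed p-SPL-style variant: keep the easiest `frac` fraction,
    # so sample inclusion grows smoothly with frac. Reversing the sort
    # order gives the "inversed" curriculum discussed above.
    k = max(1, int(frac * len(losses)))
    return np.argsort(losses)[:k]
```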
APA, Harvard, Vancouver, ISO, and other styles
6

Ayoub, Issa. "Multimodal Affective Computing Using Temporal Convolutional Neural Network and Deep Convolutional Neural Networks." Thesis, Université d'Ottawa / University of Ottawa, 2019. http://hdl.handle.net/10393/39337.

Full text
Abstract:
Affective computing has gained significant attention from researchers in the last decade due to the wide variety of applications that can benefit from this technology. Often, researchers describe affect using emotional dimensions such as arousal and valence. Valence refers to the spectrum of negative to positive emotions while arousal determines the level of excitement. Describing emotions through continuous dimensions (e.g. valence and arousal) allows us to encode subtle and complex affects, as opposed to discrete emotions such as the six basic emotions: happiness, anger, fear, disgust, sadness and neutral. Recognizing spontaneous and subtle emotions remains a challenging problem for computers. In our work, we employ two modalities of information: video and audio. Hence, we extract visual and audio features using deep neural network models. Given that emotions are time-dependent, we apply the Temporal Convolutional Neural Network (TCN) to model the variations in emotions. Additionally, we investigate an alternative model that combines a Convolutional Neural Network (CNN) and a Recurrent Neural Network (RNN). Given our inability to fit the latter deep model into main memory, we divide the RNN into smaller segments and propose a scheme to back-propagate gradients across all segments. We configure the hyperparameters of all models using Gaussian processes to obtain a fair comparison between the proposed models. Our results show that the TCN outperforms the RNN for the recognition of the arousal and valence emotional dimensions. Therefore, we propose the adoption of the TCN for emotion detection problems as a baseline method for future work. Our experimental results show that the TCN outperforms all RNN-based models, yielding a concordance correlation coefficient of 0.7895 (vs. 0.7544) on valence and 0.8207 (vs. 0.7357) on arousal on the validation set of the SEWA dataset for emotion prediction.
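The core of a temporal convolutional model is a dilated causal 1-D convolution, so each output sees only past frames. A minimal sketch of one block (PyTorch; an illustration, not the thesis's architecture):

```python
import torch.nn as nn
import torch.nn.functional as F

class CausalConvBlock(nn.Module):
    def __init__(self, ch_in, ch_out, k=3, dilation=1):
        super().__init__()
        self.pad = (k - 1) * dilation  # left-padding keeps the conv causal
        self.conv = nn.Conv1d(ch_in, ch_out, k, dilation=dilation)

    def forward(self, x):              # x: (batch, channels, time)
        x = F.pad(x, (self.pad, 0))    # pad only on the left (the past)
        return F.relu(self.conv(x))
```

Stacking such blocks with dilations 1, 2, 4, ... grows the receptive field exponentially, which is what lets TCNs model long emotion trajectories.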
APA, Harvard, Vancouver, ISO, and other styles
7

Härenstam-Nielsen, Linus. "Deep Convolutional Networks with Recurrence for Eye-Tracking." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-240608.

Full text
Abstract:
This thesis explores the use of temporally recurrent connections in convolutional neural networks for eye-tracking. We specifically investigate the impact of replacing the convolutional layers in a regular CNN with convolutional LSTMs and replacing the fully connected feature layers with regular RNNs and LSTMs. This requires us to transition from a static single-frame input model to a time-dependent multiple-frame input model. Doing so naturally introduces extra complexity to the eye-tracking pipeline, so we highlight the advantages and disadvantages. Our results show that adding LSTM cells to the convolutional layers and RNN cells to the feature layers can increase eye-tracking performance, but also that LSTM recurrence in the feature layers can be detrimental to performance.
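A convolutional LSTM of the kind used to replace the convolutional layers computes the usual LSTM gates with convolutions, so the hidden state keeps its spatial layout. A minimal cell sketch (PyTorch; an illustration rather than the thesis's network):

```python
import torch
import torch.nn as nn

class ConvLSTMCell(nn.Module):
    def __init__(self, in_ch, hid_ch, k=3):
        super().__init__()
        # one conv produces all four gates from [input, hidden] stacked on channels
        self.conv = nn.Conv2d(in_ch + hid_ch, 4 * hid_ch, k, padding=k // 2)

    def forward(self, x, h, c):
        gates = self.conv(torch.cat([x, h], dim=1))
        i, f, o, g = gates.chunk(4, dim=1)
        c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
        h = torch.sigmoid(o) * torch.tanh(c)
        return h, c  # apply over frames in a loop to process a clip
```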
APA, Harvard, Vancouver, ISO, and other styles
8

Larsson, Susanna. "Monocular Depth Estimation Using Deep Convolutional Neural Networks." Thesis, Linköpings universitet, Datorseende, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-159981.

Full text
Abstract:
For a long time, stereo cameras have been deployed in visual Simultaneous Localization And Mapping (SLAM) systems to gain 3D information. Even though stereo cameras show good performance, their main disadvantage is the complex and expensive hardware setup they require, which limits the use of the system. A simpler and cheaper alternative is the monocular camera; however, monocular images lack the important depth information. Recent works have shown that having access to depth maps in a monocular SLAM system is beneficial since they can be used to improve the 3D reconstruction. This work proposes a deep neural network that predicts dense high-resolution depth maps from monocular RGB images by casting the problem as a supervised regression task. The network architecture follows an encoder-decoder structure in which multi-scale information is captured and skip-connections are used to recover details. The network is trained and evaluated on the KITTI dataset, achieving results comparable to state-of-the-art methods. With further development, this network shows good potential to be incorporated in a monocular SLAM system to improve the 3D reconstruction.
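The encoder-decoder-with-skips idea can be shown in toy form. A minimal PyTorch sketch of a depth regressor with one skip connection, assuming even input sizes; the thesis's actual network is of course much deeper:

```python
import torch
import torch.nn as nn

class TinyDepthNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU())
        self.down = nn.Sequential(nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU())
        self.up = nn.ConvTranspose2d(64, 32, 2, stride=2)
        self.head = nn.Conv2d(64, 1, 3, padding=1)  # 64 = 32 upsampled + 32 skip

    def forward(self, x):
        e = self.enc(x)
        d = self.up(self.down(e))
        return self.head(torch.cat([d, e], dim=1))  # skip connection recovers detail
```

Training then amounts to supervised regression, e.g. minimizing an L1 loss between predicted and ground-truth depth maps.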
APA, Harvard, Vancouver, ISO, and other styles
9

Imbulgoda, Liyangahawatte Gihan Janith Mendis. "Hardware Implementation and Applications of Deep Belief Networks." University of Akron / OhioLINK, 2016. http://rave.ohiolink.edu/etdc/view?acc_num=akron1476707730643462.

Full text
APA, Harvard, Vancouver, ISO, and other styles
10

Caron, Mathilde. "Unsupervised Representation Learning with Clustering in Deep Convolutional Networks." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-227926.

Full text
Abstract:
This master thesis tackles the problem of unsupervised learning of visual representations with deep Convolutional Neural Networks (CNNs). Closing the gap between unsupervised and supervised representation learning is one of the main current challenges in image recognition. We propose a novel and simple way of training CNNs on fully unlabeled datasets. Our method jointly optimizes a grouping of the representations and trains a CNN using the groups as supervision. We evaluate the models trained with our method on standard transfer learning experiments from the literature. We find that our method outperforms all self-supervised and unsupervised state-of-the-art approaches. More importantly, our method outperforms those methods even when the unsupervised training set is not ImageNet but an arbitrary subset of images from Flickr.
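The joint grouping-and-training scheme alternates between clustering features and using the cluster ids as labels. A rough single-epoch sketch (PyTorch + scikit-learn; `backbone` and `classifier` are assumed modules, and the whole image set is assumed to fit in one CPU batch for brevity):

```python
import torch
from sklearn.cluster import KMeans

def pseudo_label_epoch(backbone, classifier, images, k, optimizer, loss_fn):
    with torch.no_grad():                     # 1) embed all images
        feats = backbone(images)
    assignments = KMeans(n_clusters=k, n_init=10).fit_predict(feats.numpy())
    targets = torch.as_tensor(assignments, dtype=torch.long)  # 2) pseudo-labels
    optimizer.zero_grad()                     # 3) supervised update on pseudo-labels
    loss = loss_fn(classifier(backbone(images)), targets)
    loss.backward()
    optimizer.step()
    return loss.item()
```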
APA, Harvard, Vancouver, ISO, and other styles
11

Jonnarth, Arvi. "Camera-Based Friction Estimation with Deep Convolutional Neural Networks." Thesis, Uppsala universitet, Avdelningen för systemteknik, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-355618.

Full text
Abstract:
During recent years, great progress has been made within the field of deep learning, and more specifically within neural networks. Deep convolutional neural networks (CNNs) have been especially successful within image processing, in tasks such as image classification and object detection. Car manufacturers, amongst other actors, are starting to realize the potential of deep learning and have begun applying it to autonomous driving. This is not a simple task, and many challenges still lie ahead. A sub-problem that needs to be solved is a way of automatically determining the road conditions, including the friction. Since many modern cars are equipped with cameras, it is only natural to approach this problem with CNNs, which is what has been done in this thesis. First, a data set is gathered which consists of 37,000 labeled road images that are taken through the front window of a car. Second, CNNs are trained on this data set to classify the friction of a given road. Gathering road images and labeling them with the correct friction is a time-consuming and difficult process, and requires human supervision. For this reason, experiments are made on a second data set, which consists of 54,000 simulated images. These images are captured from the racing game World Rally Championship 7 and are used in addition to the real images, to investigate what can be gained from this. Experiments conducted during this thesis show that CNNs are a good approach for the problem of estimating road friction. The limiting factor, however, is the data set. Not only does the data set need to be much bigger, but it also has to include a much wider variety of driving conditions. Friction is a complex property and depends on many variables, and CNNs are only effective on the type of data that they have been trained on. For these reasons, new data has to be gathered by actively seeking different driving conditions in order for this approach to be deployable in practice.
APA, Harvard, Vancouver, ISO, and other styles
12

Uličný, Matej. "Methods for Increasing Robustness of Deep Convolutional Neural Networks." Thesis, Högskolan i Halmstad, Akademin för informationsteknologi, 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:hh:diva-29734.

Full text
Abstract:
Recent discoveries have uncovered flaws in machine learning algorithms such as deep neural networks. Deep neural networks seem vulnerable to small amounts of non-random noise, created by exploiting the input-to-output mapping of the network. Applying this noise to an input image drastically decreases classification performance. Such an image is referred to as an adversarial example. The purpose of this thesis is to examine how known regularization/robustness methods perform on adversarial examples. The robustness methods dropout, low-pass filtering, denoising autoencoders, adversarial training and committees have been implemented, combined and tested. For the well-known benchmark, the MNIST (Mixed National Institute of Standards and Technology) dataset, the best combination of robustness methods has been found. Based on the results of the experiments, an ensemble of models trained on adversarial examples is considered the best approach for MNIST. The harmfulness of the adversarial noise and some robustness experiments are demonstrated on the CIFAR10 (Canadian Institute for Advanced Research) dataset as well. Apart from robustness tests, the thesis describes experiments with human classification performance on noisy images and a comparison with the performance of a deep neural network.
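Adversarial examples of the kind described are commonly crafted with the fast gradient sign method: one gradient step on the input rather than the weights. A minimal PyTorch sketch (the thesis evaluates such noise; this exact routine is our illustration), assuming inputs scaled to [0, 1]:

```python
import torch

def fgsm_example(model, x, y, loss_fn, eps=0.1):
    x = x.clone().detach().requires_grad_(True)
    loss = loss_fn(model(x), y)
    loss.backward()                      # gradient w.r.t. the input image
    x_adv = x + eps * x.grad.sign()      # small worst-case perturbation
    return x_adv.clamp(0, 1).detach()    # keep a valid image
```

Adversarial training, one of the defences tested above, simply mixes such perturbed inputs into the training batches.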
APA, Harvard, Vancouver, ISO, and other styles
13

Oyallon, Edouard. "Analyzing and introducing structures in deep convolutional neural networks." Thesis, Paris Sciences et Lettres (ComUE), 2017. http://www.theses.fr/2017PSLEE060.

Full text
Abstract:
This thesis studies empirical properties of deep convolutional neural networks, and in particular the Scattering Transform. Indeed, the theoretical analysis of the latter is hard and until now remains a challenge: successive layers of neurons have the ability to produce complex computations, whose nature is still unknown, thanks to learning algorithms whose convergence guarantees are not well understood. However, those neural networks are outstanding tools to tackle a wide variety of difficult tasks, like image classification or, more formally, statistical prediction. The Scattering Transform is a non-linear mathematical operator whose properties are inspired by convolutional networks. In this work, we apply it to natural images and obtain competitive accuracies with unsupervised architectures. Cascading a supervised neural network after the Scattering makes it possible to compete on ImageNet2012, which is the largest dataset of labeled images available to researchers. An efficient GPU implementation is provided. Then, this thesis focuses on the properties of layers of neurons at various depths. We show that a progressive dimensionality reduction occurs, and we study the numerical properties of the supervised classification when we vary the hyperparameters of the network. Finally, we introduce a new class of convolutional networks, whose linear operators are structured by the symmetry groups of the classification task.
APA, Harvard, Vancouver, ISO, and other styles
14

Sångberg, Dennis. "Automated Glioma Segmentation in MRI using Deep Convolutional Networks." Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-171046.

Full text
Abstract:
Manual segmentation of brain tumours is a time-consuming process, results often show high variability, and there is a call for automation in clinical practice. In this thesis the use of deep convolutional networks for automatic glioma segmentation in MRI is investigated. The implemented networks are evaluated on data used in the brain tumor segmentation challenge (BraTS). It is found that 3D convolutional networks generally outperform 2D convolutional networks, and that the best networks can produce segmentations that closely resemble human segmentations. Convolutional networks are also evaluated as feature extractors with linear SVM classifiers on top, and although the sensitivity is improved considerably, the segmentations are heavily oversegmented. The importance of the amount of data available is investigated as well, by comparing results from networks trained on both the 2013 and the greatly extended 2014 data sets, but it is found that the method of producing ground truth was also a contributing factor. The networks do not beat the previous high scores on the BraTS data, but several simple areas of improvement are identified that could take the networks further.
APA, Harvard, Vancouver, ISO, and other styles
15

Mattsson, Niklas. "Classification Performance of Convolutional Neural Networks." Thesis, Uppsala universitet, Avdelningen för systemteknik, 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-305342.

Full text
Abstract:
The purpose of this thesis is to determine the performance of convolutional neural networks in classifications per millisecond, not training or accuracy, for the GTX960 and the TegraX1. This is done through varying parameters of the convolutional neural networks and using the Python framework Theano's function profiler to measure the time taken for different networks. The results show that increasing any parameter of the convolutional neural network also increases the time required for the classification of an image. The parameters do not punish the network equally, however. Convolutional layers and their depth have a far bigger negative impact on the network's performance than fully-connected layers and the amount of neurons in them. Additionally, the time needed for training the networks does not appear to correlate with the time needed for classification.
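Measuring classifications per millisecond, as opposed to accuracy, only requires timing repeated forward passes. A minimal, framework-agnostic sketch (assuming a `predict` callable that classifies a batch):

```python
import time

def classifications_per_ms(predict, batch, n_runs=100):
    predict(batch)                            # warm-up (compilation, caches, GPU init)
    t0 = time.perf_counter()
    for _ in range(n_runs):
        predict(batch)
    elapsed_ms = (time.perf_counter() - t0) * 1000.0
    return n_runs * len(batch) / elapsed_ms   # images classified per millisecond
```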
APA, Harvard, Vancouver, ISO, and other styles
16

Julin, Fredrik. "Vision based facial emotion detection using deep convolutional neural networks." Thesis, Mälardalens högskola, Akademin för innovation, design och teknik, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:mdh:diva-42622.

Full text
Abstract:
Emotion detection, also known as facial expression recognition, is the art of mapping an emotion to some sort of input data taken from a human. This is a powerful tool for extracting valuable information from individuals, which can be used as data for many different purposes, ranging from medical conditions such as depression to customer feedback. To solve the problem of facial expression recognition, smaller subtasks are required, and all of them together form the complete system. Breaking down the bigger task at hand, one can think of these smaller subtasks as a pipeline that implements the necessary steps for classification of some input, to then give an output in the form of an emotion. In recent times, with the rise of computer vision, images are often used as input for these systems and have shown great promise to assist in the task of facial expression recognition, as the human face conveys the subject's emotional state and contains more information than other inputs, such as text or audio. Many of the current state-of-the-art systems utilize computer vision in combination with another rising field, namely AI, or more specifically deep learning. These proposed methods in many cases use a special form of neural network called a convolutional neural network, which specializes in extracting information from images, and then perform classification using the SoftMax function, which acts as the last step before the output in the facial expression pipeline. This thesis work explores these methods of utilizing convolutional neural networks to extract information from images and builds upon them by exploring a set of machine learning algorithms that replace the more commonly used SoftMax function as a classifier, in an attempt to further increase not only the accuracy but also to optimize the use of computational resources. The work also explores different techniques for the face detection subtask in the pipeline by comparing two approaches. One of these approaches is more frequently used in the state of the art and is said to be more viable for possible real-time applications, namely the Viola-Jones algorithm. The other is a deep learning approach using a state-of-the-art convolutional neural network to perform the detection, in many cases speculated to be too computationally intense to run in real time. By applying a newly developed convolutional neural network inspired by the state of the art together with the SoftMax classifier, the final performance did not reach state-of-the-art accuracy. However, the machine learning classifiers used show promise and surpass the SoftMax function in performance in several cases when given a massively smaller number of training samples. Furthermore, the results from implementing and testing a pure deep learning approach, using deep learning algorithms for both the detection and classification stages of the pipeline, show that deep learning might outperform the classic Viola-Jones algorithm in terms of both detection rate and frames per second.
APA, Harvard, Vancouver, ISO, and other styles
17

Jangblad, Markus. "Object Detection in Infrared Images using Deep Convolutional Neural Networks." Thesis, Uppsala universitet, Avdelningen för systemteknik, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-355221.

Full text
Abstract:
In this master thesis about object detection (OD) using deep convolutional neural networks (DCNNs), the area of OD is tested when applied to infrared (IR) images. The goal is to use both long-wave infrared (LWIR) images and short-wave infrared (SWIR) images taken from an airplane in order to train a DCNN to detect runways, Precision Approach Path Indicator (PAPI) lights, and approach lights. The purpose of detecting these objects in IR images is that IR light transmits better than visible light under certain weather conditions, for example fog. This system could then help the pilot detect the runway in bad weather. The RetinaNet model architecture was used and modified in different ways to find the best performing model. The models contain parameters that are found during the training process, but some parameters, called hyperparameters, need to be determined in advance. A way to automatically find good values of these hyperparameters was also tested. In hyperparameter optimization, the Bayesian optimization method proved to create a model with performance equal to the best achieved by the author using manual hyperparameter tuning. The OD system was implemented using Keras with a Tensorflow backend and achieved a high performance (mAP = 0.9245) on the test data. The system manages to detect the wanted objects in the images but is expected to perform worse in a general situation, since the training data and test data are very similar. In order to further develop this system and to improve performance under general conditions, more data is needed from other airfields and under different weather conditions.
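Bayesian hyperparameter optimization of this kind can be run with a Gaussian-process library such as scikit-optimize. The sketch below is a generic example, not the thesis's setup; `train_and_evaluate` is an assumed helper that trains the detector briefly with the given settings and returns its validation mAP:

```python
from skopt import gp_minimize
from skopt.space import Real, Integer

def objective(params):
    lr, batch_size = params
    # assumed helper: trains the model with these settings, returns mAP
    return 1.0 - train_and_evaluate(lr=lr, batch_size=batch_size)

space = [Real(1e-5, 1e-2, prior="log-uniform", name="lr"),
         Integer(2, 16, name="batch_size")]
result = gp_minimize(objective, space, n_calls=25, random_state=0)
print(result.x, 1.0 - result.fun)  # best hyperparameters and their mAP
```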
APA, Harvard, Vancouver, ISO, and other styles
18

Inkiläinen, V. (Valtteri). "Clustering image sets with features from deep convolutional neural networks." Master's thesis, University of Oulu, 2019. http://jultika.oulu.fi/Record/nbnfioulu-201910313044.

Full text
Abstract:
This thesis compares the results of clustering image sets by features extracted using different layers of a convolutional neural network. The image features were extracted with layers of a pre-trained image classification network whose layer weights were trained on the ImageNet dataset. Eight image sets were used to test which extracted features achieve the best clustering accuracies. Features from the test image sets were extracted with the layers of the network architecture, and the features were clustered on a layer-by-layer basis. The clustering accuracies were measured with normalized mutual information (NMI). The results show that the clustering accuracies depend on the characteristics of the image set being clustered. The image sets with more than two image categories had the best NMI scores with the features from the second-to-last layer in the architecture, while for the image sets with two categories different layers gave the best NMI scores. Moreover, the image set with blurred images had its best results come from a few of the first layers, showing that the current practice of selecting the second-to-last layer for feature extraction in pre-trained CNNs is not always optimal.
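The layer-by-layer evaluation reduces to clustering each layer's features and scoring the clusters against the true categories. A minimal scikit-learn sketch (feature extraction itself is assumed already done):

```python
from sklearn.cluster import KMeans
from sklearn.metrics import normalized_mutual_info_score

def layerwise_nmi(features_by_layer, true_labels, k):
    # features_by_layer: dict layer_name -> (n_images, dim) array
    scores = {}
    for name, feats in features_by_layer.items():
        pred = KMeans(n_clusters=k, n_init=10).fit_predict(feats)
        scores[name] = normalized_mutual_info_score(true_labels, pred)
    return scores  # pick the layer with the highest NMI per image set
```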
APA, Harvard, Vancouver, ISO, and other styles
19

Schilling, Fabian. "The Effect of Batch Normalization on Deep Convolutional Neural Networks." Thesis, KTH, Centrum för Autonoma System, CAS, 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-191222.

Full text
Abstract:
Batch normalization is a recently popularized method for accelerating the training of deep feed-forward neural networks. Apart from speed improvements, the technique reportedly enables the use of higher learning rates, less careful parameter initialization, and saturating nonlinearities. The authors note that the precise effect of batch normalization on neural networks remains an area of further study, especially regarding their gradient propagation. Our work compares the convergence behavior of batch normalized networks with ones that lack such normalization. We train both a small multi-layer perceptron and a deep convolutional neural network on four popular image datasets. By systematically altering critical hyperparameters, we isolate the effects of batch normalization both in general and with respect to these hyperparameters. Our experiments show that batch normalization indeed has positive effects on many aspects of neural networks but we cannot confirm significant convergence speed improvements, especially when wall time is taken into account. Overall, batch normalized models achieve higher validation and test accuracies on all datasets, which we attribute to its regularizing effect and more stable gradient propagation. Due to these results, the use of batch normalization is generally advised since it prevents model divergence and may increase convergence speeds through higher learning rates. Regardless of these properties, we still recommend the use of variance-preserving weight initialization, as well as rectifiers over saturating nonlinearities.
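For reference, the training-time forward computation being studied is small enough to state directly. A NumPy sketch of batch normalization over a batch of feature vectors:

```python
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    # x: (batch, features); normalize each feature over the batch,
    # then apply the learned scale (gamma) and shift (beta).
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma * x_hat + beta
```

At test time the batch statistics are replaced by running averages accumulated during training.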
APA, Harvard, Vancouver, ISO, and other styles
20

Losch, Max. "Detection and Segmentation of Brain Metastases with Deep Convolutional Networks." Thesis, KTH, Datorseende och robotik, CVAP, 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-173519.

Full text
Abstract:
As deep convolutional networks (ConvNets) reach spectacular results on a multitude of computer vision tasks and perform almost as well as a human rater on the task of segmenting gliomas in the brain, I investigated their applicability to detecting and segmenting brain metastases. I trained networks of increasing depth to improve the detection rate and introduced a border-pair scheme to reduce oversegmentation. A constraint on the time for segmenting a complete brain scan required the utilization of fully convolutional networks, which reduced the time from 90 minutes to 40 seconds. Despite some noise and label errors present in the 490 full brain MRI scans, the final network achieves a true positive rate of 82.8% and 0.05 misclassifications per slice, where all lesions greater than 3 mm have a perfect detection score. This work indicates that ConvNets are a suitable approach to both detect and segment metastases, especially as further architectural extensions might improve the predictive performance even more.
APA, Harvard, Vancouver, ISO, and other styles
21

Zhewei, Wang. "Fully Convolutional Networks (FCNs) for Medical Image Segmentation." Ohio University / OhioLINK, 2020. http://rave.ohiolink.edu/etdc/view?acc_num=ohiou1605199701509179.

Full text
APA, Harvard, Vancouver, ISO, and other styles
22

Andriolo, Stefano. "Convolutional Neural Networks in Tomographic Image Enhancement." Bachelor's thesis, Alma Mater Studiorum - Università di Bologna, 2021. http://amslaurea.unibo.it/22843/.

Full text
Abstract:
Convolutional Neural Networks have seen a huge rise in popularity in image applications. They have been used in medical imaging contexts to enhance the overall quality of the digital representation of the patient's scanned body region and have been very useful when dealing with limited-angle tomographic data. In this thesis, a particular type of convolutional neural network called Unet will be used as the starting point to explore the effectiveness of different networks in enhancing tomographic image reconstructions. We will first make minor tweaks to the 2-dimensional convolutional network and train it on two different datasets. After that, we will take advantage of the shape of the reconstructions we are considering to extend the convolutions to the third dimension. The scanner layout that has been considered for projecting and reconstructing volumes in this thesis indeed consists of a cone-beam geometry, whose output is a volume that approximates the original scanned object. We will then discuss the results in order to try to understand if the proposed solutions could be viable approaches for enhancing tomographic images.
APA, Harvard, Vancouver, ISO, and other styles
23

Emmot, Sebastian. "Characterizing Video Compression Using Convolutional Neural Networks." Thesis, Luleå tekniska universitet, Datavetenskap, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:ltu:diva-79430.

Full text
Abstract:
Can compression parameters used in video encoding be estimated given only the visual information of the resulting compressed video? If so, these parameters could potentially improve existing parametric video quality estimation models. Today, parametric models use information like bitrate to estimate the quality of a given video. This method is inaccurate since it does not consider the coding complexity of a video. The constant rate factor (CRF) parameter for H.264 encoding aims to keep the quality constant while varying the bitrate; if the CRF for a video is known together with the bitrate, a better quality estimate could potentially be achieved. In recent years, artificial neural networks and specifically convolutional neural networks have shown great promise in the field of image processing. In this thesis, convolutional neural networks are investigated as a way of estimating the constant rate factor parameter for a degraded video by identifying the compression artifacts and their relation to the CRF used. With the use of ResNet, a model for estimating the CRF for each frame of a video can be derived; these per-frame predictions are further used in a video classification model which performs a total CRF prediction for a given video. The results show that it is possible to find a relation between the visual encoding artifacts and the CRF used. The top-5 accuracy achieved for the model is 61.9% with the use of limited training data. Given that today's parametric bitrate-based quality models have no information about coding complexity, even a rough estimate of the CRF could improve their precision.
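The two-stage design (per-frame CRF estimates, then one prediction per video) can be illustrated with a trivially simple aggregator; the thesis trains a classification model for this step, so averaging is only a stand-in baseline:

```python
import numpy as np

def video_crf_estimate(frame_probs):
    # frame_probs: (n_frames, n_crf_classes) softmax outputs from the
    # per-frame ResNet; average over frames, then take the arg max.
    return int(np.argmax(frame_probs.mean(axis=0)))
```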
APA, Harvard, Vancouver, ISO, and other styles
24

Reiche, Myrgård Martin. "Acceleration of deep convolutional neural networks on multiprocessor system-on-chip." Thesis, Uppsala universitet, Avdelningen för datorteknik, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-385904.

Full text
Abstract:
In this master thesis some of the most promising existing frameworks and implementations of deep convolutional neural networks on multiprocessor system-on-chips (MPSoCs) are researched and evaluated. The thesis' starting point was a previous thesis, conducted in the spring of 2018, which evaluated possible deep learning models and frameworks for object detection on infrared images. In order to fit an existing deep convolutional neural network (DCNN) on an MPSoC it needs modifications. Most DCNNs are trained on graphics processing units (GPUs) with a bit width of 32 bits. This is not optimal for a platform with hard memory constraints such as the MPSoC, which means the bit width needs to be reduced. The optimal bit width depends on the network structure and the requirements in terms of throughput and accuracy, although the accuracy of most currently available object detection networks drops significantly when reduced below a width of 6 bits. After reducing the bit width, the network needs to be quantized and pruned for better memory usage. After quantization it can be implemented using one of many existing frameworks. This thesis focuses on Xilinx CHaiDNN and DNNWeaver V2, though it touches a little on reVISION, HLS4ML and DNNWeaver V1 as well. In conclusion, the implementation of two network models on a Xilinx Zynq UltraScale+ ZCU102 using CHaiDNN was evaluated. Conversion of existing networks was done and quantization was tested, though not fully working. The result was a two to six times more power-efficient implementation in comparison to GPU inference.
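Reducing bit width as described is, at its simplest, uniform post-training quantization of the weights. A NumPy sketch (an illustration of the general technique, not CHaiDNN's actual scheme):

```python
import numpy as np

def quantize_weights(w, bits=6):
    # Symmetric uniform quantization: map weights onto 2^(bits-1)-1
    # signed levels, then scale back to floats to simulate the loss.
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w).max() / qmax
    return np.round(w / scale).clip(-qmax, qmax) * scale
```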
APA, Harvard, Vancouver, ISO, and other styles
25

Wieslander, Håkan, and Gustav Forslid. "Deep Convolutional Neural Networks For Detecting Cellular Changes Due To Malignancy." Thesis, Uppsala universitet, Avdelningen för visuell information och interaktion, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-326160.

Full text
Abstract:
Discovering cancer at an early stage is an effective way to increase the chance of survival. However, since most screening processes are done manually, they are time-inefficient and thus costly. One way of automating the screening process could be to classify cells using Convolutional Neural Networks, which have been proven to produce high accuracy for image classification tasks. This thesis investigates whether Convolutional Neural Networks can be used as a tool to detect cellular changes due to malignancy in the oral cavity and uterine cervix. Two datasets containing oral cells and two datasets containing cervical cells were used. The cells were divided into normal and abnormal cells for a binary classification. The performance was evaluated for two different network architectures, ResNet and VGG. For the oral datasets the accuracy varied between 78-82% correctly classified cells depending on the dataset and network. For the cervical datasets the accuracy varied between 84-86% correctly classified cells depending on the dataset and network. These results indicate a high potential for classifying abnormalities in oral and cervical cells. ResNet was shown to be the preferable network, with a higher accuracy and a smaller standard deviation.
APA, Harvard, Vancouver, ISO, and other styles
26

Leuchowius, Karl-Johan. "Classification of High Content Screening Data by Deep Convolutional Neural Networks." Thesis, Uppsala universitet, Institutionen för informationsteknologi, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-334362.

Full text
Abstract:
In drug discovery, high content screening (HCS) is an imaging-based method for cell-based screening of large libraries of drug compounds. HCS generates enormous amounts of images that need to be analysed and quantified by automated image analysis. This analysis is typically performed by a variety of algorithms segmenting cells and sub-cellular compartments and quantifying properties such as fluorescence intensities, morphological features, and textural characteristics. These quantified data can then be used to train a classifier to classify the imaged cells according to the phenotypic effects of the compounds. Recent developments in machine learning have enabled a new kind of image analysis in which classifiers based on convolutional neural networks can be trained on the image data directly, bypassing the image quantification step. This has been shown to produce highly accurate predictions and simplify the analysis process. In this study, convolutional neural networks (CNNs) were used to classify HCS images of cells treated with a set of different drug compounds. A set of network architectures and hyper-parameters was explored in order to optimise the classification performance. The results were compared with the accuracies achieved with a classical image analysis pipeline in combination with a classifier. With this data set, the best CNN-based classifier achieved an accuracy of 91.3%, whereas classical image analysis combined with a random forest classifier achieved a classification accuracy of 78.8%. In addition to the large increase in classification accuracy, CNNs have benefits such as being less biased when it comes to image quantification algorithm selection, and requiring less hands-on time during optimisation.
APA, Harvard, Vancouver, ISO, and other styles
27

Gnacek, Matthew. "Convolutional Neural Networks for Enhanced Compression Techniques." University of Dayton / OhioLINK, 2021. http://rave.ohiolink.edu/etdc/view?acc_num=dayton1620139118743853.

Full text
APA, Harvard, Vancouver, ISO, and other styles
28

Houmadi, Sherri F. "THE APPLICATION OF CONVOLUTIONAL NEURAL NETWORKS TO CLASSIFY PAINT DEFECTS." OpenSIUC, 2020. https://opensiuc.lib.siu.edu/dissertations/1807.

Full text
Abstract:
AN ABSTRACT OF THE DISSERTATION OF Sherri Houmadi, for the Doctor of Philosophy degree in Engineering Science, presented on March 27, 2020, at Southern Illinois University Carbondale. TITLE: THE APPLICATION OF CONVOLUTIONAL NEURAL NETWORKS TO CLASSIFY PAINT DEFECTS. MAJOR PROFESSOR: Dr. Julie Dunston. Despite all of the technological advancements in computer vision, many companies still utilize human visual inspection to determine whether parts are good or bad. It is particularly challenging for humans to inspect parts in a fast-moving manufacturing environment. Such is the case at Aisin Manufacturing Illinois, where this study tests the use of convolutional neural networks (CNNs) to classify paint defects on painted outside door handles and caps for automobiles. Widespread implementation of vision systems has resulted in advancements in machine learning. As the field of artificial intelligence (AI) evolves and improvements are made, diverse industries are adopting AI models for use in their applications. Medical imaging classification using neural networks has exploded in recent years. Convolutional neural networks have proven to scale very well for image classification models by extracting various features from the images. A goal of this study is to create a low-cost machine learning model that is able to quickly classify paint defects in order to identify rework parts that can be repaired and shipped. The central thesis of this doctoral work is to test a machine learning model that can classify the paint defects based on a very small dataset of images, where the images are taken with a smartphone camera in a manufacturing setting. The end goal is to train the model to an overall accuracy rate of at least 80%. By using transfer learning and balancing the class datasets, the model was trained to achieve an overall accuracy rate of 82%.
APA, Harvard, Vancouver, ISO, and other styles
29

Neri, Mattia. "Segmentazione di immagini mammografiche con convolutional neural networks." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2014. http://amslaurea.unibo.it/6681/.

Full text
Abstract:
Breast cancer ranks first in mortality among the tumour pathologies affecting the female population worldwide. Several clinical studies have shown that diagnosis by the radiologist can be aided and improved by Computer Aided Detection (CAD) systems. Because of the great variability in shape and size of tumour masses, and their similarity to the tissues that host them, their automated detection is an extremely complicated problem. A CAD system is generally composed of two classification stages: detection, responsible for locating the suspicious regions of interest (ROI) on the mammogram and thus for the preliminary elimination of non-risk areas; and the actual classification of the ROIs into masses and healthy tissue. The main aim of this thesis is the study of new detection methodologies that can improve the performance obtained with traditional techniques. Detection is treated as a supervised learning problem and addressed with Convolutional Neural Networks (CNNs), an algorithm belonging to deep learning, a new branch of machine learning. CNNs are inspired by Hubel and Wiesel's discoveries concerning two basic types of cells identified in the visual cortex of cats: simple cells (S), which respond to edge-like stimuli, and complex cells (C), which are locally invariant to the exact position of the stimulus. By analogy with the visual cortex, CNNs use a deep architecture characterized by layers that alternately perform convolution and subsampling operations on the images. CNNs, which take a two-dimensional input, are usually applied to classification and automatic recognition of images such as objects, faces and logos, or to document analysis.
APA, Harvard, Vancouver, ISO, and other styles
30

Battilana, Pietro. "Convolutional Neural Networks for Image Style Transfer." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2018. http://amslaurea.unibo.it/16770/.

Full text
Abstract:
In this thesis we will use deep learning tools to tackle an interesting and complex problem of image processing called style transfer. Given a content image and a style image as inputs, the aim is to create a new image preserving the global structure of the content image but showing the artistic patterns of the style image. Before the renaissance of Artificial Neural Networks, early work in the field, called texture synthesis, only transferred limited and repetitive geometric patterns of textures. Due to the availability of large amounts of data and cheap computational resources in the last decade, Convolutional Neural Networks and Graphics Processing Units have been at the core of a paradigm shift in computer vision research. In the seminal work of Neural Style Transfer, Gatys et al. consistently disentangled style and content from different images to combine them in artistic compositions of high perceptual quality. This was done using the image representation derived from Convolutional Neural Networks trained for large-scale object recognition, which makes high-level image information explicit. In this thesis, inspired by the work of Li et al., we build an efficient neural style transfer method able to transfer arbitrary styles. Existing optimisation-based methods (Gatys et al.) produce visually pleasing results but are limited by the time-consuming optimisation procedure. More recent feed-forward based methods, while enjoying inference efficiency, are mainly limited by their inability to generalize to unseen styles. The key ingredients of our approach are a Convolutional Autoencoder and a pair of feature transforms, Whitening and Coloring, reflecting a direct matching of the feature covariance of the content image to that of the given style image. The algorithm allows us to produce images of high perceptual quality that combine the content of an arbitrary photograph with the appearance of arbitrary well-known artworks.
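As a rough illustration of the whitening and coloring feature transforms mentioned above, the sketch below applies them to feature matrices with NumPy. The (C, H*W) feature layout and the epsilon regularizer are assumptions for the sketch, and the encoder/decoder surrounding the transform is omitted.

```python
# Sketch of a whitening-and-coloring transform (WCT) on encoder features,
# assuming f_content and f_style are numpy arrays of shape (C, H*W).
import numpy as np

def whitening_coloring(f_content, f_style, eps=1e-5):
    # Center both feature sets
    fc = f_content - f_content.mean(axis=1, keepdims=True)
    fs = f_style - f_style.mean(axis=1, keepdims=True)

    # Whitening: remove the content feature correlations
    cov_c = fc @ fc.T / (fc.shape[1] - 1) + eps * np.eye(fc.shape[0])
    wc, vc = np.linalg.eigh(cov_c)
    whiten = vc @ np.diag(wc ** -0.5) @ vc.T
    fc_white = whiten @ fc

    # Coloring: impose the style feature covariance
    cov_s = fs @ fs.T / (fs.shape[1] - 1) + eps * np.eye(fs.shape[0])
    ws, vs = np.linalg.eigh(cov_s)
    color = vs @ np.diag(ws ** 0.5) @ vs.T
    fcs = color @ fc_white

    # Re-add the style mean before decoding back to an image
    return fcs + f_style.mean(axis=1, keepdims=True)
```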
APA, Harvard, Vancouver, ISO, and other styles
31

Gerard, Alex Michael. "Iterative cerebellar segmentation using convolutional neural networks." Thesis, University of Iowa, 2018. https://ir.uiowa.edu/etd/6579.

Full text
Abstract:
Convolutional neural networks (ConvNets) have quickly become the most widely used tool for image perception and interpretation tasks over the past several years. The single most important resource needed for training a ConvNet that will successfully generalize to unseen examples is an adequately sized labeled dataset. In many interesting medical imaging cases, the necessary size or quality of training data is not suitable for directly training a ConvNet. Furthermore, access to the expertise to manually label such datasets is often infeasible. To address these barriers, we investigate a method for iterative refinement of the ConvNet training. Initially, unlabeled images are attained, minimal labeling is performed, and a model is trained on the sparse manual labels. At the end of each training iteration, full images are predicted, and additional manual labels are identified to improve the training dataset. In this work, we show how to utilize patch-based ConvNets to iteratively build a training dataset for automatically segmenting MRI images of the human cerebellum. We construct this training dataset using a small collection of high-resolution 3D images and transfer the resulting model to a much larger, much lower resolution, collection of images. Both T1-weighted and T2-weighted MRI modalities are utilized to capture the additional features that arise from the differences in contrast between modalities. The objective is to perform tissue-level segmentation, classifying each volumetric pixel (voxel) in an image as white matter, gray matter, or cerebrospinal fluid (CSF). We will present performance results on the lower resolution dataset, and report achieving a 12.7% improvement in accuracy over the existing segmentation method, expectation maximization. Further, we will present example segmentations from our iterative approach that demonstrate its ability to detect white matter branching near the outer regions of the anatomy, which agrees with the known biological structure of the cerebellum and has typically eluded traditional segmentation algorithms.
APA, Harvard, Vancouver, ISO, and other styles
32

Bosello, Michael. "Integrating BDI and Reinforcement Learning: the Case Study of Autonomous Driving." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2020. http://amslaurea.unibo.it/21467/.

Full text
Abstract:
Recent breakthroughs in machine learning are paving the way to the vision of a software 2.0 era, which foresees the replacement of traditional software development with such techniques for many applications. In the context of agent-oriented programming, we believe that mixing together cognitive architectures like the BDI one and learning techniques could trigger interesting new scenarios. In that view, our previous work presents Jason-RL, a framework that integrates BDI agents and Reinforcement Learning (RL) more deeply than what has been proposed so far in the literature. The framework allows the development of BDI agents having both explicitly programmed plans and plans learned by the agent using RL. The two kinds of plans are seamlessly integrated and can be used without differences. Here, we take autonomous driving as a case study to verify the advantages of the proposed approach and framework. The BDI agent has hard-coded plans that define high-level directions, while fine-grained navigation is learned by trial and error. This approach – compared to plain RL – is encouraging, as RL struggles with temporally extended planning. We defined and trained an agent able to drive on a track with an intersection, at which it has to choose the correct path to reach the assigned target. A first step towards porting the system to the real world has been taken by building a 1/10-scale racecar prototype, which learned how to drive on a simple track.
APA, Harvard, Vancouver, ISO, and other styles
33

Schennings, Jacob. "Deep Convolutional Neural Networks for Real-Time Single Frame Monocular Depth Estimation." Thesis, Uppsala universitet, Avdelningen för systemteknik, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-336923.

Full text
Abstract:
Vision-based active safety systems, which estimate the depth of objects ahead, have become increasingly common in modern vehicles for autonomous driving (AD) and advanced driver-assistance systems (ADAS). In this thesis a lightweight deep convolutional neural network performing real-time depth estimation on single monocular images is implemented and evaluated. Many of the vision-based automatic brake systems in modern vehicles only detect pre-trained object types such as pedestrians and vehicles. These systems fail to detect general objects such as road debris and roadside obstacles. In stereo vision systems the problem is resolved by calculating a disparity image from the stereo image pair to extract depth information. The distance to an object can also be determined using radar and LiDAR systems. By using this depth information the system performs necessary actions to avoid collisions with objects that are determined to be too close. However, these systems are also more expensive than a regular mono camera system and are therefore not very common in the average consumer car. By implementing robust depth estimation in mono vision systems, the benefits of active safety systems could be extended to a larger segment of the vehicle fleet. This could drastically reduce human-error-related traffic accidents and possibly save many lives. The network architecture evaluated in this thesis is more lightweight than other CNN architectures previously used for monocular depth estimation. The proposed architecture is therefore preferable for use on computationally lightweight systems. The network solves a supervised regression problem during the training procedure in order to produce a pixel-wise depth estimation map. The network was trained using sparse ground-truth images with spatially incoherent and discontinuous data, and outputs a dense, spatially coherent and continuous depth map prediction. The spatially incoherent ground truth posed a problem of discontinuity that was addressed by a masked loss function with regularization. The network was able to predict a dense depth estimation on the KITTI dataset with close to state-of-the-art performance.
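A minimal sketch of a masked regression loss of the kind described, assuming missing depth values are encoded as zeros in the ground-truth map; the thesis's exact loss and regularization terms are not reproduced here.

```python
# Masked L1 loss sketch: pixels without ground-truth depth are excluded
# from the average, so the sparse, discontinuous labels do not penalize
# the dense prediction in unlabeled regions.
import torch

def masked_l1_loss(pred, target):
    # pred, target: tensors of shape (B, 1, H, W); zero marks missing depth
    mask = (target > 0).float()
    num_valid = mask.sum().clamp(min=1.0)   # avoid division by zero
    return (torch.abs(pred - target) * mask).sum() / num_valid
```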
APA, Harvard, Vancouver, ISO, and other styles
34

Buratti, Luca. "Visualisation of Convolutional Neural Networks." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2018.

Find full text
Abstract:
Neural Networks, and Convolutional Neural Networks in particular, have recently shown extraordinary results in various fields. Unfortunately, however, there is still no clear understanding of why these architectures work so well, and above all it is difficult to explain their behaviour in cases of failure. This lack of clarity is what separates these models from being applied in concrete, critical real-life scenarios, such as healthcare or self-driving cars. For this reason, several studies have been carried out in recent years to create methods capable of explaining what is happening inside a neural network, or where the network is looking in order to make a given prediction. These techniques are the focus of this thesis and the bridge between the two case studies presented below. The aim of this work is therefore twofold: first, to use these methods to analyse, and thus understand how to improve, applications based on convolutional neural networks; and second, to investigate the generalization ability of these architectures, again by means of these methods.
APA, Harvard, Vancouver, ISO, and other styles
35

Song, Weilian. "Image-Based Roadway Assessment Using Convolutional Neural Networks." UKnowledge, 2019. https://uknowledge.uky.edu/cs_etds/78.

Full text
Abstract:
Road crashes are one of the main causes of death in the United States. To reduce the number of accidents, roadway assessment programs take a proactive approach, collecting data and identifying high-risk roads before crashes occur. However, the cost of data acquisition and manual annotation has restricted the effect of these programs. In this thesis, we propose methods to automate the task of roadway safety assessment using deep learning. Specifically, we trained convolutional neural networks on publicly available roadway images to predict safety-related metrics: the star rating score and free-flow speed. Inference speeds for our methods are mere milliseconds, enabling large-scale roadway study at a fraction of the cost of manual approaches.
APA, Harvard, Vancouver, ISO, and other styles
36

Fong, Vivian Lin. "Software Requirements Classification Using Word Embeddings and Convolutional Neural Networks." DigitalCommons@CalPoly, 2018. https://digitalcommons.calpoly.edu/theses/1851.

Full text
Abstract:
Software requirements classification, the practice of categorizing requirements by their type or purpose, can improve organization and transparency in the requirements engineering process and thus promote requirement fulfillment and software project completion. Requirements classification automation is a prominent area of research as automation can alleviate the tediousness of manual labeling and loosen its necessity for domain-expertise. This thesis explores the application of deep learning techniques on software requirements classification, specifically the use of word embeddings for document representation when training a convolutional neural network (CNN). As past research endeavors mainly utilize information retrieval and traditional machine learning techniques, we entertain the potential of deep learning on this particular task. With the support of learning libraries such as TensorFlow and Scikit-Learn and word embedding models such as word2vec and fastText, we build a Python system that trains and validates configurations of Naïve Bayes and CNN requirements classifiers. Applying our system to a suite of experiments on two well-studied requirements datasets, we recreate or establish the Naïve Bayes baselines and evaluate the impact of CNNs equipped with word embeddings trained from scratch versus word embeddings pre-trained on Big Data.
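The following sketch shows the general shape of such a word-embedding-plus-CNN text classifier in Keras; the vocabulary size, sequence length, embedding dimension and class count are placeholders, and pre-trained word2vec/fastText vectors could be loaded into the embedding layer instead of training it from scratch.

```python
# Sketch of a Kim-style text CNN for requirement classification.
import tensorflow as tf
from tensorflow.keras import layers

vocab_size, seq_len, embed_dim, num_classes = 20000, 100, 300, 12  # assumptions

model = tf.keras.Sequential([
    layers.Input(shape=(seq_len,)),
    layers.Embedding(vocab_size, embed_dim),  # optionally init from word2vec/fastText
    layers.Conv1D(128, 5, activation="relu"), # n-gram-like filters over embeddings
    layers.GlobalMaxPooling1D(),
    layers.Dropout(0.5),
    layers.Dense(num_classes, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```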
APA, Harvard, Vancouver, ISO, and other styles
37

Bearzotti, Riccardo. "Structural damage detection using deep learning networks." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2018.

Find full text
Abstract:
Research on damage detection of structures using image processing techniques has been actively conducted, especially on infrastructures such as road pavements, achieving considerably high detection accuracies. These techniques are increasingly studied all over the world, because they seem to be a powerful method able to replace, under some conditions, the experience and visual ability of humans. The purpose of this thesis is to show how the developments of the last few years in image processing can help avoid some structure-monitoring costs and predict disasters, the ones often called announced disasters that could have been avoided. This thesis introduces the deep learning method, implemented in Matlab, to solve such problems, trying to understand, in the first part, what machine learning and deep learning consist of, the best way to use convolutional neural networks, and which parameters to work on. The purpose is to give some background about this technique so that it can be applied to a large number of problems. Some examples of basic code are also given and their outcomes discussed, in order to figure out which tool, or combination of tools, best solves a problem of greater complexity. At the end, there are some considerations about useful future work that could aid structural monitoring in lab tests, during the life cycle, and in case of collapse.
APA, Harvard, Vancouver, ISO, and other styles
38

Pons, Puig Jordi. "Deep neural networks for music and audio tagging." Doctoral thesis, Universitat Pompeu Fabra, 2019. http://hdl.handle.net/10803/668036.

Full text
Abstract:
Automatic music and audio tagging can help increase the retrieval and re-use possibilities of many audio databases that remain poorly labeled. In this dissertation, we tackle the task of music and audio tagging from the deep learning perspective and, within that context, we address the following research questions: (i) Which deep learning architectures are most appropriate for (music) audio signals? (ii) In which scenarios is waveform-based end-to-end learning feasible? (iii) How much data is required for carrying out competitive deep learning research? In pursuit of answering research question (i), we propose to use musically motivated convolutional neural networks as an alternative to designing deep learning models that is based on domain knowledge, and we evaluate several deep learning architectures for audio at a low computational cost with a novel methodology based on non-trained (randomly weighted) convolutional neural networks. Throughout our work, we find that employing music and audio domain knowledge during the model’s design can help improve the efficiency, interpretability, and performance of spectrogram-based deep learning models. For research questions (ii) and (iii), we perform a study with the SampleCNN, a recently proposed end-to-end learning model, to assess its viability for music audio tagging when variable amounts of training data —ranging from 25k to 1.2M songs— are available. We compare the SampleCNN against a spectrogram-based architecture that is musically motivated and conclude that, given enough data, end-to-end learning models can achieve better results. Finally, throughout our quest for answering research question (iii), we also investigate whether a naive regularization of the solution space, prototypical networks, transfer learning, or their combination, can foster deep learning models to better leverage a small number of training examples. Results indicate that transfer learning and prototypical networks are powerful strategies in such low-data regimes.
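The "non-trained (randomly weighted) CNN" methodology mentioned above can be sketched as follows: features are taken from a randomly initialized encoder that is never trained, and only a cheap linear classifier is fit on top. The architecture and the spectrogram-like input shape here are assumptions for illustration.

```python
# Sketch: score a CNN architecture at low cost by using its random-init
# features with a linear probe, instead of training the network end to end.
import tensorflow as tf
from tensorflow.keras import layers
from sklearn.linear_model import LogisticRegression

encoder = tf.keras.Sequential([
    layers.Input(shape=(96, 128, 1)),        # e.g. a log-mel spectrogram patch
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.GlobalAveragePooling2D(),         # fixed random features come out here
])

def random_feature_score(x_train, y_train, x_test, y_test):
    f_train = encoder.predict(x_train)       # encoder weights stay at random init
    f_test = encoder.predict(x_test)
    clf = LogisticRegression(max_iter=1000).fit(f_train, y_train)
    return clf.score(f_test, y_test)
```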
APA, Harvard, Vancouver, ISO, and other styles
39

de, Giorgio Andrea. "A study on the similarities of Deep Belief Networks and Stacked Autoencoders." Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-174341.

Full text
Abstract:
Restricted Boltzmann Machines (RBMs) and autoencoders have been used - in several variants - for similar tasks, such as reducing dimensionality or extracting features from signals. Even though their structures are quite similar, they rely on different training theories. Lately, they have been widely used as building blocks in deep learning architectures that are called deep belief networks (instead of stacked RBMs) and stacked autoencoders. In light of this, this thesis aims to understand the extent of the similarities and the overall pros and cons of using either RBMs, autoencoders or denoising autoencoders in deep networks. Important characteristics are tested, such as robustness to noise, the influence of data availability on training, and the tendency to overtrain. Part of the thesis is then dedicated to studying how the three deep networks under examination form their deep internal representations, and how similar these can be to each other. As a result, a novel approach for the evaluation of internal representations, named F-Mapping, is presented. Results are reported and discussed.
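For readers unfamiliar with the RBM training theory referred to above, a compact sketch of one contrastive-divergence (CD-1) update for a binary RBM follows; array shapes and the learning rate are illustrative, not taken from the thesis.

```python
# One CD-1 step for a binary RBM: sample hidden units from the data,
# reconstruct the visibles once, and update weights from the difference
# of data-driven and reconstruction-driven correlations.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(v0, W, b_vis, b_hid, lr=0.01):
    # v0: (batch, n_vis), W: (n_vis, n_hid), b_vis: (n_vis,), b_hid: (n_hid,)
    p_h0 = sigmoid(v0 @ W + b_hid)                       # positive phase
    h0 = (rng.random(p_h0.shape) < p_h0).astype(float)   # sample hidden units
    p_v1 = sigmoid(h0 @ W.T + b_vis)                     # one Gibbs step back
    p_h1 = sigmoid(p_v1 @ W + b_hid)                     # negative phase
    batch = v0.shape[0]
    W += lr * (v0.T @ p_h0 - p_v1.T @ p_h1) / batch
    b_vis += lr * (v0 - p_v1).mean(axis=0)
    b_hid += lr * (p_h0 - p_h1).mean(axis=0)
    return W, b_vis, b_hid
```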
APA, Harvard, Vancouver, ISO, and other styles
40

Johansson, Philip. "Incremental Learning of Deep Convolutional Neural Networks for Tumour Classification in Pathology Images." Thesis, Linköpings universitet, Institutionen för medicinsk teknik, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-158225.

Full text
Abstract:
Understaffing of medical doctors is becoming a pressing problem in many healthcare systems. This problem can be alleviated by utilising Computer-Aided Diagnosis (CAD) systems to substitute for doctors in different tasks, for instance histopathological image classification. The recent surge of deep learning has allowed CAD systems to perform this task at a very competitive performance. However, a major challenge with this task is the need to periodically update the models with new data and/or new classes or diseases. These periodic updates will result in catastrophic forgetting, as Convolutional Neural Networks typically require the entire data set beforehand and tend to lose knowledge about old data when trained on new data. Incremental learning methods were proposed to alleviate this problem with deep learning. In this thesis, two incremental learning methods, Learning without Forgetting (LwF) and a generative rehearsal-based method, are investigated. They are evaluated on two criteria: the first is the capability of incrementally adding new classes to a pre-trained model, and the second is the ability to update the current model with a new, unbalanced data set. Experiments show that LwF does not retain knowledge properly in the two cases. Further experiments are needed to draw any definite conclusions, for instance using another training approach for the classes and trying different combinations of losses. On the other hand, the generative rehearsal-based method tends to work for one class, showing good potential to work if better-quality images were generated. Additional experiments are also required in order to investigate new architectures and approaches for more stable training.
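A hedged sketch of the Learning-without-Forgetting objective investigated here: a distillation term keeps the updated network's responses close to the frozen old model's predictions while an ordinary cross-entropy term learns the new classes. The temperature and weighting are typical choices, not necessarily those used in the thesis.

```python
# LwF-style loss: distillation on old-class outputs + cross-entropy on new labels.
import torch
import torch.nn.functional as F

def lwf_loss(new_logits, old_logits, labels, T=2.0, alpha=0.5):
    # old_logits: outputs of the pre-update model on the same batch (no grad)
    ce = F.cross_entropy(new_logits, labels)
    distill = F.kl_div(
        F.log_softmax(new_logits[:, :old_logits.size(1)] / T, dim=1),
        F.softmax(old_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    return alpha * distill + (1 - alpha) * ce
```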
APA, Harvard, Vancouver, ISO, and other styles
41

Viswavarapu, Lokesh Kumar. "Real-Time Finger Spelling American Sign Language Recognition Using Deep Convolutional Neural Networks." Thesis, University of North Texas, 2018. https://digital.library.unt.edu/ark:/67531/metadc1404616/.

Full text
Abstract:
This thesis presents the design and development of a gesture recognition system to recognize finger spelling American Sign Language hand gestures. We developed this solution using the latest deep learning technique called convolutional neural networks. This system uses blink detection to initiate the recognition process, Convex Hull-based hand segmentation with adaptive skin color filtering to segment the hand region, and a convolutional neural network to perform gesture recognition. An ensemble of four convolutional neural networks is trained with a dataset of 25254 images for gesture recognition, and a feedback unit called head pose estimation is implemented to validate the correctness of predicted gestures. This entire system was developed using the Python programming language and other supporting libraries like OpenCV, TensorFlow and Dlib to perform various image processing and machine learning tasks. The entire application can be deployed as a web application using Flask to make it operating-system independent.
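The hand-segmentation step described above might look roughly like the following OpenCV sketch; the YCrCb skin-color bounds here are common heuristics rather than the adaptive thresholds used in the system.

```python
# Skin-color filtering in YCrCb space plus a convex hull around the largest
# contour, as a stand-in for the thesis's adaptive hand segmentation.
import cv2
import numpy as np

def segment_hand(bgr_frame):
    ycrcb = cv2.cvtColor(bgr_frame, cv2.COLOR_BGR2YCrCb)
    lower = np.array([0, 135, 85], dtype=np.uint8)    # heuristic skin bounds
    upper = np.array([255, 180, 135], dtype=np.uint8)
    mask = cv2.inRange(ycrcb, lower, upper)
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, np.ones((5, 5), np.uint8))
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None, None
    hand = max(contours, key=cv2.contourArea)  # assume the hand is the largest blob
    hull = cv2.convexHull(hand)
    return hand, hull
```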
APA, Harvard, Vancouver, ISO, and other styles
42

Vainigli, Lorenzo. "Registrazioni vocali per la diagnosi di COVID-19 con Deep Convolutional Neural Networks." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2021. http://amslaurea.unibo.it/22896/.

Full text
Abstract:
The COVID-19 pandemic has led the scientific world to search for the best methods to counter the rapid spread of the disease. In the medical field, techniques have long been in use to diagnose pathologies by examining sounds emitted by the body, such as the voice and breathing; others are based on image recognition. In this study, Deep Convolutional Neural Networks were used to recognize patients affected by COVID-19 from spectrograms generated from recordings of coughs and breaths, collected via crowdsourcing through mobile and web applications. The results are promising and match the state of the art, confirming that deep learning technologies for image classification are an excellent support tool for the diagnosis of COVID-19.
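A minimal sketch of the spectrogram-generation step such a pipeline needs, using librosa with typical parameters (not necessarily those of the study):

```python
# Turn a cough/breath recording into a log-mel spectrogram "image"
# suitable as input to a convolutional classifier.
import librosa
import numpy as np

def audio_to_logmel(path, sr=22050, n_mels=128):
    y, sr = librosa.load(path, sr=sr)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    return librosa.power_to_db(mel, ref=np.max)   # (n_mels, frames) array
```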
APA, Harvard, Vancouver, ISO, and other styles
43

Lenc, Karel. "Representation of spatial transformations in deep neural networks." Thesis, University of Oxford, 2017. http://ora.ox.ac.uk/objects/uuid:87a16dc2-9d77-49c3-8096-cf3416fa6893.

Full text
Abstract:
This thesis addresses the problem of investigating the properties and abilities of a variety of computer vision representations with respect to spatial geometric transformations. Our approach is to employ machine learning methods for finding the behaviour of existing image representations empirically and to apply deep learning to new computer vision tasks where the underlying spatial information is of importance. The results help to further the understanding of modern computer vision representations, such as convolutional neural networks (CNNs) in image classification and object detection and to enable their application to new domains such as local feature detection. Because our theoretical understanding of CNNs remains limited, we investigate two key mathematical properties of representations: equivariance (how transformations of the input image are encoded) and equivalence (how two representations, for example two different parameterizations, layers or architectures share the same visual information). A number of methods to establish these properties empirically are proposed. These methods reveal interesting aspects of their structure, including clarifying at which layers in a CNN geometric invariances are achieved and how various CNN architectures differ. We identify several predictors of geometric and architectural compatibility. Direct applications to structured-output regression are demonstrated as well. Local covariant feature detection has been difficult to approach with machine learning techniques. We propose the first fully general formulation for learning local covariant feature detectors which casts detection as a regression problem, enabling the use of powerful regressors such as deep neural networks. The derived covariance constraint can be used to automatically learn which visual structures provide stable anchors for local feature detection. We support these ideas theoretically, and show that existing detectors can be derived in this framework. Additionally, in cooperation with Imperial College London, we introduce a novel large-scale dataset for evaluation of local detectors and descriptors. It is suitable for training and testing modern local features, together with strictly defined evaluation protocols for descriptors in several tasks such as matching, retrieval and verification. The importance of pixel-wise image geometry for object detection is unknown as the best results used to be obtained with combination of CNNs with cues from image segmentation. We propose a detector which uses constant region proposals and, while it approximates objects poorly, we show that a bounding box regressor using intermediate convolutional features can recover sufficiently accurate bounding boxes, demonstrating that the required geometric information is contained in the CNN itself. Combined with other improvements, we obtain an excellent and fast detector that processes an image only with the CNN.
APA, Harvard, Vancouver, ISO, and other styles
44

Farabet, Clément. "Towards real-time image understanding with convolutional networks." Thesis, Paris Est, 2013. http://www.theses.fr/2013PEST1083/document.

Full text
Abstract:
One of the open questions of artificial computer vision is how to produce good internal representations of the visual world. What sort of internal representation would allow an artificial vision system to detect and classify objects into categories, independently of pose, scale, illumination, conformation, and clutter? More interestingly, how could an artificial vision system learn appropriate internal representations automatically, the way animals and humans seem to learn by simply looking at the world? Another related question is that of computational tractability, and more precisely that of computational efficiency. Given a good visual representation, how efficiently can it be trained, and used to encode new sensorial data. Efficiency has several dimensions: power requirements, processing speed, and memory usage. In this thesis I present three new contributions to the field of computer vision: (1) a multiscale deep convolutional network architecture to easily capture long-distance relationships between input variables in image data, (2) a tree-based algorithm to efficiently explore multiple segmentation candidates, to produce maximally confident semantic segmentations of images, (3) a custom dataflow computer architecture optimized for the computation of convolutional networks, and similarly dense image processing models. All three contributions were produced with the common goal of getting us closer to real-time image understanding. Scene parsing consists in labeling each pixel in an image with the category of the object it belongs to. In the first part of this thesis, I propose a method that uses a multiscale convolutional network trained from raw pixels to extract dense feature vectors that encode regions of multiple sizes centered on each pixel. The method alleviates the need for engineered features. In parallel to feature extraction, a tree of segments is computed from a graph of pixel dissimilarities. The feature vectors associated with the segments covered by each node in the tree are aggregated and fed to a classifier which produces an estimate of the distribution of object categories contained in the segment. A subset of tree nodes that cover the image are then selected so as to maximize the average "purity" of the class distributions, hence maximizing the overall likelihood that each segment contains a single object. The system yields record accuracies on several public benchmarks. The computation of convolutional networks, and related models heavily relies on a set of basic operators that are particularly fit for dedicated hardware implementations. In the second part of this thesis I introduce a scalable dataflow hardware architecture optimized for the computation of general-purpose vision algorithms, neuFlow, and a dataflow compiler, luaFlow, that transforms high-level flow-graph representations of these algorithms into machine code for neuFlow. This system was designed with the goal of providing real-time detection, categorization and localization of objects in complex scenes, while consuming 10 Watts when implemented on a Xilinx Virtex 6 FPGA platform, or about ten times less than a laptop computer, and producing speedups of up to 100 times in real-world applications (results from 2011)
APA, Harvard, Vancouver, ISO, and other styles
45

Chen, Xiaoran. "Image enhancement effect on the performance of convolutional neural networks." Thesis, Blekinge Tekniska Högskola, Institutionen för datavetenskap, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:bth-18523.

Full text
Abstract:
Context. Image enhancement algorithms can be used to enhance the visual effects of images in the field of human vision. Can image enhancement algorithms also be used in the field of computer vision? The convolutional neural network, as the most powerful image classifier at present, has excellent performance in the field of image recognition. This paper explores whether image enhancement algorithms can be used to improve the performance of convolutional neural networks. Objectives. The purpose of this paper is to explore the effect of image enhancement algorithms on the performance of CNN models in deep learning and transfer learning, respectively. The article selected five different image enhancement algorithms: contrast limited adaptive histogram equalization (CLAHE), the successive mean quantization transform (SMQT), adaptive gamma correction, the wavelet transform, and the Laplace operator. Methods. In this paper, experiments are used as the research method. Three groups of experiments are designed; they respectively explore whether the enhancement of grayscale images can improve the performance of CNNs in deep learning, whether the enhancement of color images can improve the performance of CNNs in deep learning, and whether the enhancement of RGB images can improve the performance of CNNs in transfer learning. Results. In deep learning, when training a complete CNN model, using the Laplace operator to enhance grayscale images can improve the recall rate of the CNN. However, the remaining image enhancement algorithms cannot improve the performance of the CNN on either the grayscale or the color image datasets. In addition, in transfer learning, when fine-tuning the pre-trained CNN model, using contrast limited adaptive histogram equalization (CLAHE), the successive mean quantization transform (SMQT), the wavelet transform, or the Laplace operator will reduce the performance of the CNN. Conclusions. Experiments show that in deep learning, using image enhancement algorithms may improve CNN performance when training complete CNN models, but not all image enhancement algorithms can improve CNN performance; in transfer learning, when fine-tuning the pre-trained CNN model, image enhancement algorithms may reduce the performance of the CNN.
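For concreteness, the CLAHE enhancement studied here can be applied with OpenCV as in the sketch below; the clip limit and tile grid size are OpenCV's customary defaults, assumed rather than taken from the paper.

```python
# Contrast limited adaptive histogram equalization as a CNN pre-processing step.
import cv2

def enhance_clahe(gray_image):
    # gray_image: single-channel 8-bit image
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    return clahe.apply(gray_image)
```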
APA, Harvard, Vancouver, ISO, and other styles
46

Ramesh, Shreyas. "Deep Learning for Taxonomy Prediction." Thesis, Virginia Tech, 2019. http://hdl.handle.net/10919/89752.

Full text
Abstract:
The last decade has seen great advances in Next-Generation Sequencing technologies, and, as a result, there has been a rise in the number of genomes sequenced each year. In 2017, there were as many as 10,000 new organisms sequenced and added into the RefSeq Database. Taxonomy prediction is a science involving the hierarchical classification of DNA fragments up to the rank species. In this research, we introduce Predicting Linked Organisms, Plinko, for short. Plinko is a fully-functioning, state-of-the-art predictive system that accurately captures DNA - Taxonomy relationships where other state-of-the-art algorithms falter. Plinko leverages multi-view convolutional neural networks and the pre-defined taxonomy tree structure to improve multi-level taxonomy prediction. In the Plinko strategy, each network takes advantage of different word usage patterns corresponding to different levels of evolutionary divergence. Plinko has the advantages of relatively low storage, GPGPU parallel training and inference, making the solution portable, and scalable with anticipated genome database growth. To the best of our knowledge, Plinko is the first to use multi-view convolutional neural networks as the core algorithm in a compositional, alignment-free approach to taxonomy prediction.
Master of Science
Taxonomy prediction is a science involving the hierarchical classification of DNA fragments up to the rank species. Given species diversity on Earth, taxonomy prediction gets challenging with (i) an increasing number of species (labels) to classify and (ii) decreasing input (DNA) size. In this research, we introduce Predicting Linked Organisms, Plinko, for short. Plinko is a fully-functioning, state-of-the-art predictive system that accurately captures DNA - Taxonomy relationships where other state-of-the-art algorithms falter. Three major challenges in taxonomy prediction are (i) large dataset sizes (on the order of 10^9 sequences), (ii) large label spaces (on the order of 10^3 labels) and (iii) low-resolution inputs (100 base pairs or less). Plinko leverages multi-view convolutional neural networks and the pre-defined taxonomy tree structure to improve multi-level taxonomy prediction for hard-to-classify sequences under the three conditions stated above. Plinko has the advantage of a relatively low storage footprint, making the solution portable and scalable with anticipated genome database growth. To the best of our knowledge, Plinko is the first to use multi-view convolutional neural networks as the core algorithm in a compositional, alignment-free approach to taxonomy prediction.
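As an illustration of the compositional, alignment-free input such systems assume, the sketch below tokenizes a DNA fragment into overlapping k-mer "words" and maps them to integer ids suitable for an embedding layer; the value of k and the encoding scheme are assumptions for illustration, not Plinko's published configuration.

```python
# Tokenize a DNA fragment into overlapping k-mers and encode each k-mer
# as a base-4 integer id (4**k possible k-mers over the DNA alphabet).
def kmer_tokens(sequence, k=8):
    sequence = sequence.upper()
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]

def encode(sequence, k=8):
    index = {"A": 0, "C": 1, "G": 2, "T": 3}
    ids = []
    for kmer in kmer_tokens(sequence, k):
        if any(base not in index for base in kmer):
            continue                      # skip ambiguous bases such as 'N'
        code = 0
        for base in kmer:
            code = code * 4 + index[base]
        ids.append(code)
    return ids                            # integer ids for an embedding layer

# encode("ACGTACGTACGT", k=4) -> list of k-mer ids
```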
APA, Harvard, Vancouver, ISO, and other styles
47

Chen, Kuan-Ting, and 陳冠廷. "Warping of Human Face View using Convolutional Deep Belief Networks." Thesis, 2014. http://ndltd.ncl.edu.tw/handle/80694902748248773004.

Full text
Abstract:
Master's
National Chiao Tung University
Department of Electronics Engineering, Institute of Electronics
103
In this thesis, we aim at finding a better way of representing and connecting related human face images using the learning approach of convolutional deep belief networks (DBNs). Since images are connected with corresponding representations, it is possible for the convolutional DBN to infer a human face image at one view angle from a given image of the same face at another view angle. The proposed methods are shown to work well due to the fact that the features detected on images of an object under different movements are highly correlated. If the patterns of feature changes can be modeled by the deep architecture, warping of the human face view can be realized. Besides, the reason for using convolutional deep belief nets as the feature extractor is that they have translated-representation and translation-invariance properties, which yields a model more robust to translated data. The proposed training algorithm is an unsupervised learning procedure called pre-training. After pre-training, the model becomes a generative model specifying a joint distribution over all data and hidden states. Therefore, given an image of a human face, the model can infer a warped view of the face from correlations in the hidden states.
APA, Harvard, Vancouver, ISO, and other styles
48

Chu, Joseph Lin. "Using Support Vector Machines, Convolutional Neural Networks and Deep Belief Networks for Partially Occluded Object Recognition." Thesis, 2014. http://spectrum.library.concordia.ca/978484/1/Chu_MCompSc_S2014.pdf.

Full text
Abstract:
Artificial neural networks have been widely used for machine learning tasks such as object recognition. Recent developments have made use of biologically inspired architectures, such as the Convolutional Neural Network and the Deep Belief Network. A theoretical method for estimating the optimal number of feature maps for a Convolutional Neural Network using the dimensions of the receptive field or convolutional kernel is proposed. Empirical experiments are performed showing that the method works to an extent for extremely small receptive fields, but doesn't generalize as clearly to all receptive field sizes. We then test the hypothesis that generative models such as the Deep Belief Network should perform better on occluded object recognition tasks than purely discriminative models such as Convolutional Neural Networks. We find that the data does not support this hypothesis when the generative models are run in a partially discriminative manner. We also find that the use of Gaussian visible units in a Deep Belief Network trained on occluded image data allows it to also learn to classify non-occluded images.
APA, Harvard, Vancouver, ISO, and other styles
49

Vojt, Ján. "Deep neural networks and their implementation." Master's thesis, 2016. http://www.nusl.cz/ntk/nusl-345228.

Full text
Abstract:
Deep neural networks represent an effective and universal model capable of solving a wide variety of tasks. This thesis is focused on three different types of deep neural networks - the multilayer perceptron, the convolutional neural network, and the deep belief network. All of the discussed network models are implemented on parallel hardware and thoroughly tested for various choices of the network architecture and its parameters. The implemented system is accompanied by detailed documentation of the architectural decisions and proposed optimizations. The efficiency of the implemented framework is confirmed by the results of the performed tests. A significant part of this thesis is also the additional testing of other existing frameworks which support deep neural networks. This comparison indicates that the implementation outperforms the tested rival frameworks for multilayer perceptrons and convolutional neural networks. The deep belief network implementation performs slightly better for RBM layers with up to 1000 hidden neurons, but has noticeably inferior performance for more robust RBM layers when compared to the tested rival framework.
APA, Harvard, Vancouver, ISO, and other styles
50

Švaralová, Monika. "DRESS & GO: Deep belief networks and Rule Extraction Supported by Simple Genetic Optimization." Master's thesis, 2018. http://www.nusl.cz/ntk/nusl-383249.

Full text
Abstract:
Recent developments in social media and web technologies offer new opportunities to access, analyze and process ever-increasing amounts of fashion-related data. In the appealing context of design and fashion, our main goal is to automatically suggest fashionable outfits based on preferences extracted from real-world data, provided either by individual users or gathered from the internet. In our case, the clothing items have the form of 2D images. Especially for visual data processing tasks, recent models of deep neural networks are known to surpass human performance. This fact inspired us to apply the idea of transfer learning to understand the actual variability in clothing items. The principle of transfer learning consists in extracting the internal representations formed in large convolutional networks pre-trained on general datasets, e.g., ImageNet, and visualizing their (similarity) structure. Together with transfer learning, clustering algorithms and image color schemes can be utilized when searching for related outfit items. Viable means applicable to generating new outfits include deep belief networks and genetic algorithms enhanced by a convolutional network that models outfit fitness. Although fashion-related recommendations remain highly subjective, the results we have achieved...
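A minimal sketch of the transfer-learning step described above, assuming a ResNet50 ImageNet encoder and k-means clustering over the resulting embeddings; the thesis's actual network and clustering choices may differ.

```python
# Reuse a pre-trained CNN as a fixed feature extractor for clothing images,
# then cluster the embeddings to group visually related items.
import tensorflow as tf
from sklearn.cluster import KMeans

extractor = tf.keras.applications.ResNet50(
    weights="imagenet", include_top=False, pooling="avg")

def embed(images):
    # images: float array of shape (N, 224, 224, 3) with values in [0, 255]
    x = tf.keras.applications.resnet50.preprocess_input(images.copy())
    return extractor.predict(x)            # (N, 2048) feature vectors

def cluster_outfits(images, n_clusters=20):
    features = embed(images)
    return KMeans(n_clusters=n_clusters, n_init=10).fit_predict(features)
```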
APA, Harvard, Vancouver, ISO, and other styles