To see the other types of publications on this topic, follow the link: Generative Adversarial Neural Networks (GAN's).

Dissertations / Theses on the topic 'Generative Adversarial Neural Networks (GAN's)'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the top 50 dissertations / theses for your research on the topic 'Generative Adversarial Neural Networks (GAN's).'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online, whenever these are available in the metadata.

Browse dissertations / theses in a wide variety of disciplines and organise your bibliography correctly.

1

Delacruz, Gian P. "Using Generative Adversarial Networks to Classify Structural Damage Caused by Earthquakes." DigitalCommons@CalPoly, 2020. https://digitalcommons.calpoly.edu/theses/2158.

Full text
Abstract:
The amount of structural damage image data produced in the aftermath of an earthquake can be staggering, and it is challenging for a few human volunteers to filter and tag these images with meaningful damage information efficiently. Post-earthquake reconnaissance image tagging can be automated with Machine Learning (ML) solutions that classify each occurrence of damage by building material and structural member type. ML algorithms are data-driven, improving with increased training data. Thanks to the vast amount of data available and advances in computer architectures, ML, and in particular Deep Learning (DL), has become one of the most popular image classification approaches, producing results comparable to, and in some cases superior to, human experts. These algorithms require labeled input images for training, yet even when a large number of images is available, most of them are unlabeled, and labeling them takes structural engineers a large amount of time. Current earthquake image databases either lack label information or are incomplete, which significantly slows progress toward a solution, and they are very difficult to search. To train an ML algorithm to classify one type of structural damage, it took the architecture school an entire year to gather 200 images of that damage. That number is clearly not enough to avoid overfitting, so for this thesis we decided to generate synthetic images of the specific structural damage. In particular, we use Generative Adversarial Networks (GANs) to generate the synthetic images and enable fast classification of rail and road damage caused by earthquakes. Fast classification of rail and road damage helps keep people safe and better prepares the reconnaissance teams that manage recovery tasks. GANs combine a classification (discriminative) neural network with a generative neural network.
For this thesis we combine a convolutional neural network (CNN) with a generative neural network. By taking a classifier trained within a GAN and adapting it to classify other images, the classifier can take advantage of the GAN training without the need to find more training data. The classifier trained in this way achieved an 88% accuracy score when classifying images of structural damage caused by earthquakes.
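The transfer idea in this abstract, keeping what a GAN-trained classifier has learned and retraining only a small classification head, can be sketched as follows. This is a hypothetical, numpy-only illustration, not the thesis implementation: the frozen random projection merely stands in for features learned during adversarial training.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical frozen "discriminator features": a fixed random projection
# standing in for the representation a GAN discriminator learns during
# adversarial training (all names here are illustrative).
W_feat = 0.1 * rng.normal(size=(16, 8))

def features(x):
    """Frozen feature extractor (stand-in for the GAN-trained network body)."""
    return np.tanh(x @ W_feat)

# Toy labeled data: two classes separated along the first input dimension.
X = rng.normal(size=(400, 16))
y = (X[:, 0] > 0).astype(float)

# Train only a small logistic-regression head on top of the frozen features,
# mirroring the idea of reusing GAN training instead of gathering more data.
F = features(X)
w, b, lr = np.zeros(8), 0.0, 0.5
for _ in range(1000):
    p = 1.0 / (1.0 + np.exp(-(F @ w + b)))
    w -= lr * F.T @ (p - y) / len(y)
    b -= lr * np.mean(p - y)

acc = float(np.mean(((F @ w + b) > 0) == (y > 0.5)))
print(acc > 0.6)  # the head recovers the class signal from frozen features
```

Only the head's few weights are updated, which is why this style of transfer needs far less labeled data than training a classifier from scratch.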
APA, Harvard, Vancouver, ISO, and other styles
2

Kryściński, Wojciech. "Training Neural Models for Abstractive Text Summarization." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-236973.

Full text
Abstract:
Abstractive text summarization aims to condense long textual documents into a short, human-readable form while preserving the most important information from the source document. A common approach to training summarization models is maximum likelihood estimation with the teacher forcing strategy. Despite its popularity, this method has been shown to yield models with suboptimal performance at inference time. This work examines how using alternative, task-specific training signals affects the performance of summarization models. Two novel training signals are proposed and evaluated as part of this work. The first is a novelty metric measuring the overlap between n-grams in the summary and the summarized article. The second utilizes a discriminator model to distinguish human-written summaries from generated ones on a word level. Empirical results show that using these metrics as rewards for policy gradient training yields significant performance gains as measured by ROUGE scores, novelty scores and human evaluation.
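The n-gram novelty signal described in this abstract can be sketched as a small function. This is an illustrative formulation (the fraction of summary n-grams absent from the article); the exact metric used in the thesis may differ.

```python
def ngrams(tokens, n):
    """Set of n-grams over a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def novelty(summary, article, n=2):
    """Fraction of summary n-grams that do NOT appear in the article.

    High values reward abstractive (paraphrased) summaries; low values
    indicate copying. A sketch of the kind of overlap signal described above.
    """
    s = ngrams(summary.split(), n)
    a = ngrams(article.split(), n)
    if not s:
        return 0.0
    return len(s - a) / len(s)

article = "the cat sat on the mat while the dog slept"
copied = "the cat sat on the mat"        # extractive: every bigram is copied
abstractive = "a cat rested on a mat"    # paraphrased: no bigram is copied

print(novelty(copied, article))       # 0.0
print(novelty(abstractive, article))  # 1.0
```

In a policy-gradient setup, a score like this would be combined with other rewards so the model is not pushed toward novelty at the expense of faithfulness.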
3

Nilsson, Mårten. "Augmenting High-Dimensional Data with Deep Generative Models." Thesis, KTH, Robotik, perception och lärande, RPL, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-233969.

Full text
Abstract:
Data augmentation is a technique that can be performed in various ways to improve the training of discriminative models. Recent developments in deep generative models offer new ways of augmenting existing data sets. In this thesis, a framework for augmenting annotated data sets with deep generative models is proposed, together with a method for quantitatively evaluating the quality of the generated data sets. Using this framework, two data sets for pupil localization were generated with different generative models, including both well-established models and a novel model proposed for this purpose. The novel model was shown, both qualitatively and quantitatively, to generate the best data sets. A set of smaller experiments on standard data sets also revealed cases where this generative model could improve the performance of an existing discriminative model. The results indicate that generative models can be used to augment or replace existing data sets when training discriminative models.
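The augmentation workflow described in this abstract can be caricatured with a deliberately simple generative model. Here a fitted Gaussian stands in for a deep generative model; all names and sizes are illustrative, not from the thesis.

```python
import numpy as np

rng = np.random.default_rng(1)

# Small annotated data set (features only, for brevity).
real = rng.normal(loc=2.0, scale=0.5, size=(50, 2))

# A deliberately simple stand-in for a deep generative model:
# fit a Gaussian to the real data, then sample from it.
mu = real.mean(axis=0)
sigma = real.std(axis=0)

def generate(k):
    """Draw k synthetic samples from the fitted model."""
    return rng.normal(loc=mu, scale=sigma, size=(k, 2))

# Augment: train a downstream discriminative model on real + synthetic data.
synthetic = generate(200)
augmented = np.vstack([real, synthetic])
print(augmented.shape)  # (250, 2)
```

The thesis's quantitative-evaluation step would then compare a discriminative model trained on `augmented` against one trained on `real` alone.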
4

Aftab, Nadeem. "Disocclusion Inpainting using Generative Adversarial Networks." Thesis, Mittuniversitetet, Institutionen för informationssystem och –teknologi, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:miun:diva-40502.

Full text
Abstract:
The older methods used for image inpainting in the Depth Image Based Rendering (DIBR) process are inefficient at producing high-quality virtual views from captured data. From the viewpoint of the original image, the generated data's structure appears only mildly distorted in a virtual view obtained by translation, but when the virtual view involves rotation, gaps and missing regions become visible in the DIBR-generated data. Typical approaches for filling these disocclusions tend to be slow, inefficient, and inaccurate. In this project, a modern technique, the Generative Adversarial Network (GAN), is used to fill the disocclusions. A GAN consists of two or more neural networks that compete against each other as they are trained. The results of this study show that a GAN can inpaint disocclusions while preserving structural consistency. Additionally, another method (filling) is used to enhance the quality of the GAN and DIBR images. A statistical evaluation of the results shows that the GAN and the filling method enhance the quality of DIBR images.
5

Käll, Viktor, and Erik Piscator. "Particle Filter Bridge Interpolation in GANs." Thesis, KTH, Matematisk statistik, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-301733.

Full text
Abstract:
Generative adversarial networks (GANs), a type of generative modeling framework, have received much attention in the past few years owing to their capacity to recover complex high-dimensional data distributions. They provide a compressed representation of the data in which only the essential features of a sample are retained, subsequently inducing a similarity measure on the space of data. This similarity measure opens the possibility of interpolating in the data, which has been done successfully in the past. Herein we propose a new stochastic interpolation method for GANs in which the interpolation is forced to adhere to the data distribution by implementing a sequential Monte Carlo algorithm for data sampling. The results show that the new method outperforms previously known interpolation methods on the LINES data set; compared to the results of other interpolation methods, there was a significant improvement measured through quantitative and qualitative evaluations. The developed interpolation method has met its expectations and shown promise; however, it needs to be tested on a more complex data set in order to verify that it also scales well.
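The sequential Monte Carlo bridge described in this abstract can be sketched as follows. This is a hedged illustration: the standard-normal `density` is a stand-in for the thesis's criterion of adhering to the data distribution, and every parameter value is illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def density(z):
    """Stand-in for 'adheres to the data distribution': a standard-normal
    prior over latent codes (the thesis uses a learned criterion instead)."""
    return np.exp(-0.5 * np.sum(z**2, axis=-1))

def smc_interpolate(z0, z1, steps=10, particles=64, noise=0.05):
    """Particle-filter bridge between two latent codes: at each step,
    propose noisy points near the line, weight them by density, resample."""
    path = [z0]
    for t in range(1, steps):
        alpha = t / steps
        base = (1 - alpha) * z0 + alpha * z1          # naive linear point
        props = base + noise * rng.normal(size=(particles, len(z0)))
        w = density(props)
        w = w / w.sum()                               # normalized weights
        idx = rng.choice(particles, p=w)              # resample one particle
        path.append(props[idx])
    path.append(z1)
    return np.array(path)

z0, z1 = rng.normal(size=2), rng.normal(size=2)
path = smc_interpolate(z0, z1)
print(path.shape)  # (11, 2): endpoints plus nine intermediate points
```

Each intermediate point is nudged toward high-density regions, which is what keeps the bridge on the data distribution instead of cutting straight through low-density latent space.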
6

Daley, Jr John. "Generating Synthetic Schematics with Generative Adversarial Networks." Thesis, Högskolan Kristianstad, Fakulteten för naturvetenskap, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:hkr:diva-20901.

Full text
Abstract:
This study investigates synthetic schematic generation using conditional generative adversarial networks; specifically, the Pix2Pix algorithm was implemented for the experimental phase of the study. With the increase in deep neural networks' capabilities and availability, there is a demand for extensive datasets. This, in combination with increased privacy concerns, has led to the use of synthetic data generation. The synthetic images were analyzed by means of a survey. Generated blueprint images passed as genuine images in 40% of cases. This study confirms the ability of generative neural networks to produce synthetic blueprint images.
7

Yamazaki, Hiroyuki Vincent. "On Depth and Complexity of Generative Adversarial Networks." Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-217293.

Full text
Abstract:
Although generative adversarial networks (GANs) have achieved state-of-the-art results in generating realistic-looking images, they are often parameterized by neural networks with relatively few learnable weights compared to those used for discriminative tasks. We argue that this is suboptimal in a generative setting, where data is often entangled in high-dimensional space and models are expected to benefit from high expressive power. Additionally, in a generative setting a model often needs to extrapolate missing information from a low-dimensional latent space when generating data samples, while in a typical discriminative task the model only needs to extract lower-dimensional features from a high-dimensional space. We evaluate different architectures for GANs with varying model capacities using shortcut connections in order to study the impact of capacity on training stability and sample quality. We show that while training tends to oscillate and not benefit from the additional capacity of naively stacked layers, GANs are capable of generating samples of higher quality, specifically, for images, samples of higher visual fidelity, given proper regularization and careful balancing.
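The contrast drawn in this abstract, naively stacked layers versus layers wrapped in shortcut connections, can be illustrated minimally. This sketch only shows the two wirings; it is not the evaluated GAN architectures, and all sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def layer(x, W):
    """A single ReLU layer."""
    return np.maximum(0.0, x @ W)

def stacked(x, weights):
    """Naively stacked layers: each layer fully replaces the signal,
    the configuration reported above to destabilize training."""
    for W in weights:
        x = layer(x, W)
    return x

def with_shortcuts(x, weights):
    """The same layers wrapped in identity shortcut (residual) connections,
    so added depth refines the signal instead of replacing it."""
    for W in weights:
        x = x + layer(x, W)
    return x

weights = [0.1 * rng.normal(size=(4, 4)) for _ in range(8)]
x = rng.normal(size=(1, 4))
print(stacked(x, weights).shape, with_shortcuts(x, weights).shape)
```

With small weights, the plain stack tends to attenuate the input as depth grows, while the shortcut version always carries the input forward, which is the usual intuition for why shortcuts make added capacity easier to train.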
8

Castillo, Araújo Victor. "Ensembles of Single Image Super-Resolution Generative Adversarial Networks." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-290945.

Full text
Abstract:
Generative Adversarial Networks have been used to obtain state-of-the-art results for low-level computer vision tasks like single image super-resolution; however, they are notoriously difficult to train due to the instability of the competing minimax framework. Additionally, traditional ensembling mechanisms cannot be effectively applied to these types of networks because of the resources they require at inference time and the complexity of their architectures. In this thesis, an alternative method is proposed for creating ensembles of individual models, which are more stable and easier to train, by interpolating in the parameter space of the models. It is found to produce better results than the initial individual models when evaluated using perceptual metrics as a proxy for human judges. This method can be used as a framework to train GANs with perceptual results competitive with state-of-the-art alternatives.
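The parameter-space interpolation at the heart of this ensembling idea can be sketched in a few lines. The flat weight vectors below are hypothetical stand-ins for full generator parameters; the thesis operates on real networks.

```python
import numpy as np

# Hypothetical parameters of two independently trained models,
# stored as flat weight vectors (stand-ins for full GAN generators).
theta_a = np.array([0.2, 1.0, -0.5])
theta_b = np.array([0.4, 0.8, -0.3])

def interpolate(theta1, theta2, alpha):
    """Ensemble-by-interpolation in parameter space: a single model whose
    weights are a convex combination of the members' weights, so inference
    costs the same as one model rather than running every member."""
    return (1 - alpha) * theta1 + alpha * theta2

merged = interpolate(theta_a, theta_b, 0.5)
print(merged)  # midpoint of the two weight vectors: [0.3, 0.9, -0.4]
```

This is why the approach sidesteps the inference-time cost of traditional ensembles: the "ensemble" is collapsed into one set of weights before deployment.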
9

Garcia, Torres Douglas. "Generation of Synthetic Data with Generative Adversarial Networks." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-254366.

Full text
Abstract:
The aim of synthetic data generation is to provide data that is not real for cases where the use of real data is somehow limited. For example, when there is a need for larger volumes of data, when the data is sensitive to use, or simply when it is hard to get access to the real data. Traditional methods of synthetic data generation use techniques that do not intend to replicate important statistical properties of the original data. Properties such as the distribution, the patterns or the correlation between variables are often omitted. Moreover, most of the existing tools and approaches require a great deal of user-defined rules and do not make use of advanced techniques like Machine Learning or Deep Learning. While Machine Learning is an innovative area of Artificial Intelligence and Computer Science that uses statistical techniques to give computers the ability to learn from data, Deep Learning is a closely related field based on learning data representations, which may prove useful for the task of synthetic data generation. This thesis focuses on one of the most interesting and promising innovations of the last years in the Machine Learning community: Generative Adversarial Networks. An approach for generating discrete, continuous or text synthetic data with Generative Adversarial Networks is proposed, tested, evaluated and compared with a baseline approach. The results prove the feasibility and show the advantages and disadvantages of using this framework. Despite its high demand for computational resources, a Generative Adversarial Network framework is capable of generating quality synthetic data that preserves the statistical properties of a given dataset.
10

Nistal, Hurlé Javier. "Exploring generative adversarial networks for controllable musical audio synthesis." Electronic Thesis or Diss., Institut polytechnique de Paris, 2022. http://www.theses.fr/2022IPPAT009.

Full text
Abstract:
Audio synthesizers are electronic musical instruments that generate artificial sounds under some parametric control. While synthesizers have evolved since they were popularized in the 70s, two fundamental challenges are still unresolved: 1) the development of synthesis systems responding to semantically intuitive parameters; 2) the design of "universal," source-agnostic synthesis techniques. This thesis researches the use of Generative Adversarial Networks (GAN) towards building such systems. The main goal is to research and develop novel tools for music production that afford intuitive and expressive means of sound manipulation, e.g., by controlling parameters that respond to perceptual properties of the sound and other high-level features. Our first work studies the performance of GANs when trained on various common audio signal representations (e.g., waveform, time-frequency representations). These experiments compare different forms of audio data in the context of tonal sound synthesis. Results show that the Magnitude and Instantaneous Frequency of the phase and the complex-valued Short-Time Fourier Transform achieve the best results. Building on this, our following work presents DrumGAN, a controllable adversarial audio synthesizer of percussive sounds. By conditioning the model on perceptual features describing high-level timbre properties, we demonstrate that intuitive control can be gained over the generation process. This work results in the development of a VST plugin generating full-resolution audio and compatible with any Digital Audio Workstation (DAW). We show extensive musical material produced by professional artists from Sony ATV using DrumGAN.
The scarcity of annotations in musical audio datasets challenges the application of supervised methods to conditional generation settings. Our third contribution employs a knowledge distillation approach to extract such annotations from a pre-trained audio tagging system. DarkGAN is an adversarial synthesizer of tonal sounds that employs the output probabilities of such a system (so-called “soft labels”) as conditional information. Results show that DarkGAN can respond moderately to many intuitive attributes, even with out-of-distribution input conditioning. Applications of GANs to audio synthesis typically learn from fixed-size two-dimensional spectrogram data analogously to the "image data" in computer vision; thus, they cannot generate sounds with variable duration. In our fourth paper, we address this limitation by exploiting a self-supervised method for learning discrete features from sequential data. Such features are used as conditional input to provide step-wise time-dependent information to the model. Global consistency is ensured by fixing the input noise z (characteristic in adversarial settings). Results show that, while models trained on a fixed-size scheme obtain better audio quality and diversity, ours can competently generate audio of any duration. One interesting direction for research is the generation of audio conditioned on preexisting musical material, e.g., the generation of some drum pattern given the recording of a bass line. Our fifth paper explores a simple pretext task tailored at learning such types of complex musical relationships. Concretely, we study whether a GAN generator, conditioned on highly compressed MP3 musical audio signals, can generate outputs resembling the original uncompressed audio. Results show that the GAN can improve the quality of the audio signals over the MP3 versions for very high compression rates (16 and 32 kbit/s). 
As a direct consequence of applying artificial intelligence techniques in musical contexts, we ask how AI-based technology can foster innovation in musical practice. Therefore, we conclude this thesis by providing a broad perspective on the development of AI tools for music production, informed by theoretical considerations and reports from real-world AI tool usage by professional artists.
11

Oskarsson, Joel. "Probabilistic Regression using Conditional Generative Adversarial Networks." Thesis, Linköpings universitet, Statistik och maskininlärning, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-166637.

Full text
Abstract:
Regression is a central problem in statistics and machine learning with applications everywhere in science and technology. In probabilistic regression the relationship between a set of features and a real-valued target variable is modelled as a conditional probability distribution. There are cases where this distribution is very complex and not properly captured by simple approximations, such as assuming a normal distribution. This thesis investigates how conditional Generative Adversarial Networks (GANs) can be used to properly capture more complex conditional distributions. GANs have seen great success in generating complex high-dimensional data, but less work has been done on their use for regression problems. This thesis presents experiments to better understand how conditional GANs can be used in probabilistic regression. Different versions of GANs are extended to the conditional case and evaluated on synthetic and real datasets. It is shown that conditional GANs can learn to estimate a wide range of different distributions and be competitive with existing probabilistic regression models.
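The core idea of this abstract, representing p(y | x) by a generator that can be sampled rather than by a fixed parametric form, can be sketched as follows. The hard-coded `generator` is a hypothetical stand-in for a trained conditional GAN generator; in the thesis this mapping is learned adversarially.

```python
import numpy as np

rng = np.random.default_rng(0)

def generator(x, z):
    """Hypothetical trained conditional generator: maps a feature x and
    noise z to a sample of y | x. Here the 'learned' mapping is hard-coded
    to a heteroscedastic Gaussian purely for illustration."""
    return 2.0 * x + (0.1 + 0.5 * abs(x)) * z

def sample_conditional(x, n=10000):
    """Approximate the conditional distribution of y given x by sampling
    many noise vectors through the generator."""
    z = rng.normal(size=n)
    return generator(x, z)

ys = sample_conditional(1.0)
print(float(ys.mean()), float(ys.std()))  # close to 2.0 and 0.6 at x = 1.0
```

Because the distribution is represented by samples, quantities like predictive intervals or multimodality come out directly, with no normality assumption; this is exactly the flexibility the thesis evaluates against standard probabilistic regression models.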
12

Adhikari, Aakriti. "Skin Cancer Detection using Generative Adversarial Network and an Ensemble of Deep Convolutional Neural Networks." University of Toledo / OhioLINK, 2019. http://rave.ohiolink.edu/etdc/view?acc_num=toledo1574383625473665.

Full text
13

Albertazzi, Riccardo. "A study on the application of generative adversarial networks to industrial OCR." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2018.

Find full text
Abstract:
High performance and nearly perfect accuracy are the standards required of OCR algorithms for industrial applications. In recent years, research on Deep Learning has proven that Convolutional Neural Networks (CNNs) are a very powerful and robust tool for image analysis and classification; when applied to OCR tasks, CNNs perform much better than previously adopted techniques and easily reach 99% accuracy. However, the effectiveness of Deep Learning models relies on the quality of the data used to train them. This can become a problem, since OCR tools can run for months without interruption, and during this period unpredictable variations (printer errors, background modifications, lighting conditions) can affect the accuracy of the trained system. We cannot expect the final user who trains the tool to take thousands of training pictures under different conditions until all imaginable variations have been captured; we therefore have to be able to generate these variations programmatically. Generative Adversarial Networks (GANs) are a recent breakthrough in machine learning; these networks are able to learn the distribution of the input data and therefore generate realistic samples belonging to that distribution. The objective of this thesis is to understand in detail how GANs work and to perform experiments on generative models that can create unseen variations of OCR training characters, thus making the whole OCR system more robust to future character variations.
APA, Harvard, Vancouver, ISO, and other styles
14

Singh, Vivek Kumar. "Segmentation and classification of multimodal medical images based on generative adversarial learning and convolutional neural networks." Doctoral thesis, Universitat Rovira i Virgili, 2019. http://hdl.handle.net/10803/668445.

Full text
Abstract:
The main aim of this thesis is to create an advanced CAD system for any type of medical image modality with high sensitivity and specificity rates based on deep learning techniques. More specifically, we want to improve the automatic method of detection of Regions of Interest (ROI), which are areas of the image that contain possibly diseased tissue, as well as the segmentation of the findings (delimitation with a boundary) and, ultimately, the prediction of the most suitable diagnosis (classification). In this thesis, we focus on several topics, including mammograms and ultrasound images to diagnose breast cancer, skin lesion analysis in dermoscopic images, and retinal fundus image examination to prevent diabetic retinopathy.
APA, Harvard, Vancouver, ISO, and other styles
15

Kalainathan, Diviyan. "Generative Neural Networks to infer Causal Mechanisms : algorithms and applications." Thesis, Université Paris-Saclay (ComUE), 2019. http://www.theses.fr/2019SACLS516.

Full text
Abstract:
Causal discovery is of utmost importance for agents who must plan, reason, and decide based on observations, where mistaking correlation for causation might lead to unwanted consequences. The gold standard for discovering causal relations is to perform experiments. However, experiments are in many cases expensive, unethical, or impossible to carry out. In these situations, there is a need for observational causal discovery, that is, the estimation of causal relations from observations alone. Causal discovery in the observational data setting traditionally involves making significant assumptions on the data and on the underlying causal model. This thesis aims to alleviate some of the assumptions made on causal models by exploiting the modularity and expressiveness of neural networks for causal discovery, leveraging both conditional independences and the simplicity of the causal mechanisms through two algorithms. Extensive experiments on both simulated and real-world data, together with a thorough theoretical analysis, demonstrate the soundness and good performance of the proposed approaches.
APA, Harvard, Vancouver, ISO, and other styles
16

Johansson, Philip. "Incremental Learning of Deep Convolutional Neural Networks for Tumour Classification in Pathology Images." Thesis, Linköpings universitet, Institutionen för medicinsk teknik, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-158225.

Full text
Abstract:
Understaffing of medical doctors is becoming a pressing problem in many healthcare systems. This problem can be alleviated by utilising Computer-Aided Diagnosis (CAD) systems to substitute for doctors in certain tasks, for instance histopathological image classification. The recent surge of deep learning has allowed CAD systems to perform this task at a very competitive level. However, a major challenge is the need to periodically update the models with new data and/or new classes or diseases. These periodic updates can result in catastrophic forgetting, as Convolutional Neural Networks typically require the entire data set beforehand and tend to lose knowledge about old data when trained on new data. Incremental learning methods were proposed to alleviate this problem in deep learning. In this thesis, two incremental learning methods, Learning without Forgetting (LwF) and a generative rehearsal-based method, are investigated. They are evaluated on two criteria: first, the capability of incrementally adding new classes to a pre-trained model, and second, the ability to update the current model with a new unbalanced data set. Experiments show that LwF does not retain knowledge properly in either case. Further experiments are needed to draw any definite conclusions, for instance using another training approach for the new classes and trying different combinations of losses. The generative rehearsal-based method, on the other hand, works for one class, showing good potential if higher-quality images were generated. Additional experiments are also required to investigate new architectures and approaches for more stable training.
APA, Harvard, Vancouver, ISO, and other styles
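The Learning without Forgetting method evaluated in this thesis relies on a distillation-style loss: the outputs of a frozen copy of the old model serve as soft targets for the updated model, counteracting catastrophic forgetting. As a rough illustration only (the temperature, logits, and NumPy formulation below are invented for this sketch, not taken from the thesis):

```python
import numpy as np

def softmax(logits, temperature=2.0):
    """Temperature-scaled softmax; a higher temperature softens the distribution."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(old_logits, new_logits, temperature=2.0):
    """Cross-entropy between the frozen 'teacher' outputs (old model)
    and the updated 'student' outputs, as used in LwF-style training."""
    p_old = softmax(old_logits, temperature)  # soft targets
    p_new = softmax(new_logits, temperature)  # current predictions
    return float(-(p_old * np.log(p_new + 1e-12)).sum(axis=-1).mean())

# Toy check: a student that stays close to the teacher incurs a smaller
# distillation loss than one that drifts away.
teacher = np.array([[2.0, 0.5, -1.0]])
loss_close = distillation_loss(teacher, teacher + 0.01)
loss_far = distillation_loss(teacher, -teacher)
```

In an actual incremental-learning setup this term would be added to the ordinary classification loss on the new classes; here it only demonstrates the mechanics.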
17

Karlsson, Simon, and Per Welander. "Generative Adversarial Networks for Image-to-Image Translation on Street View and MR Images." Thesis, Linköpings universitet, Institutionen för medicinsk teknik, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-148475.

Full text
Abstract:
Generative Adversarial Networks (GANs) are a deep learning method developed for synthesizing data. One application is image-to-image translation, which could prove valuable when training deep neural networks for image classification tasks. Two areas where deep learning methods are used are automotive vision systems and medical imaging. Automotive vision systems are expected to handle a broad range of scenarios, which demands training data with high diversity. The scenarios in the medical field are fewer, but the problem there is instead that collecting training data is difficult, time-consuming, and expensive. This thesis evaluates different GAN models by comparing synthetic MR images produced by the models against ground truth images. A perceptual study is also performed by an expert in the field. The study shows that the implemented GAN models can synthesize visually realistic MR images. It also shows that models producing more visually realistic synthetic images do not necessarily achieve better results in quantitative error measurements when compared to ground truth data. Along with the investigations on medical images, the thesis explores the possibilities of generating synthetic street view images of different resolutions, light, and weather conditions. Different GAN models have been compared, implemented with our own adjustments, and evaluated. The results show that it is possible to create visually realistic images for different translations and image resolutions.
APA, Harvard, Vancouver, ISO, and other styles
18

Gustafsson, Alexander, and Jonatan Linberg. "Investigation of generative adversarial network training : The effect of hyperparameters on training time and stability." Thesis, Högskolan i Skövde, Institutionen för informationsteknologi, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-19847.

Full text
Abstract:
Generative Adversarial Networks (GAN) is a technique used to learn the distribution of some dataset in order to generate similar data. GAN models are notoriously difficult to train, which has limited their deployment in industry. The results of this study can be used to accelerate the process of making GANs production ready. An experiment was conducted in which multiple GAN models were trained, with the hyperparameters Leaky ReLU alpha, number of convolutional filters, learning rate, and batch size as independent variables. A Mann-Whitney U-test was used to compare the training time and training stability of each model against the others. Except for the Leaky ReLU alpha, changes to the investigated hyperparameters had a significant effect on training time and stability. This study is limited to a few hyperparameters and values, a single dataset, and few data points; further research in the area could examine the generalisability of the results or investigate more hyperparameters.
APA, Harvard, Vancouver, ISO, and other styles
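The Mann-Whitney U-test used in this study is a non-parametric two-sample comparison, a reasonable choice for skewed quantities such as training times. A minimal sketch of the statistic itself (the timing values below are invented for illustration; a real analysis would typically use a library implementation such as scipy.stats.mannwhitneyu, which also supplies the p-value):

```python
import numpy as np

def mann_whitney_u(x, y):
    """Mann-Whitney U statistic for sample x versus sample y:
    counts pairs where x exceeds y, with ties counted as 1/2."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    greater = (x[:, None] > y[None, :]).sum()   # pairwise comparisons via broadcasting
    ties = (x[:, None] == y[None, :]).sum()
    return float(greater + 0.5 * ties)

# Hypothetical training times (minutes) for two hyperparameter settings.
baseline = [42, 45, 39, 50, 47]
larger_batch = [30, 33, 35, 28, 31]
u = mann_whitney_u(baseline, larger_batch)
# Every baseline run is slower than every larger-batch run, so U takes its
# most extreme value for two samples of size 5: 5 * 5 = 25.
```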
19

Shmelkov, Konstantin. "Approches pour l'apprentissage incrémental et la génération des images." Thesis, Université Grenoble Alpes (ComUE), 2019. http://www.theses.fr/2019GREAM010/document.

Full text
Abstract:
This dissertation explores two related topics in the context of deep learning: incremental learning and image generation. Incremental learning studies the training of models whose objective function evolves over time, e.g., the addition of new categories to a classification task. Image generation seeks to learn a distribution of natural images in order to generate new images resembling the original ones. Incremental learning is a challenging problem due to the phenomenon called catastrophic forgetting: any significant change to the objective during training causes a severe degradation of previously learned knowledge. 
We present a learning framework to introduce new classes to an object detection network. It is based on the idea of knowledge distillation to counteract catastrophic forgetting: a fixed copy of the network evaluates old samples, and its output is reused in an auxiliary loss to stabilize the learning of new classes. Our framework mines these samples of old classes on the fly from incoming images, in contrast to other solutions that keep a subset of samples in memory. On the second topic, image generation, we build on the Generative Adversarial Network (GAN) model. Recently, GANs have significantly improved the quality of generated images. However, they suffer from poor coverage of the dataset: while individual samples are of great quality, some modes of the original distribution may not be captured. In addition, existing GAN evaluation methods focus on image quality and thus do not evaluate how well the dataset is covered, in contrast to the likelihood measure commonly used for generative models. We present two approaches to address these problems. The first evaluates class-conditional GANs using two complementary measures based on image classification, GAN-train and GAN-test, which approximate the recall (diversity) and precision (image quality) of GANs, respectively. We evaluate several recent GAN approaches with these two measures and demonstrate a clear difference in performance. Furthermore, we observe that the increasing difficulty of the dataset, from CIFAR10 over CIFAR100 to ImageNet, shows an inverse correlation with the quality of the GANs, as is clearly evident from our measures. Inspired by our study of GAN models, we present a method to explicitly enforce dataset coverage during the GAN training phase. We develop a generative model that combines GAN image quality with a VAE architecture in the feature space engendered by the flow-based model Real-NVP. 
This allows us to evaluate a valid likelihood and simultaneously relax the independence assumption in RGB space that is common for VAEs. We achieve an Inception score and FID competitive with state-of-the-art GANs, while maintaining a good likelihood for this class of models.
APA, Harvard, Vancouver, ISO, and other styles
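The GAN-train and GAN-test measures described in the abstract above reduce to two classifier accuracies: train on generated samples and test on real ones (GAN-train, a recall/diversity proxy), and train on real samples and test on generated ones (GAN-test, a precision/quality proxy). A toy sketch of this protocol, with Gaussian blobs standing in for real and generated images and a nearest-centroid classifier standing in for the CNN (both deliberate simplifications of the actual setup):

```python
import numpy as np

rng = np.random.default_rng(42)

def make_blobs(centers, n):
    """Sample n points per class around each 2-D center (std 0.3)."""
    X = np.vstack([c + rng.normal(0.0, 0.3, (n, 2)) for c in centers])
    y = np.repeat(np.arange(len(centers)), n)
    return X, y

def centroid_accuracy(X_train, y_train, X_test, y_test):
    """Fit a nearest-centroid classifier on the train split and
    return its accuracy on the test split."""
    cents = np.array([X_train[y_train == k].mean(0) for k in np.unique(y_train)])
    pred = np.argmin(((X_test[:, None] - cents[None]) ** 2).sum(-1), axis=1)
    return float((pred == y_test).mean())

centers = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]
real_X, real_y = make_blobs(centers, 100)   # stands in for the real dataset
fake_X, fake_y = make_blobs(centers, 100)   # stands in for GAN samples

gan_train = centroid_accuracy(fake_X, fake_y, real_X, real_y)  # recall/diversity proxy
gan_test = centroid_accuracy(real_X, real_y, fake_X, fake_y)   # precision/quality proxy
```

If the "generated" samples dropped one of the two modes, GAN-train would collapse toward chance on the missing class while GAN-test could remain high, which is exactly the asymmetry the two measures are designed to expose.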
20

Flaherty, Drew. "Artistic approaches to machine learning." Thesis, Queensland University of Technology, 2020. https://eprints.qut.edu.au/200191/1/Drew_Flaherty_Thesis.pdf.

Full text
Abstract:
This research is about how Artificial Intelligence and Machine Learning may impact creative practice. The thesis looks at various implementations and models related to the subject from different cultural and technical viewpoints. The project also provides experimental creative outcomes from my personal practice along with a qualitative study into attitudes and perspectives from other creative practitioners.
APA, Harvard, Vancouver, ISO, and other styles
21

Šagát, Martin. "Návrh generativní kompetitivní neuronové sítě pro generování umělých EKG záznamů." Master's thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2020. http://www.nusl.cz/ntk/nusl-413114.

Full text
Abstract:
The work deals with the generation of ECG signals using generative adversarial networks (GANs). It examines in detail the basics of artificial neural networks and the principles of their operation. It theoretically describes the use, operation, and most common failure modes of generative adversarial networks. A general signal pre-processing procedure suitable for GAN training was derived and used to compile a database. In total, three different GAN models were designed and implemented. The results of the models were visualised and analysed in detail. Finally, the work comments on the achieved results and suggests directions for further research on methods for generating ECG signals.
APA, Harvard, Vancouver, ISO, and other styles
22

Pakdaman, Hesam. "Updating the generator in PPGN-h with gradients flowing through the encoder." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-224867.

Full text
Abstract:
The Generative Adversarial Network framework has shown success in implicitly modeling data distributions and is able to generate realistic samples. Its architecture comprises a generator, which produces fake data that superficially seem to belong to the real data distribution, and a discriminator, whose task is to distinguish fake from genuine samples. The Noiseless Joint Plug & Play model offers an extension to the framework by simultaneously training autoencoders. This model uses a pre-trained encoder as a feature extractor, feeding the generator with global information. Using the Plug & Play network as a baseline, we design a new model by adding discriminators to the Plug & Play architecture. These additional discriminators are trained to discern real and fake latent codes, which are the output of the encoder using genuine and generated inputs, respectively. We proceed to investigate whether this approach is viable. Experiments conducted on the MNIST manifold show that this indeed is the case.
APA, Harvard, Vancouver, ISO, and other styles
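The generator/discriminator game described in this abstract rests on the standard GAN objective: the discriminator is pushed to output 1 on real samples and 0 on fakes, while the generator (in the common non-saturating form) is pushed to make the discriminator output 1 on its fakes. A numerical sketch of those two losses (the probability values are made up for illustration; in practice they would be discriminator outputs on batches of samples):

```python
import numpy as np

def discriminator_loss(d_real, d_fake):
    """Binary cross-entropy for the discriminator: D(real) -> 1, D(fake) -> 0.
    d_real and d_fake are discriminator outputs in (0, 1)."""
    return float(-(np.log(d_real) + np.log(1.0 - d_fake)).mean())

def generator_loss(d_fake):
    """Non-saturating generator loss: the generator wants D(fake) -> 1."""
    return float(-np.log(d_fake).mean())

# A confident, correct discriminator yields a small D loss...
good_d = discriminator_loss(d_real=np.array([0.95]), d_fake=np.array([0.05]))
# ...while a generator that fools the discriminator yields a small G loss.
good_g = generator_loss(d_fake=np.array([0.95]))
```

The latent-code discriminators added in this thesis would, on this reading, apply the same binary cross-entropy objective to encoder outputs instead of images.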
23

Tirumaladasu, Sai Subhakar, and Shirdi Manjunath Adigarla. "Autonomous Driving: Traffic Sign Classification." Thesis, Blekinge Tekniska Högskola, Institutionen för tillämpad signalbehandling, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:bth-17783.

Full text
Abstract:
Autonomous Driving and Advanced Driver Assistance Systems (ADAS) are revolutionizing the way we drive and the future of mobility. Among ADAS features, Traffic Sign Classification is an important technique that assists the driver in easily interpreting traffic signs on the road. In this thesis, we used the powerful combination of Image Processing and Deep Learning to pre-process and classify traffic signs. Recent studies in Deep Learning show how well suited a Convolutional Neural Network (CNN) is for image classification, and several state-of-the-art models already exceed 99% classification accuracy. This shaped our thesis to focus on the remaining challenges and some open research cases. We focused on performance tuning by modifying existing architectures, trading off computation against accuracy. Our research areas include enhancement in low-light/noisy conditions by adding Recurrent Neural Network (RNN) connections, and a contribution to a universal-regional dataset with Generative Adversarial Networks (GANs). The results obtained on the test data are comparable to state-of-the-art models, and we reached accuracies above 98% after performance evaluation in different frameworks.
APA, Harvard, Vancouver, ISO, and other styles
24

Lidberg, Love. "Object Detection using deep learning and synthetic data." Thesis, Linköpings universitet, Medie- och Informationsteknik, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-150555.

Full text
Abstract:
This thesis investigates how synthetic data can be utilized when training convolutional neural networks to detect flags with threatening symbols. The synthetic data used in this thesis consisted of rendered 3D flags with different textures and flags cut out from real images. Training on synthetic data achieved an accuracy above 80%, compared to the 88% accuracy achieved by a data set containing only real images. The highest accuracy was achieved by combining real and synthetic data, showing that synthetic data can be used as a complement to real data. Some attempts to improve the accuracy were made using generative adversarial networks, without achieving any encouraging results.
APA, Harvard, Vancouver, ISO, and other styles
25

Nord, Sofia. "Multivariate Time Series Data Generation using Generative Adversarial Networks : Generating Realistic Sensor Time Series Data of Vehicles with an Abnormal Behaviour using TimeGAN." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-302644.

Full text
Abstract:
Large datasets are a crucial requirement for achieving high performance, accuracy, and generalisation in any machine learning task, such as prediction or anomaly detection. However, it is not uncommon for datasets to be small or imbalanced, since gathering data can be difficult, time-consuming, and expensive. In the task of collecting vehicle sensor time series data, in particular when the vehicle exhibits an abnormal behaviour, these struggles are present and may hinder the automotive industry in its development. Synthetic data generation has become a growing interest among researchers in several fields as a way to handle the struggles of data gathering. Among the methods explored for generating data, generative adversarial networks (GANs) have become a popular approach due to their wide application domain and successful performance. This thesis focuses on generating multivariate time series data similar to vehicle sensor readings of the air pressures in the brake system of vehicles with an abnormal behaviour, meaning there is a leakage somewhere in the system. A novel GAN architecture called TimeGAN was trained to generate such data and was then evaluated using both qualitative and quantitative evaluation metrics. Two versions of this model were tested and compared. The results obtained proved that both models learnt the distribution and the underlying information within the features of the real data. The goal of the thesis was achieved, and the work can serve as a foundation for future work in this field.
APA, Harvard, Vancouver, ISO, and other styles
26

Oquab, Maxime. "Convolutional neural networks : towards less supervision for visual recognition." Thesis, Paris Sciences et Lettres (ComUE), 2018. http://www.theses.fr/2018PSLEE061.

Full text
Abstract:
Convolutional Neural Networks are flexible learning algorithms for computer vision that scale particularly well with the amount of data provided for training them. Although these methods had successful applications already in the '90s, they were not used in visual recognition pipelines because of their lesser performance on realistic natural images. It is only after the amount of data and the computational power both reached a critical point that these algorithms revealed their potential, during the ImageNet challenge of 2012, leading to a paradigm shift in visual recognition. The first contribution of this thesis is a transfer learning setup with a Convolutional Neural Network for image classification. Using a pre-training procedure, we show that image representations learned in a network generalize to other recognition tasks, and their performance scales up with the amount of data used in pre-training. The second contribution of this thesis is a weakly supervised setup for image classification that can predict the location of objects in complex cluttered scenes, based on a dataset indicating only the presence or absence of objects in training images. The third contribution of this thesis aims at finding possible paths for progress in unsupervised learning with neural networks. We study the recent trend of Generative Adversarial Networks and propose two-sample tests for evaluating models. We investigate possible links with concepts related to causality, and propose a two-sample test method for the task of causal discovery. Finally, building on a recent connection with optimal transport, we investigate what these generative algorithms learn from unlabeled data.
APA, Harvard, Vancouver, ISO, and other styles
27

Mennborg, Alexander. "AI-Driven Image Manipulation : Image Outpainting Applied on Fashion Images." Thesis, Luleå tekniska universitet, Institutionen för system- och rymdteknik, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:ltu:diva-85148.

Full text
Abstract:
The e-commerce industry frequently has to display product images on websites where the images are provided by selling partners. The images in question can have drastically different aspect ratios and resolutions, which makes it harder to present them while maintaining a coherent user experience. Manipulating images by cropping can sometimes result in parts of the foreground (i.e. the product or person within the image) being cut off. Image outpainting is a technique that allows images to be extended past their boundaries and can be used to alter the aspect ratio of images. Together with object detection for locating the foreground, this makes it possible to manipulate images without sacrificing parts of the foreground. For image outpainting, a deep learning model was trained on product images that can extend images by at least 25%. The model achieves an 8.29 FID score, a 44.29 PSNR score, and a 39.95 BRISQUE score. To test this solution in practice, a simple image manipulation pipeline was created that uses image outpainting when needed, and it shows promising results. Images can be manipulated in under a second running on a ZOTAC GeForce RTX 3060 (12GB) GPU and in a few seconds running on an Intel Core i7-8700K (16GB) CPU. There is also a special case of images where the background has been digitally replaced with a solid colour; these can be outpainted even faster, without deep learning.
APA, Harvard, Vancouver, ISO, and other styles
28

Vukotic, Verdran. "Deep Neural Architectures for Automatic Representation Learning from Multimedia Multimodal Data." Thesis, Rennes, INSA, 2017. http://www.theses.fr/2017ISAR0015/document.

Full text
Abstract:
In this dissertation, the thesis that deep neural networks are suited for the analysis of visual, textual, and fused visual and textual content is discussed. This work evaluates the ability of deep neural networks to learn automatic multimodal representations in either unsupervised or supervised manners and brings the following main contributions: 1) Recurrent neural networks for spoken language understanding (slot filling): different architectures are compared for this task with the aim of modeling both the input context and output label dependencies. 2) Action prediction from single images: we propose an architecture that allows us to predict human actions from a single image; the architecture is evaluated on videos, by utilizing solely one frame as input, and its originality lies in its ability to predict frames at an arbitrary distance in a video. 3) Bidirectional multimodal encoders: the main contribution of this thesis is a neural architecture that translates from one modality to the other and conversely, offering an improved multimodal representation space where the initially disjoint representations can be translated and fused. This enables improved multimodal fusion of multiple modalities. The architecture was extensively studied and evaluated in international benchmarks within the task of video hyperlinking, where it defined the state of the art. 4) Generative adversarial networks for multimodal fusion: continuing on the topic of multimodal fusion, we evaluate the possibility of using conditional generative adversarial networks to learn multimodal representations; in addition to providing multimodal representations, generative adversarial networks permit visualizing the learned model directly in the image domain.
APA, Harvard, Vancouver, ISO, and other styles
29

Hermansson, Adam, and Stefan Generalao. "Interpretable Superhuman Machine Learning Systems: An explorative study focusing on interpretability and detecting Unknown Knowns using GAN." Thesis, Malmö universitet, Fakulteten för teknik och samhälle (TS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:mau:diva-20857.

Full text
Abstract:
In a future where predictions and decisions made by machine learning systems outperform human ability, we need these systems to be interpretable in order for us to trust and understand them. Our study explores the realm of interpretable machine learning through designing and examining artifacts. We conduct experiments to explore explainability and interpretability, as well as the technical challenges of creating machine learning models that identify similar but unique objects. Lastly, we conduct a user test to evaluate current state-of-the-art explanatory tools in a direct human context. With insights from these experiments, we discuss the potential future of this field.
APA, Harvard, Vancouver, ISO, and other styles
30

Rončka, Martin. "Material Artefact Generation." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2019. http://www.nusl.cz/ntk/nusl-399191.

Full text
Abstract:
It is not always easy to obtain a sufficiently large and high-quality dataset of images with distinct artefacts, whether because of a shortage on the data-source side or the complexity of creating annotations. This holds, for example, for radiology or for mechanical engineering. In order to use modern, well-established machine learning methods for classification, segmentation and defect detection, the dataset needs to be sufficiently large and balanced. With small datasets we face problems such as overfitting and data weakness, which cause incorrect classification at the expense of under-represented classes. This thesis explores the use of generative networks to extend and balance a dataset with newly generated images. Using Conditional Generative Adversarial Networks (CGAN) and a heuristic annotation generator, we are able to generate a large number of new images of parts with defects. A dataset of threads was used for the generation experiments. Two further datasets, of ceramics and of MRI scans (BraTS), were also used. On these two datasets we evaluate the influence of the generated data on training, and the benefit for improving classification and segmentation.
APA, Harvard, Vancouver, ISO, and other styles
31

Chowdhury, Muhammad Iqbal Hasan. "Question-answering on image/video content." Thesis, Queensland University of Technology, 2020. https://eprints.qut.edu.au/205096/1/Muhammad%20Iqbal%20Hasan_Chowdhury_Thesis.pdf.

Full text
Abstract:
This thesis explores a computer's ability to understand multimodal data, where the correspondence between image/video content and natural language text is utilised to answer open-ended natural language questions through question-answering tasks. Static image data consisting of both indoor and outdoor scenes, where complex textual questions are arbitrarily posed to a machine to generate correct answers, was examined. Dynamic videos consisting of both single-camera and multi-camera settings were also considered, for the exploration of more challenging and unconstrained question-answering tasks. In exploring these challenges, new deep learning processes were developed to improve a computer's ability to understand and consider multimodal data.
APA, Harvard, Vancouver, ISO, and other styles
32

Wang, Qi. "Statistical Models for Human Motion Synthesis." Thesis, Ecole centrale de Marseille, 2018. http://www.theses.fr/2018ECDM0005/document.

Full text
Abstract:
This thesis focuses on the synthesis of motion capture data with statistical models. Motion synthesis is a task of interest for important application fields such as entertainment, human-computer interaction and robotics; it may be used to drive a virtual character in virtual reality applications, animation films or computer games. This thesis focuses on the use of statistical models for motion synthesis, with a strong focus on neural networks. From the machine learning point of view, designing synthesis models consists in learning generative models, here for sequential data. Our starting point lies in two main problems one encounters when dealing with motion capture data synthesis: ensuring the realism of postures and motion, and handling the large variability in the synthesized motion. The variability in the data comes first from core individual features: we do not all move the same way, but in a way that depends on our personality, gender, age and morphology; there are also shorter-term factors of variation such as our emotional state, the fact that we are interacting with somebody else, or that we are tired. Data-driven models have been studied for generating human motion for many years. Models are learned from labelled datasets where motion capture data are recorded while actors perform various activities such as walking, dancing and running. Traditional statistical models such as Hidden Markov Models and Gaussian Processes have been investigated for motion synthesis, demonstrating strengths but also weaknesses. Our work follows this line of research and concerns the design of generative models for sequences that can take into account contextual information representing the factors of variation. A first part of the thesis presents preliminary work in which we extend previous approaches relying on Hidden Markov Models and Gaussian Processes to tackle the two main problems related to realism and variability. We first describe an attempt to extend contextual Hidden Markov Models for handling variability in the data by conditioning the parameters of the models on additional contextual information, such as the emotion with which a motion was performed. We then propose a variant of a traditional method for performing a specific motion synthesis task called Inverse Kinematics, where we exploit Gaussian Processes to enforce the realism of each posture of a generated motion. These preliminary results show some potential of statistical models for designing human motion synthesis systems. Yet none of these technologies offers the flexibility brought by neural networks and the recent deep learning revolution. The second part of the thesis describes the work we realized with neural networks and deep architectures. It builds on recurrent neural networks for dealing with complex sequences and on adversarial learning, which was introduced very recently in the deep learning community for designing accurate generative models for complex data, notably images. We propose a simple system as a basic synthesis architecture, which combines adversarial learning with sequence autoencoders and allows randomly generating realistic motion capture sequences. Starting from this architecture, we design a few conditional neural models that allow building synthesis systems that one can control to some extent by providing high-level information that the generated sequence should match, for example the emotion with which an activity is performed. Finally, we describe a last variant that performs motion capture sequence editing, where the system generates a sequence in the style of another, real sequence.
APA, Harvard, Vancouver, ISO, and other styles
33

Yedroudj, Mehdi. "Steganalysis and steganography by deep learning." Thesis, Montpellier, 2019. http://www.theses.fr/2019MONTS095.

Full text
Abstract:
Image steganography is the art of secret communication, whose goal is to exchange a hidden message stealthily. Image steganalysis, on the other hand, attempts to detect the presence of a hidden message by searching for artefacts within an image. For about ten years, the classic approach to steganalysis was to use an ensemble classifier fed by hand-crafted features. In recent years, studies have shown that well-designed convolutional neural networks (CNNs) can achieve performance superior to conventional machine-learning approaches. The subject of this thesis is the use of deep learning techniques for image steganography and steganalysis in the spatial domain. The first contribution is a fast and very effective convolutional neural network for steganalysis, named Yedroudj-Net. Compared to modern deep-learning-based steganalysis methods, Yedroudj-Net achieves state-of-the-art detection results, but also takes less time to converge, allowing the use of a large training set. Moreover, Yedroudj-Net can easily be improved by using well-known add-ons. Among these add-ons, we have evaluated data augmentation and the use of an ensemble of CNNs; both increase our network's performance. The second contribution is the application of deep learning techniques to steganography, i.e. the embedding. Among the existing techniques, we focus on the 3-player game approach. We propose an embedding algorithm that automatically learns how to hide a message secretly. Our proposed steganography system is based on the use of generative adversarial networks. The training of this steganographic system is conducted using three neural networks that compete against each other: the embedder, the extractor, and the steganalyzer. For the steganalyzer we use Yedroudj-Net, for its small size and for the fact that its training does not require the use of any tricks that could increase the computational time. This second contribution defines a research direction, giving first elements of reflection while providing promising first results.
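The three-network competition described above can be written schematically as coupled objectives. The notation and the weighting coefficients $\alpha, \beta, \gamma$ below are illustrative, not those of the thesis: $c$ is a cover image, $m$ the message, $E$ the embedder producing the stego image $E(c,m)$, $X$ the extractor, and $S$ a steganalyzer outputting the probability that its input is a stego image.

```latex
\begin{align}
  \mathcal{L}_X &= \lVert m - X(E(c,m)) \rVert^2
    && \text{(extractor: recover the message)} \\
  \mathcal{L}_S &= -\log S(E(c,m)) - \log\bigl(1 - S(c)\bigr)
    && \text{(steganalyzer: detect stego images)} \\
  \mathcal{L}_E &= \alpha \lVert c - E(c,m) \rVert^2
                 + \beta\, \mathcal{L}_X
                 - \gamma \log\bigl(1 - S(E(c,m))\bigr)
    && \text{(embedder: stay close to the cover, stay extractable, fool } S)
\end{align}
```

At equilibrium the embedder hides a recoverable message while the steganalyzer cannot distinguish stego images from covers better than chance.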
APA, Harvard, Vancouver, ISO, and other styles
34

Hassini, Houda. "Automatic analysis of blood smears images : contribution of phase modality in Fourier Ptychographic Microscopy." Electronic Thesis or Diss., Institut polytechnique de Paris, 2024. http://www.theses.fr/2024IPPAS014.

Full text
Abstract:
Digital pathology is today a fundamental tool for medical diagnosis, exploiting technological advances in digitalization to transform biological samples into digital data, thus facilitating their visualization and analysis. However, these methods, often based on conventional microscopy, encounter limitations that sometimes hinder their effectiveness. From this perspective, unconventional imaging methods such as Fourier ptychographic microscopy (FPM) offer promising prospects for overcoming these limitations. Indeed, FPM gives access to the phase in addition to the intensity, and allows a large field of view to be examined at high resolution at a reasonable design cost. This thesis explores FPM's potential in thin blood smear analysis. Several results have been obtained thanks to a multidisciplinary approach integrating deep learning and microscopy. We first focus on the limited-complexity problem of parasite detection for malaria diagnosis. The joint exploitation of intensity and phase is shown to improve the performance of a deep network detector. To this end, a complex-valued CNN has been introduced into the Faster-RCNN architecture for efficient feature extraction. Secondly, we consider a more complex application, namely the classification of white blood cells, where the benefits of jointly exploiting intensity and phase are also confirmed. Furthermore, to reduce the class imbalance encountered in this task, we propose a novel physics-informed GAN model dedicated to generating intensity and phase images. This model avoids the mode collapse problem faced by the usual GAN implementation. Finally, we consider optimizing the FPM microscope design. To this end, we explore strategies combining simulations, neural networks, and image formation modeling. We demonstrate that FPM can use low resolutions without significantly compromising performance. This thesis underscores the interest of tailoring machine learning to microscopy principles and highlights the potential of Fourier ptychographic microscopy for future automated diagnosis systems.
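A complex-valued representation combining the FPM intensity and phase can be formed as $z = \sqrt{I}\,e^{i\varphi}$, and a complex-valued CNN convolves such images with complex weights. The naive loop below is only a conceptual sketch of that operation, not the thesis's Faster-RCNN feature extractor.

```python
import numpy as np

def complex_image(intensity, phase):
    # combine FPM intensity and phase into one complex-valued image
    return np.sqrt(intensity) * np.exp(1j * phase)

def complex_conv_valid(img, kernel):
    # naive "valid" cross-correlation with complex-valued weights;
    # real CNN layers do this over many channels, efficiently
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.zeros((h - kh + 1, w - kw + 1), dtype=complex)
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = (img[i:i + kh, j:j + kw] * kernel).sum()
    return out

rng = np.random.default_rng(0)
z = complex_image(rng.random((5, 5)), rng.random((5, 5)))
feat = complex_conv_valid(z, np.ones((3, 3), dtype=complex) / 9.0)
assert feat.shape == (3, 3)
```

The point of keeping the arithmetic complex is that magnitude and phase information interact inside every filter, rather than being processed as two unrelated real channels.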
APA, Harvard, Vancouver, ISO, and other styles
35

Bak, Adam. "Simulace projevu kožního onemocnění s využitím GAN." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2021. http://www.nusl.cz/ntk/nusl-445569.

Full text
Abstract:
The goal of this master's thesis is to generate a dataset of synthetic fingerprint images that show signs of skin diseases. The thesis deals with the damage caused by skin diseases in fingerprints and with the generation of synthetic fingerprints. Fingerprints with manifestations of skin diseases were generated using a model based on a Wasserstein GAN with gradient penalty. A unique database of fingerprints with skin disease manifestations, created at FIT VUT, was used to train the GAN model. The model was trained on three types of skin disease: atopic eczema, psoriasis and dyshidrotic eczema. The generator network of the trained WGAN-GP model was used to generate datasets of synthetic fingerprints. These synthetic fingerprints were compared with real ones using the NFIQ and FiQiVi quality-assessment tools, together with a comparison of the distributions of minutiae locations and orientations in the fingerprint images.
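The WGAN-GP model used here trains, in its commonly used form, a critic $D$ against a generator whose samples $\tilde{x} \sim P_g$ are compared to real fingerprints $x \sim P_r$; the gradient penalty replaces weight clipping to enforce the critic's 1-Lipschitz constraint:

```latex
\begin{align}
  \mathcal{L}_D &= \mathbb{E}_{\tilde{x}\sim P_g}\bigl[D(\tilde{x})\bigr]
                 - \mathbb{E}_{x\sim P_r}\bigl[D(x)\bigr]
                 + \lambda\, \mathbb{E}_{\hat{x}}\Bigl[
                     \bigl(\lVert \nabla_{\hat{x}} D(\hat{x}) \rVert_2 - 1\bigr)^2
                   \Bigr], \\
  \hat{x} &= \epsilon x + (1-\epsilon)\tilde{x},
           \qquad \epsilon \sim U[0,1],
\end{align}
```

while the generator minimizes $\mathcal{L}_G = -\mathbb{E}_{\tilde{x}\sim P_g}[D(\tilde{x})]$. The penalty is evaluated on random interpolates $\hat{x}$ between real and generated samples.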
APA, Harvard, Vancouver, ISO, and other styles
36

MAGGIOLO, LUCA. "Deep Learning and Advanced Statistical Methods for Domain Adaptation and Classification of Remote Sensing Images". Doctoral thesis, Università degli studi di Genova, 2022. http://hdl.handle.net/11567/1070050.

Full text
Abstract:
In recent years, remote sensing has undergone a huge evolution. The constantly growing availability of remote sensing data has opened up new opportunities and laid the foundations for many new challenges. Continuous space missions and new constellations of satellites in fact allow more and more frequent acquisitions, at increasingly higher spatial resolutions, and at an almost total coverage of the globe. The availability of such a huge amount of data has highlighted the need for automatic techniques capable of processing the data and exploiting all the available information. Meanwhile, the almost unlimited potential of machine learning has changed the world we live in. Artificial neural networks have broken through into everyday life, with applications that include computer vision, speech processing and autonomous driving, and that are also the basis of commonly used tools such as online search engines. However, the vast majority of such models are of the supervised type, and therefore their applicability relies on the availability of an enormous quantity of labeled data to train the models themselves. Unfortunately, this is not the case in remote sensing, where the enormous amounts of data are opposed by an almost total absence of ground truth. The purpose of this thesis is to find ways to exploit the most recent deep learning techniques, defining a common thread between two worlds, remote sensing and deep learning, which is often missing. In particular, this thesis proposes three novel contributions which face current issues in remote sensing. The first one is related to multisensor image registration and combines generative adversarial networks and non-linear optimization of cross-correlation-like functionals to deal with the complexity of the setting. The proposed method has proved able to outperform state-of-the-art approaches. The second novel contribution faces one of the main issues in deep learning for remote sensing: the scarcity of ground truth data for semantic segmentation. The proposed solution combines convolutional neural networks and probabilistic graphical models, two very active areas in machine learning for remote sensing, to approximate a fully connected conditional random field. The proposed method is capable of filling part of the gap which separates a densely trained model from a weakly trained one. The third approach is aimed at the classification of high-resolution satellite images for climate change purposes. It consists of a specific formulation of an energy minimization which allows multisensor information to be fused, and of the application of a Markov random field in a fast and efficient way for global-scale applications. The results obtained in this thesis show how deep learning methods based on artificial neural networks can be combined with statistical analysis to overcome their limitations, going beyond classic benchmark environments and addressing practical, real and large-scale application cases.
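The energy-minimization view mentioned above can be illustrated with a simple Potts-model energy over a label map: a per-pixel data term plus a smoothness term that penalises label disagreement between neighbours. This is a generic sketch of such an energy, not the thesis's multisensor formulation; the costs and grid size are made up.

```python
import numpy as np

def potts_energy(labels, unary, beta=1.0):
    # labels: (H, W) integer label map; unary: (H, W, K) per-pixel class costs.
    # Data term: cost of the chosen label at every pixel.
    h, w = labels.shape
    data_term = unary[np.arange(h)[:, None], np.arange(w)[None, :], labels].sum()
    # Pairwise Potts term over 4-connected neighbours: count label changes.
    smooth_term = (labels[1:, :] != labels[:-1, :]).sum() \
                + (labels[:, 1:] != labels[:, :-1]).sum()
    return data_term + beta * smooth_term

rng = np.random.default_rng(0)
unary = rng.random((8, 8, 3))            # hypothetical per-pixel, per-class costs
greedy = unary.argmin(axis=2)            # pixel-wise minimum of the data term
noisy = rng.integers(0, 3, (8, 8))       # random labelling, for comparison

# With beta = 0 the greedy labelling is optimal; beta > 0 trades data
# fidelity for spatial smoothness, which is what the MRF inference resolves.
assert potts_energy(greedy, unary, beta=0.0) <= potts_energy(noisy, unary, beta=0.0)
```

Fast inference schemes (graph cuts, message passing, or the CRF approximations discussed above) search for the labelling that minimises this kind of energy at scale.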
APA, Harvard, Vancouver, ISO, and other styles
37

Wei, Wen. "Apprentissage automatique des altérations cérébrales causées par la sclérose en plaques en neuro-imagerie multimodale." Thesis, Université Côte d'Azur, 2020. http://www.theses.fr/2020COAZ4021.

Full text
Abstract:
Multiple Sclerosis (MS) is the most common progressive neurological disease of young adults worldwide and thus represents a major public health issue, with about 90,000 patients in France and more than 500,000 people affected by MS in Europe. In order to optimize treatments, it is essential to be able to measure and track brain alterations in MS patients. In fact, MS is a multi-faceted disease which involves different types of alterations, such as myelin damage and repair. Under this observation, multimodal neuroimaging is needed to fully characterize the disease. Magnetic resonance imaging (MRI) has emerged as a fundamental imaging biomarker for multiple sclerosis because of its high sensitivity in revealing macroscopic tissue abnormalities in patients with MS. Conventional MR scanning provides a direct way to detect MS lesions and their changes, and plays a dominant role in the diagnostic criteria of MS. Moreover, positron emission tomography (PET) imaging, an alternative imaging modality, can provide functional information and detect target tissue changes at the cellular and molecular level by using various radiotracers. For example, by using the radiotracer [11C]PIB, PET allows a direct pathological measure of myelin alteration. However, in clinical settings, not all modalities are available, for various reasons. In this thesis, we therefore focus on learning and predicting missing-modality-derived brain alterations in MS from multimodal neuroimaging data.
APA, Harvard, Vancouver, ISO, and other styles
38

Hameed, Khurram. "Computer vision based classification of fruits and vegetables for self-checkout at supermarkets." Thesis, Edith Cowan University, Research Online, Perth, Western Australia, 2022. https://ro.ecu.edu.au/theses/2519.

Full text
Abstract:
The field of machine learning, and, in particular, methods to improve the capability of machines to perform a wider variety of generalised tasks are among the most rapidly growing research areas in today’s world. The current applications of machine learning and artificial intelligence can be divided into many significant fields, namely computer vision, data science, real-time analytics and Natural Language Processing (NLP). All these applications are being used to help computer-based systems to operate more usefully in everyday contexts. Computer vision research is currently active in a wide range of areas such as the development of autonomous vehicles, object recognition, Content Based Image Retrieval (CBIR), image segmentation and terrestrial analysis from space (i.e. crop estimation). Despite significant prior research, the area of object recognition still has many topics to be explored. This PhD thesis focuses on using advanced machine learning approaches to enable the automated recognition of fresh produce (i.e. fruits and vegetables) at supermarket self-checkouts. This type of complex classification task is one of the most recently emerging applications of advanced computer vision approaches and is a productive research topic in this field due to the limited means of representing the features and the machine learning techniques available for classification. Fruits and vegetables exhibit significant inter- and intra-class variance in weight, shape, size, colour and texture, which makes the classification challenging. Effective fruit and vegetable classification has significant importance in daily life, e.g. crop estimation, fruit classification, robotic harvesting, fruit quality assessment, etc. One potential application for this fruit and vegetable classification capability is for supermarket self-checkouts. Increasingly, supermarkets are introducing self-checkouts in stores to make the checkout process easier and faster. 
However, there are a number of challenges with this, as not all goods can readily be sold with packaging and barcodes, for instance loose fresh items (e.g. fruits and vegetables). Adding barcodes to these types of items individually is impractical, and pre-packaging limits the freedom of choice when selecting fruits and vegetables and creates additional waste, hence reducing customer satisfaction. The current situation, which relies on customers correctly identifying produce themselves, leaves open the potential for incorrect billing, either due to inadvertent error or due to intentional fraudulent misclassification, resulting in financial losses for the store. To address this identified problem, the main goals of this PhD work are: (a) exploring the types of visual and non-visual sensors that could be incorporated into a self-checkout system for classification of fruits and vegetables, (b) determining a suitable feature representation method for fresh produce items available at supermarkets, (c) identifying optimal machine learning techniques for classification within this context and (d) evaluating our work relative to the state-of-the-art object classification results presented in the literature. An in-depth analysis of related computer vision literature and techniques is performed to identify and implement possible solutions. A progressive process distribution approach is used for this project, where the task of computer vision based fruit and vegetable classification is divided into pre-processing and classification techniques. Different classification techniques have been implemented and evaluated as possible solutions for this problem. Both visual and non-visual features of fruit and vegetables are exploited to perform the classification. Novel classification techniques have been carefully developed to deal with the complex and highly variant physical features of fruit and vegetables while taking advantage of both visual and non-visual features. 
The capability of the classification techniques is tested both individually and in ensembles to achieve higher effectiveness. Significant results have been obtained, from which it can be concluded that fruit and vegetable classification is a complex task with many challenges involved. It is also observed that a larger dataset can better comprehend the complex variant features of fruit and vegetables. Complex multidimensional features can be extracted from larger datasets to generalise to a higher number of classes. However, the development of a larger multiclass dataset is an expensive and time-consuming process. The effectiveness of classification techniques can be significantly improved by removing background occlusions and complexities. It is also worth mentioning that an ensemble of simple and less complicated classification techniques can achieve effective results, even if applied to fewer features and a smaller number of classes. The combination of visual and non-visual features can reduce the struggle of a classification technique to deal with a higher number of classes with similar physical features. Classification of fruit and vegetables with similar physical features (i.e. colour and texture) needs careful estimation and hyper-dimensional embedding of visual features. Implementing rigorous classification penalties as loss functions can achieve this goal at the cost of time and computational requirements. There is a significant need to develop larger datasets for different fruit and vegetable related computer vision applications. Considering more sophisticated loss function penalties and discriminative hyper-dimensional feature embedding techniques can significantly improve the effectiveness of classification techniques for fruit and vegetable applications.
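The abstract's point about ensembles of simple classifiers can be illustrated with a toy majority-vote sketch; the rules, feature tuple and labels below are invented for illustration and are not taken from the thesis:

```python
from collections import Counter

# Toy majority-vote ensemble: several simple classifiers each predict a label
# and the most common vote wins. The classifiers here are made-up stand-ins
# for the simple techniques the abstract mentions.
def ensemble_predict(classifiers, x):
    votes = [clf(x) for clf in classifiers]
    return Counter(votes).most_common(1)[0][0]

classifiers = [
    lambda x: "apple" if x[0] > 0.5 else "pear",   # colour-based rule
    lambda x: "apple" if x[1] > 0.5 else "pear",   # texture-based rule
    lambda x: "pear",                              # a weak, biased rule
]
label = ensemble_predict(classifiers, (0.9, 0.8))  # two of three vote "apple"
```

Even with one deliberately weak voter, the majority decision recovers the right label, which is the intuition behind combining less complicated techniques.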
APA, Harvard, Vancouver, ISO, and other styles
39

Espindola, Tatiane Sander. "Generative Adversarial Networks applied to Telecom Data - Using GANs to generate synthetic features regarding Wi-Fi signal quality." Master's thesis, 2021. http://hdl.handle.net/10362/119708.

Full text
Abstract:
Project Work presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics<br>Wireless networks are currently one of the main technologies used to connect people. Considering the constant advancements in the field, telecom operators must guarantee a high-quality service to keep their customer portfolio. To ensure this high-quality service, it is common to establish partnerships with specialized technology companies that deliver software services to monitor the networks and identify faults and their respective solutions. However, a common barrier faced by these specialized companies is the lack of data to develop and test their products. This project’s purpose was to better understand Generative Adversarial Networks (GANs), an algorithm considered state-of-the-art among generative models, and test their usage to generate synthetic telecommunication data that can fill this gap. To do that, two of the most used GAN architectures, the Vanilla GAN and the WGAN, were developed, trained and compared. Both models presented good results and were able to simulate datasets very similar to the real ones. The WGAN was chosen as the final model, but only because it presented a slightly better (and subjectively assessed) result in the descriptive analysis. In fact, the two models had very similar outputs and both can be used.
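The difference between the two architectures compared in this abstract shows up most clearly in their generator losses. A minimal numeric sketch (the discriminator/critic outputs below are invented values, not results from the thesis):

```python
import numpy as np

# Hypothetical outputs for one batch of generated samples: the vanilla GAN
# discriminator emits probabilities in (0, 1), while the WGAN critic emits
# unbounded real-valued scores.
d_probs = np.array([0.1, 0.4, 0.7])
critic_scores = np.array([-1.2, 0.3, 0.9])

# Vanilla GAN generator loss (non-saturating form): -E[log D(G(z))]
vanilla_g_loss = -np.mean(np.log(d_probs))

# WGAN generator loss: -E[critic(G(z))] -- no sigmoid and no log, which is
# what changes the training dynamics relative to the vanilla formulation.
wgan_g_loss = -np.mean(critic_scores)
```

The absence of the logarithm in the WGAN loss is one reason its gradients behave better when the discriminator/critic becomes confident.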
APA, Harvard, Vancouver, ISO, and other styles
40

(8892395), Yao Chen. "Inferential GANs and Deep Feature Selection with Applications." Thesis, 2020.

Find full text
Abstract:
Deep neural networks (DNNs) have become popular due to their predictive power and flexibility in model fitting. In unsupervised learning, variational autoencoders (VAEs) and generative adversarial networks (GANs) are the two most popular and successful generative models. How to provide a unifying framework combining the best of VAEs and GANs in a principled way is a challenging task. In supervised learning, the demand for high-dimensional data analysis has grown significantly, especially in the applications of social networking, bioinformatics, and neuroscience. How to simultaneously approximate the true underlying nonlinear system and identify relevant features based on high-dimensional data (typically with the sample size smaller than the dimension, a.k.a. small-n-large-p) is another challenging task.

In this dissertation, we have provided satisfactory answers to these two challenges. In addition, we have illustrated some promising applications using modern machine learning methods.

In the first chapter, we introduce a novel inferential Wasserstein GAN (iWGAN) model, which is a principled framework to fuse auto-encoders and WGANs. GANs have been impactful on many problems and applications but suffer from unstable training. The Wasserstein GAN (WGAN) leverages the Wasserstein distance to avoid the caveats in the min-max two-player training of GANs but has other defects, such as mode collapse and the lack of a metric to detect convergence. The iWGAN model jointly learns an encoder network and a generator network motivated by the iterative primal-dual optimization process. The encoder network maps the observed samples to the latent space and the generator network maps samples from the latent space to the data space. We establish the generalization error bound of iWGANs to theoretically justify their performance. We further provide a rigorous probabilistic interpretation of our model under the framework of maximum likelihood estimation. The iWGAN, with a clear stopping criterion, has many advantages over other autoencoder GANs. The empirical experiments show that the iWGAN greatly mitigates the symptom of mode collapse, speeds up convergence, and is able to provide a quality-check measurement for each individual sample. We illustrate the ability of iWGANs by obtaining competitive and stable performance relative to the state of the art on benchmark datasets.

In the second chapter, we present a general framework for high-dimensional nonlinear variable selection using deep neural networks under the framework of supervised learning. The network architecture includes both a selection layer and approximation layers. The problem can be cast as a sparsity-constrained optimization with a sparse parameter in the selection layer and other parameters in the approximation layers. This problem is challenging due to the sparse constraint and the nonconvex optimization. We propose a novel algorithm, called Deep Feature Selection, to estimate both the sparse parameter and the other parameters. Theoretically, we establish the algorithm's convergence and its selection consistency when the objective function has a Generalized Stable Restricted Hessian. This result provides theoretical justification for our method and generalizes known results for high-dimensional linear variable selection. Simulations and real data analysis are conducted to demonstrate the superior performance of our method.

In the third chapter, we develop a novel methodology to classify electrocardiograms (ECGs) as normal, atrial fibrillation, or other cardiac dysrhythmias as defined by the PhysioNet Challenge 2017. More specifically, we use piecewise linear splines for the feature selection and a gradient boosting algorithm for the classifier. In the algorithm, the ECG waveform is fitted by a piecewise linear spline, and morphological features related to the piecewise linear spline coefficients are extracted. XGBoost is used to classify the morphological coefficients and heart rate variability features. The performance of the algorithm was evaluated on the PhysioNet Challenge database (3658 ECGs classified by experts). Our algorithm achieves an average F1 score of 81% for a 10-fold cross-validation and also achieved an 81% F1 score on the independent testing set. This score is similar to the 9th-highest score (81%) in the official phase of the PhysioNet Challenge 2017.

In the fourth chapter, we introduce a novel region-selection penalty in the framework of image-on-scalar regression to impose sparsity of pixel values and extract active regions simultaneously. This method helps identify regions of interest (ROI) associated with certain diseases, which has a great impact on public health. Our penalty combines the Smoothly Clipped Absolute Deviation (SCAD) regularization, enforcing sparsity, and the SCAD of total variation (TV) regularization, enforcing spatial contiguity, into one group, which segments contiguous spatial regions against a zero-valued background. An efficient algorithm is based on the alternating direction method of multipliers (ADMM), which decomposes the non-convex problem into two iterative optimization problems with explicit solutions. Another virtue of the proposed method is that a divide-and-conquer learning algorithm is developed, thereby allowing scaling to large images. Several examples are presented and the experimental results are compared with other state-of-the-art approaches.
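The piecewise-spline feature extraction described in the ECG chapter of this abstract can be sketched by fitting a hinge-basis spline to a toy waveform by least squares and taking the coefficients as features; the waveform, knot positions and basis choice below are invented for illustration, not taken from the dissertation:

```python
import numpy as np

# Fit a piecewise linear spline to a 1-D "waveform" and use its coefficients
# as morphological features. Data and knots are illustrative only.
t = np.linspace(0.0, 1.0, 50)
wave = np.where(t < 0.5, t, 1.0 - t)          # triangular toy waveform
knots = [0.25, 0.5, 0.75]

# Design matrix: intercept, global slope, one hinge max(t - k, 0) per knot.
# Each hinge coefficient is the slope change at its knot.
X = np.column_stack([np.ones_like(t), t]
                    + [np.maximum(t - k, 0.0) for k in knots])
coef, *_ = np.linalg.lstsq(X, wave, rcond=None)
features = coef   # these would feed a downstream classifier such as XGBoost
```

For this triangular wave the fit is exact: the slope is 1 on the rising segment and the hinge at the kink picks up the slope change of -2, while the unused knots get coefficients near zero.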
APA, Harvard, Vancouver, ISO, and other styles
41

Santos, Beatriz de Jesus Pereira. "Drug Discovery with Generative Adversarial Networks." Master's thesis, 2021. http://hdl.handle.net/10316/96096.

Full text
Abstract:
Integrated Master's dissertation in Biomedical Engineering presented to the Faculty of Sciences and Technology<br>Drug discovery is a highly time-consuming, complex, and expensive process with low rates of success that can be mainly attributed to the high dimensionality of the chemical space. Evaluating the entire chemical space is prohibitively expensive, so it is of the utmost importance to find ways of narrowing down the search space. Deep Learning algorithms are emerging as a potential method to generate novel chemical structures, since they can speed up the traditional process and decrease expenditure. Recurrent Neural Networks (RNNs) and Generative Adversarial Networks (GANs) are two of the most promising methods for generating drug-like molecules from scratch. The proposed work resulted in two independent contributions. First, a comprehensive study on RNN architectures and parameters that resulted in an optimized model capable of generating up to 98.7% valid non-specific drug-like molecules while maintaining high levels of diversity. This work also proved that stereochemical information, often overlooked in most works, can be successfully incorporated and learned by these models. Furthermore, a novel GAN-based framework that includes an optimization stage was developed. This approach incorporates two deep learning techniques: an Encoder-Decoder model that converts the string notations of molecules into latent-space vectors, effectively creating a new type of molecular representation, and a GAN that is able to learn and replicate the training data distribution and, therefore, generate new compounds. In order to generate compounds with bespoke properties, once the GAN is replicating the chemical space, a feedback loop is incorporated that evaluates the generated molecules according to the desired property at every epoch of training and replaces the worst-scoring entries in the training data with the best-scoring generated molecules. This ensures a slow but steady shift of the generated distribution towards the space of the targeted property, resulting in the generation of molecules that exhibit the desired characteristics.<br>Other: This research has been funded by the Portuguese Research Agency FCT, through D4 - Deep Drug Discovery and Deployment (CENTRO-01-0145-FEDER029266). This work is funded by national funds through the FCT - Foundation for Science and Technology, I.P., within the scope of the project CISUC - UID/CEC/00326/2020 and by the European Social Fund, through the Regional Operational Program Centro 2020.
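The feedback loop described in this abstract can be sketched in miniature: at each epoch the worst-scoring training entries are swapped for the best-scoring generated samples. The `score` function and the plain numbers standing in for molecules are placeholders, not the thesis implementation:

```python
# At each epoch, replace the k worst-scoring training entries with the k
# best-scoring generated samples, nudging the training set towards the
# desired property. Items are plain numbers standing in for molecules.
def feedback_step(train_set, generated, k, score=lambda x: x):
    survivors = sorted(train_set, key=score)[k:]            # drop the k worst
    newcomers = sorted(generated, key=score, reverse=True)[:k]
    return sorted(survivors + newcomers, key=score)

pool = feedback_step([1, 2, 3, 4], [10, 0, 7], k=2)
# the two worst entries (1 and 2) are replaced by the two best
# generated samples (10 and 7)
```

Iterating this step is what shifts the training distribution, and hence the GAN's output, towards the targeted region of chemical space.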
APA, Harvard, Vancouver, ISO, and other styles
42

Lai, Yu-Ting, and 賴昱廷. "Industrial Anomaly Inspection based on Neural Networks and Generative Adversarial Networks." Thesis, 2018. http://ndltd.ncl.edu.tw/handle/gx8ef6.

Full text
APA, Harvard, Vancouver, ISO, and other styles
43

Tammaro, Umberto. "GAN Hyperparameters search through Genetic Algorithm." Master's thesis, 2022. http://hdl.handle.net/10362/135552.

Full text
Abstract:
Dissertation presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics, specialization in Data Science<br>Recent developments in Deep Learning are remarkable when it comes to generative models. The main reason for such progress is Generative Adversarial Networks (GANs) [1]. Introduced in a paper by Ian Goodfellow in 2014, GANs are machine learning models made of two neural networks: a Generator and a Discriminator. These two compete against each other to generate new, synthetic instances of data that resemble the real ones. Despite their great potential, their training presents challenges, including training instability, mode collapse, and vanishing gradients. A lot of research has been done on how to overcome these challenges; however, no significant proof has been found that modern techniques consistently outperform the vanilla GAN. The performance of GANs is also highly dependent on the dataset they are trained on. One of the main challenges is related to the search for hyperparameters. In this thesis, we try to overcome this challenge by applying an evolutionary algorithm to search for the best hyperparameters for a WGAN. We use the Kullback-Leibler divergence to calculate the fitness of the individuals and, in the end, we select the best set of parameters generated by the evolutionary algorithm. The parameters of the best-selected individuals are maintained throughout the generations. We compare our approach with the standard hyperparameters given by the state of the art.
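A minimal sketch of the Kullback-Leibler fitness idea used in this abstract, with invented histograms standing in for the real data distribution and the distributions produced by candidate hyperparameter sets:

```python
import numpy as np

# Kullback-Leibler divergence between two histograms, used here as the
# fitness of a candidate: the closer the generated distribution is to the
# real one, the smaller (better) the divergence.
def kl_divergence(p, q, eps=1e-12):
    p = np.asarray(p, float) / np.sum(p)
    q = np.asarray(q, float) / np.sum(q)
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

real_hist = [4, 3, 2, 1]
good_candidate = [8, 6, 4, 2]     # same shape as the real distribution
bad_candidate = [1, 1, 1, 7]      # collapsed towards the last bin
fitness_good = kl_divergence(real_hist, good_candidate)
fitness_bad = kl_divergence(real_hist, bad_candidate)
```

An evolutionary search would rank individuals by this fitness and keep the hyperparameter sets whose GANs produce the lowest divergence.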
APA, Harvard, Vancouver, ISO, and other styles
44

Brito, João Pedro da Cruz. "Deep Adversarial Frameworks for Visually Explainable Periocular Recognition." Master's thesis, 2021. http://hdl.handle.net/10400.6/11850.

Full text
Abstract:
Machine Learning (ML) models have pushed state-of-the-art performance closer to (and even beyond) human level. However, the core of such algorithms is usually latent and hardly understandable. Thus, the field of Explainability focuses on researching and adopting techniques that can explain the reasons that support a model's predictions. Such explanations of the decision-making process would help to build trust between said model and the human(s) using it. An explainable system also allows for better debugging during the training phase, and fixing upon deployment. But why should a developer devote time and effort into refactoring or rethinking Artificial Intelligence (AI) systems to make them more transparent? Don't they work just fine? Despite the temptation to answer "yes", are we really considering the cases where these systems fail? Are we assuming that "almost perfect" accuracy is good enough? What if some of the cases where these systems get it right were just a small margin away from a complete miss? Does that even matter? Considering the ever-growing presence of ML models in crucial areas like forensics, security and healthcare services, it clearly does. Motivating these concerns is the fact that powerful systems often operate as black boxes, hiding the core reasoning underneath layers of abstraction [Gue]. In this scenario, there could be some seriously negative outcomes if opaque algorithms gamble on the presence of tumours in X-ray images or the way autonomous vehicles behave in traffic. It becomes clear, then, that incorporating explainability with AI is imperative. More recently, policymakers have addressed this urgency through the General Data Protection Regulation (GDPR) [Com18]. With this document, the European Union (EU) brings forward several important concepts, amongst which the "right to an explanation". 
The definition and scope are still subject to debate [MF17], but these are definite strides towards formally regulating the explainable depth of autonomous systems. Based on the preface above, this work describes a periocular recognition framework that not only performs biometric recognition but also provides clear representations of the features/regions that support a prediction. Being particularly designed to explain non-match ("impostor") decisions, our solution uses adversarial generative techniques to synthesise a large set of "genuine" image pairs, from which the most similar elements with respect to a query are retrieved. Then, assuming the alignment between the query/retrieved pairs, the element-wise differences between the query and a weighted average of the retrieved elements yield a visual explanation of the regions in the query pair that would have to be different to transform it into a "genuine" pair. Our quantitative and qualitative experiments validate the proposed solution, yielding recognition rates that are similar to the state of the art, while adding visually pleasing explanations.
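The explanation step described in this abstract reduces to a simple array operation: differencing a query against a similarity-weighted average of retrieved "genuine" elements. The arrays and weights below are toy data invented for illustration:

```python
import numpy as np

# Toy version of the explanation step: the element-wise difference between a
# query image and a weighted average of its most similar "genuine" pairs
# highlights the regions that would have to change.
rng = np.random.default_rng(0)
query = rng.random((4, 4))                   # stand-in for a query image
retrieved = rng.random((3, 4, 4))            # 3 retrieved genuine elements
weights = np.array([0.5, 0.3, 0.2])          # similarity-based weights

# Weighted average over the retrieved stack, then per-pixel difference.
weighted_avg = np.tensordot(weights, retrieved, axes=1)
explanation = np.abs(query - weighted_avg)   # visual explanation map
```

High values in `explanation` mark the regions the framework would present as the reason a pair was judged a non-match.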
APA, Harvard, Vancouver, ISO, and other styles
45

Parracho, João Oliveira. "JOINT CODING OF MULTIMODAL BIOMEDICAL IMAGES USING CONVOLUTIONAL NEURAL NETWORKS." Master's thesis, 2020. http://hdl.handle.net/10400.8/6682.

Full text
Abstract:
The massive volume of data generated daily by the gathering of medical images with different modalities might be difficult to store in medical facilities and to share through communication networks. To alleviate this issue, efficient compression methods must be implemented to reduce the amount of storage and transmission resources required in such applications. However, since the preservation of all image details is highly important in the medical context, the use of lossless image compression algorithms is of utmost importance. This thesis presents the research results on a lossless compression scheme designed to encode both computerized tomography (CT) and positron emission tomography (PET) images. Different techniques, such as image-to-image translation, intra prediction, and inter prediction, are used. Redundancies between the two image modalities are also investigated. To perform the image-to-image translation approach, we resort to lossless compression of the original CT data and apply a cross-modality image-translation generative adversarial network to obtain an estimation of the corresponding PET. Two approaches were implemented and evaluated to determine a PET residue that is compressed along with the original CT. In the first method, the residue resulting from the differences between the original PET and its estimation is encoded, whereas in the second method, the residue is obtained using the encoder's inter-prediction coding tools. Thus, instead of compressing two independent picture modalities, i.e., both images of the original PET-CT pair, in the proposed method only the CT is independently encoded, alongside the PET residue. Along with the proposed pipeline, a post-processing optimization algorithm that modifies the estimated PET image by altering the contrast and rescaling the image is implemented to maximize the compression efficiency. Four different versions (subsets) of a publicly available PET-CT pair dataset were tested. 
The first proposed subset was used to demonstrate that the concept developed in this work is capable of surpassing traditional compression schemes. The obtained results showed gains of up to 8.9% using HEVC. On the other hand, JPEG 2000 proved not to be the most suitable, as it failed to obtain good results, having reached only a -9.1% compression gain. For the remaining (more challenging) subsets, the results reveal that the proposed refined post-processing scheme attains, when compared to conventional compression methods, up to 6.33% compression gain using HEVC, and 7.78% using VVC.
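The residue idea at the core of this abstract can be sketched with toy arrays: only the difference between the PET and its cross-modality estimate is kept, and reconstruction is exact. The pixel values below are made up, and the "GAN output" is just a hand-written array:

```python
import numpy as np

# The PET is not encoded directly: a cross-modality estimate (here a made-up
# array standing in for the GAN output) is subtracted from it, and only the
# small-magnitude residue is stored alongside the losslessly coded CT.
pet = np.array([[120, 80], [60, 200]], dtype=np.int32)
pet_estimate = np.array([[118, 85], [55, 190]], dtype=np.int32)

residue = pet - pet_estimate          # small values compress better
reconstructed = pet_estimate + residue
```

The better the GAN's estimate, the smaller the residue's dynamic range, and the cheaper it is to encode, while `estimate + residue` always recovers the original PET exactly, preserving losslessness.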
APA, Harvard, Vancouver, ISO, and other styles
46

Mopuri, Konda Reddy. "Deep Visual Representations: A study on Augmentation, Visualization, and Robustness." Thesis, 2018. https://etd.iisc.ac.in/handle/2005/5446.

Full text
Abstract:
Deep neural networks have resulted in unprecedented performances for various learning tasks. Particularly, Convolutional Neural Networks (CNNs) are shown to learn representations that can efficiently discriminate hundreds of visual categories. They learn a hierarchy of representations ranging from low-level edge and blob detectors to semantic features such as object categories. These representations can be employed as off-the-shelf visual features in various vision tasks such as image classification, scene retrieval, caption generation, etc. In this thesis, we investigate three important aspects of the representations learned by CNNs: (i) Augmentation: incorporating useful side and additional information to augment the learned visual representations, (ii) Visualization: providing visual explanations for the predicted inference, and (iii) Robustness: their susceptibility to adversarial perturbations at test time. Augmenting: In the first part of this thesis, we present approaches that exploit useful side and additional information to enrich the learned representations with more semantics. Specifically, we learn to encode additional discriminative information from (i) an objectness prior over the image regions, and (ii) the strong supervision offered by captions given by human subjects that describe the image contents. Objectness prior: In order to encode comprehensive visual information from a scene, existing methods typically employ deep-learned visual representations in a sliding window framework. This approach is tedious, demands more computation, and is exhaustive. On the other hand, scenes are typically composed of objects, i.e., it is the objects that make a scene what it is. We exploit objectness information while aggregating the visual features from individual image regions into a compact image representation. 
Restricting the description to only object-like regions drastically reduces the number of image patches to be considered and automatically takes care of scale. Owing to the robust object representations learned by the CNNs, our aggregated image representations exhibit improved invariance to general image transformations such as translation, rotation and scaling. The proposed representation can discriminate images even under extreme dimensionality reduction, including binarization. Strong supervision: In a typical supervised learning setting for object recognition, labels offer only weak supervision. All that a label provides is the presence or absence of an object in an image. It neglects a lot of useful information about the actual object, such as attributes, context, etc. Image captions, on the other hand, provide rich information about image contents. Therefore, in order to enhance the representations, we exploit image captions as strong supervision for the application of object retrieval. We show that strong supervision, when combined with pairwise constraints, can help the representations to better learn the graded (non-binary) relevances between pairs of images. Visualization: Despite their impressive performance, CNNs offer limited transparency and are therefore treated as black boxes. Increasing depth, intricate architectures and sophisticated regularizers make them complex machine learning models. One way to make them transparent is to provide visual explanations for their predictions, i.e., visualizing the image regions that guide their predictions and thereby making them explainable. In the second part of the thesis, we develop a novel visualization method to locate the evidence in the input for a given activation at any layer in the architecture. Unlike most existing methods that rely on gradient computation, we directly exploit the dependencies across the learned representations to make the CNNs more interactive. 
Our method enables various applications such as visualizing the evidence for a given activation (e.g. a predicted label), grounding a predicted caption, object detection, etc., in a weakly-supervised setup. Robustness: Along with successful adaptation across various vision tasks, the learned representations are also observed to be unstable to the addition of special noise of small magnitude, called adversarial perturbations. Thus, the third and final part of the thesis focuses on the stability of the representations to these additive perturbations. Generalizable data-free objectives: These additive perturbations make the CNNs susceptible to producing inaccurate predictions with high confidence and threaten their deployability in the real world. In order to craft these perturbations (either image-specific or agnostic), existing methods solve complex fooling objectives that require samples from the target data distribution. Also, the existing methods to craft image-agnostic perturbations are task-specific, i.e., the objectives are designed to suit the underlying task. For the first time, we introduce generalizable and data-free objectives to craft image-agnostic adversarial perturbations. Our objective generalizes across multiple vision tasks such as object recognition, semantic segmentation and depth estimation, and can efficiently craft perturbations that fool effectively. Our objective exposes the fragility of the learned representations even in the black-box attacking scenario, where no information about the target model is known. In spite of being data-free, our objectives can exploit the minimal available prior information about the training distribution, such as the dynamic range of the images, in order to craft stronger attacks. Modeling the adversaries: Most existing methods present optimization approaches to craft adversarial perturbations. 
Also, for a given classifier, they generate one perturbation at a time, which is a single instance from a possibly large manifold of adversarial perturbations. Further, in order to build robust models, it is essential to explore this manifold. We propose, for the first time, a generative approach to model the distribution of such perturbations in both “data dependent” and “data-free” scenarios. Our generative model is inspired by Generative Adversarial Networks (GANs) and is trained using fooling and diversity objectives. The proposed generator network captures the distribution of adversarial perturbations for a given classifier and readily generates a wide variety of such perturbations. We demonstrate that perturbations crafted by our model (i) achieve state-of-the-art fooling rates, (ii) exhibit wide variety and (iii) deliver excellent cross-model generalizability. Our work can be deemed an important step in the process of inferring the complex manifolds of adversarial perturbations. This knowledge of adversaries can be exploited to learn better representations that are robust to various attacks.
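The fooling-and-diversity training idea sketched in this abstract can be illustrated in a few lines. Everything below (the toy linear generator and classifier, the ε bound, the 0.1 loss weighting) is a hypothetical stand-in for illustration, not the thesis's actual architecture or objective:

```python
import numpy as np

rng = np.random.default_rng(0)
eps = 10 / 255.0   # perturbation budget (assumed; a common attack setting)
d, zdim = 32, 8    # toy flattened-image and latent sizes

# Toy "generator": a single linear map, tanh-squashed so |delta| <= eps.
W_gen = rng.normal(scale=0.1, size=(zdim, d))
def generate(z):
    return eps * np.tanh(z @ W_gen)

# Toy linear classifier standing in for the attacked model.
W_clf = rng.normal(scale=0.1, size=(d, 10))
def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def fooling_loss(images, delta):
    # Probability mass left on the originally predicted class: lower = more fooling.
    orig = softmax(images @ W_clf).argmax(axis=1)
    p_adv = softmax((images + delta) @ W_clf)
    return p_adv[np.arange(len(images)), orig].mean()

def diversity_bonus(delta):
    # Mean pairwise distance between perturbations from different latents.
    diff = delta[:, None, :] - delta[None, :, :]
    return np.linalg.norm(diff, axis=-1).mean()

images = rng.uniform(size=(4, d))
delta = generate(rng.normal(size=(4, zdim)))
loss = fooling_loss(images, delta) - 0.1 * diversity_bonus(delta)
```

Training would descend this loss with respect to the generator weights; sampling many latents then yields a whole family of perturbations rather than a single instance, which is the point of modeling the distribution.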
APA, Harvard, Vancouver, ISO, and other styles
47

"Robust Object Detection under Varying Illuminations and Distortions." Doctoral diss., 2020. http://hdl.handle.net/2286/R.I.57367.

Full text
Abstract:
Object detection is a computer vision area concerned with the detection of object instances belonging to specific classes of interest, as well as the localization of these instances in images and/or videos. Object detection serves as a vital module in many computer vision based applications. This work focuses on the development of object detection methods that exhibit increased robustness to varying illumination and image quality. Two methods for robust object detection are presented. In the context of varying illumination, this work focuses on robust generic obstacle detection and collision warning in Advanced Driver Assistance Systems (ADAS) under varying illumination conditions. The highlight of the first method is the ability to detect all obstacles without prior knowledge and to detect partially occluded obstacles, including obstacles that have not completely appeared in the frame (truncated obstacles). It is first shown that the angular distortion in the Inverse Perspective Mapping (IPM) domain belonging to obstacle edges varies as a function of their corresponding 2D location in the camera plane. This information is used to generate object proposals. A novel proposal assessment method based on fusing statistical properties from both the IPM image and the camera image is also proposed to perform robust outlier elimination and false-positive reduction. In the context of image quality, this work focuses on robust multiple-class object detection using deep neural networks for images with varying quality. The use of Generative Adversarial Networks (GANs) is proposed in a novel generative framework to generate features that provide robustness for object detection on reduced-quality images. The proposed GAN-based Detection of Objects (GAN-DO) framework is not restricted to any particular architecture and can be generalized to several deep neural network (DNN) based architectures.
The resulting deep neural network maintains the exact architecture of the selected baseline model without adding to the model parameter complexity or inference speed. Performance results provided using GAN-DO on object detection datasets establish an improved robustness to varying image quality and a higher object detection and classification accuracy compared to the existing approaches.
Doctoral Dissertation, Electrical Engineering, 2020
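One way to read the GAN-DO idea is that features extracted from reduced-quality images are pushed, adversarially, toward the clean-feature distribution the detection head expects. The sketch below is a loose illustration under that reading only; the linear backbone, the discriminator, the noise model and the equal loss weighting are all invented for the example and are not the framework's actual components:

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in for the baseline detector's feature extractor (architecture unchanged).
W_feat = rng.normal(scale=0.1, size=(32, 16))
def backbone(img):
    return np.maximum(img @ W_feat, 0)   # ReLU features

# Discriminator scores how "clean-like" a feature vector is.
w_disc = rng.normal(scale=0.1, size=16)
def discriminate(feat):
    return 1.0 / (1.0 + np.exp(-(feat @ w_disc)))

clean = rng.uniform(size=(8, 32))
degraded = clean + rng.normal(scale=0.3, size=clean.shape)  # simulated quality loss

# Generator-side objective: make degraded-image features fool the discriminator
# while staying close to the corresponding clean-image features.
adv_loss = -np.log(discriminate(backbone(degraded)) + 1e-8).mean()
match_loss = np.mean((backbone(degraded) - backbone(clean)) ** 2)
total_loss = adv_loss + match_loss
```

In an actual training loop the discriminator and backbone would be updated alternately, GAN-style; the detection heads then consume features that look similar regardless of input quality.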
APA, Harvard, Vancouver, ISO, and other styles
48

Del, Chiaro Riccardo. "Anthropomorphous Visual Recognition: Learning with Weak Supervision, with Scarce Data, and Incrementally over Transient Tasks." Doctoral thesis, 2021. http://hdl.handle.net/2158/1238101.

Full text
Abstract:
In the last eight years the computer vision field has experienced dramatic improvements thanks to the widespread availability of data and affordable parallel computing hardware like GPUs. These two factors have made it possible to train very deep neural network models in reasonable times using millions of labeled examples for supervision. Humans do not learn concepts in this way. We do not need a massive number of labeled examples to learn new concepts; instead we rely on a few (or even zero) examples, infer missing information, and generalize. Moreover, we retain previously learned concepts without the need to re-train. We can easily ride a bicycle after years of not doing so, or recognize an elephant even though we may not have seen one recently. These characteristics of human learning stand in stark contrast to how deep models learn: they require massive amounts of labeled data for training due to overparameterization, they have limited generalization capabilities, and they easily forget previously learned tasks or concepts when trained on new ones. These characteristics limit the applicability of deep learning in scenarios in which these problems are more evident. In this thesis we study some of these problems and propose strategies to overcome some of the negative aspects of deep neural network training. We still use the gradient-based learning paradigm, but we adapt it to address some of these differences between human learning and learning in deep networks. Our goal is to achieve better learning characteristics and improve performance in some specific applications. We first study the artwork instance recognition problem, for which it is very difficult to collect large collections of labeled images. Our proposed approach relies on web search engines to collect examples, which results in the two related problems of domain shift due to biases in search engines and noisy supervision.
We propose several strategies to mitigate these problems. To better mimic the ability of humans to learn from compact semantic descriptions of tasks, we then propose a zero-shot learning strategy to recognize never-seen artworks, relying solely on textual descriptions of the target artworks. Then we look at the problem of learning from scarce data for the no-reference image quality assessment (NR-IQA) problem. IQA is an application for which data is notoriously scarce due to the elevated cost of annotation. Humans have an innate ability to inductively generalize from a limited number of examples, and to better mimic this we propose a generative model able to generate controlled perturbations of the input image, with the goal of synthetically increasing the number of training instances used to train the network to estimate input image quality. Finally, we focus on the problem of catastrophic forgetting in recurrent neural networks, using image captioning as the problem domain. We propose two strategies for defining continual image captioning experimental protocols and develop a continual learning framework for image captioning models based on encoder-decoder architectures. A task is defined by a set of object categories that appear in the images that we want the model to be able to describe. We observe that catastrophic forgetting is even more pronounced in this setting and establish several baselines by adapting existing state-of-the-art techniques to our continual image captioning problem. Then, to mimic the human ability to retain and leverage past knowledge when acquiring new tasks, we propose to use a mask-based technique that allocates specific neurons to each task only during backpropagation. This way, novel tasks do not interfere with the previous ones and forgetting is avoided.
At the same time, past knowledge is exploited thanks to the ability of the network to use neurons allocated to previous tasks during the forward pass, which in turn reduces the number of neurons needed to learn each new task.
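The neuron-allocation mechanism described above can be sketched schematically: gradients are gated by the current task's mask, while the forward pass may reuse neurons of earlier tasks. The hard, disjoint, pre-assigned masks and single weight matrix below are simplifying assumptions, not the thesis's actual mask-learning procedure:

```python
import numpy as np

rng = np.random.default_rng(2)
n_in, hidden = 4, 12
W = rng.normal(scale=0.1, size=(n_in, hidden))

# Hypothetical allocation: task 0 owns neurons 0-5, task 1 owns neurons 6-11.
masks = {0: np.arange(hidden) < 6, 1: np.arange(hidden) >= 6}

def forward(x, task):
    # The forward pass may use neurons of the current AND all earlier tasks,
    # which is how past knowledge is exploited when learning a new task.
    usable = np.zeros(hidden, dtype=bool)
    for t in range(task + 1):
        usable |= masks[t]
    return np.maximum(x @ W, 0) * usable

def masked_update(x, grad_h, task, lr=0.1):
    # Backprop only touches weight columns owned by the current task, so
    # earlier tasks' parameters stay frozen and forgetting is avoided.
    global W
    W = W - lr * (x.T @ grad_h) * masks[task]

x = rng.normal(size=(2, n_in))
W_before = W.copy()
masked_update(x, rng.normal(size=(2, hidden)), task=1)
```

After the update, task-0 columns of `W` are bit-for-bit unchanged, which is exactly the non-interference property the abstract claims.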
APA, Harvard, Vancouver, ISO, and other styles
49

Almahairi, Amjad. "Advances in deep learning with limited supervision and computational resources." Thèse, 2018. http://hdl.handle.net/1866/23434.

Full text
Abstract:
Deep neural networks are the cornerstone of state-of-the-art systems for a wide range of tasks, including object recognition, language modelling and machine translation. In the last decade, research in the field of deep learning has led to numerous key advances in designing novel architectures and training algorithms for neural networks. However, most success stories in deep learning heavily relied on two main factors: the availability of large amounts of labelled data and massive computational resources.
This thesis by articles makes several contributions to advancing deep learning, specifically in problems with limited or no labelled data, or with constrained computational resources. The first article addresses sparsity of labelled data that emerges in the application field of recommender systems. We propose a multi-task learning framework that leverages natural language reviews in improving recommendation. Specifically, we apply neural-network-based methods for learning representations of products from review text, while learning from rating data. We demonstrate that the proposed method can achieve state-of-the-art performance on the Amazon Reviews dataset. The second article tackles computational challenges in training large-scale deep neural networks. We propose a conditional computation network architecture which can adaptively assign its capacity, and hence computations, across different regions of the input. We demonstrate the effectiveness of our model on visual recognition tasks where objects are spatially localized within the input, while maintaining much lower computational overhead than standard network architectures. The third article contributes to the domain of unsupervised learning with the generative adversarial networks paradigm. We introduce a flexible adversarial training framework, in which not only the generator converges to the true data distribution, but also the discriminator recovers the relative density of the data at the optimum. We validate our framework empirically by showing that the discriminator is able to accurately estimate the true energy of data while obtaining state-of-the-art quality of samples. Finally, in the fourth article, we address the problem of unsupervised domain translation. We propose a model which can learn flexible, many-to-many mappings across domains from unpaired data. We validate our approach on several image datasets, and we show that it can be effectively applied in semi-supervised learning settings.
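The conditional-computation idea of the second article (spend capacity only where the input warrants it) can be caricatured as a cheap gate routing patches to an expensive branch. The gating criterion, sizes and threshold below are invented for the illustration and are not the article's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(3)
W_heavy = rng.normal(scale=0.1, size=(16, 8))

def cheap_gate(patch, thresh=0.5):
    # Low-cost saliency proxy: only high-variance patches get full capacity.
    return patch.var() > thresh

def heavy_branch(patch):
    # Expensive computation, applied only where the gate fires.
    return np.maximum(patch @ W_heavy, 0)

patches = [rng.normal(size=16) * s for s in (0.05, 2.0, 0.1, 1.5)]

features, heavy_calls = [], 0
for p in patches:
    if cheap_gate(p):
        features.append(heavy_branch(p))
        heavy_calls += 1
    else:
        features.append(np.zeros(8))   # low-capacity path: skip the heavy branch
```

The computational saving scales with the fraction of patches routed to the cheap path, which is why the approach pays off when objects of interest occupy only part of the input.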
APA, Harvard, Vancouver, ISO, and other styles
50

Sarvadevabhatla, Ravi Kiran. "Deep Learning for Hand-drawn Sketches: Analysis, Synthesis and Cognitive Process Models." Thesis, 2018. https://etd.iisc.ac.in/handle/2005/5351.

Full text
Abstract:
Deep Learning-based object category understanding is an important and active area of research in Computer Vision. Most work in this area has predominantly focused on the portion of the depiction spectrum consisting of photographic images. However, depictions at the other end of the spectrum, freehand sketches, are a fascinating visual representation and worthy of study in themselves. In this thesis, we present deep-learning approaches for sketch analysis, sketch synthesis and modelling sketch-driven cognitive processes. On the analysis front, we first focus on the problem of recognizing hand-drawn line sketches of objects. We propose a deep Recurrent Neural Network architecture with a novel loss formulation for sketch object recognition. Our approach achieves state-of-the-art results on a large-scale sketch dataset. We also show that the inherently online nature of our framework is especially suitable for on-the-fly recognition of objects as they are being drawn. We then move beyond object-level label prediction to the relatively harder problem of parsing sketched objects, i.e. given a freehand object sketch, determine its salient attributes (e.g. category, semantic parts, pose). To this end, we propose SketchParse, the first deep-network architecture for fully automatic parsing of freehand object sketches. We subsequently demonstrate SketchParse's abilities (i) on two challenging large-scale sketch datasets, (ii) in parsing unseen, semantically related object categories and (iii) in improving fine-grained sketch-based image retrieval. As a novel application, we also illustrate how SketchParse's output can be used to generate caption-style descriptions for hand-drawn sketches. On the synthesis front, we design generative models for sketches via Generative Adversarial Networks (GANs). Keeping the limited size of sketch datasets in mind, we propose DeLiGAN, a novel architecture for diverse and limited training data scenarios.
In our approach, we reparameterize the latent generative space as a mixture model and learn the mixture model's parameters along with those of the GAN. This seemingly simple modification to the vanilla GAN framework is surprisingly effective and results in models which enable diversity in generated samples although trained with limited data. We show that DeLiGAN generates diverse samples not just for hand-drawn sketches but for other image modalities as well. To quantitatively characterize the intra-class diversity of generated samples, we also introduce a modified version of the "inception-score", a measure which has been found to correlate well with human assessment of generated samples. We subsequently present an approach for synthesizing minimally discriminative sketch-based object representations which we term category-epitomes. The synthesis procedure concurrently provides a natural measure for quantifying the sparseness underlying the original sketch, which we term epitome-score. We show that the category-level distribution of epitome-scores can be used to characterize the level of detail required in general for recognizing object categories. On the cognitive process modelling front, we analyze the results of a free-viewing eye fixation study conducted on freehand sketches. The analysis reveals that eye fixation sequences exhibit marked consistency within a sketch, across sketches of a category and even across suitably grouped sets of categories. This multi-level consistency is remarkable given the variability in depiction and the extreme image content sparsity that characterizes hand-drawn object sketches. We show that the multi-level consistency in the fixation data can be exploited to predict a sketch's category given only its fixation sequence and to build a computational model which predicts part-labels underlying the eye fixations on objects. The ability of machine-based agents to play games in human-like fashion is considered a benchmark of progress in AI.
Motivated by this observation, we introduce the first computational model aimed at Pictionary, the popular word-guessing social game. We first introduce Sketch-QA, an elementary version of the Visual Question Answering task. Styled after Pictionary, Sketch-QA uses incrementally accumulated sketch stroke sequences as visual data and gathers open-ended guess-words from human guessers. To mimic humans playing Pictionary, we propose a deep neural model which generates guess-words in response to temporally evolving human-drawn sketches. The model even makes human-like mistakes while guessing, thus amplifying the human mimicry factor. We evaluate the model on the large-scale guess-word dataset generated via the Sketch-QA task and compare with various baselines. We also conduct a Visual Turing Test to obtain human impressions of the guess-words generated by humans and our model. The promising experimental results demonstrate the challenges and opportunities in building computational models for Pictionary and similarly themed games.
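DeLiGAN's latent reparameterization, as described above, amounts to sampling z from a learned mixture of Gaussians rather than a fixed prior. A minimal sketch follows; the component count, latent size and initialization are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(4)
K, zdim = 5, 16   # number of mixture components and latent dimensionality (assumed)

# Mixture parameters; in DeLiGAN these are learned jointly with the generator.
mu = rng.normal(size=(K, zdim))
sigma = np.full((K, zdim), 0.2)

def sample_latent(n):
    # Reparameterized draw: pick a component i, then z = mu_i + sigma_i * eps,
    # so gradients from the GAN loss can flow into mu and sigma.
    comp = rng.integers(K, size=n)
    eps = rng.normal(size=(n, zdim))
    return mu[comp] + sigma[comp] * eps, comp

z, comp = sample_latent(8)
```

Feeding these z vectors to an otherwise vanilla GAN generator is the entire modification; sample diversity under limited data comes from the spread of the learned mixture components.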
APA, Harvard, Vancouver, ISO, and other styles